Re: [Factor-talk] Naive loop optimization

Marmaduke Woodman Fri, 06 Feb 2015 08:42:08 -0800

Ah, really nice. Thanks for all the information.

The context for such a word is a neural network simulator in which
hundreds to thousands of nodes need to be updated in place. Usually, the
state of the network is stored as a big array of floats, and the size of
the network is known ahead of time. Two questions:

1) Are specialized arrays the way to go? That is, do the `nth` and
`set-nth` words generate memory accesses as efficient as pointer
dereferencing in C?

2) In a loop over such an array, is it correct to assume that words like
`lin-osc3` will be inlined when they are marked `inline`?
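To make question 1 concrete, here is a rough sketch of the kind of in-place update I mean. It assumes a `double` specialized array from the `specialized-arrays` vocabulary; `scale-states!` and the 0.5 factor are made up for illustration, and whether this compiles down to C-like loads and stores is exactly what I'm asking:

```factor
USING: alien.c-types kernel math sequences specialized-arrays ;
IN: scratchpad
SPECIALIZED-ARRAY: double

! Hypothetical update rule: multiply every node's state by k,
! writing each result back into the same array (map! is in-place).
: scale-states! ( states k -- states )
    [ * ] curry map! ; inline

! A fresh double-array built from a literal, then updated in place.
{ 1.0 2.0 3.0 } >double-array 0.5 scale-states! drop
```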

On Fri, Feb 6, 2015 at 4:53 PM, John Benediktsson <mrj...@gmail.com> wrote:

> Some thoughts for you:
>
> No, ``dup`` does nothing but duplicate what is essentially a pointer to
> the object; the underlying data is not copied.
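A quick way to check this in the listener, as a sketch: ``eq?`` tests object identity, so a ``dup``'d value is identical to itself, while ``clone`` makes a separate shallow copy.

```factor
USING: kernel prettyprint ;
IN: scratchpad

{ 1.0 2.0 } dup eq? .          ! t: both stack slots refer to one array
{ 1.0 2.0 } dup clone eq? .    ! f: clone allocates a shallow copy
```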
>
> Part of the reason it is slow is that you are operating on a kind of box
> by keeping your { x y } pairs in arrays (and in some cases unboxing with
> ``first2`` and re-boxing with ``2array``).  Each of the vector math words
> (``v*n``, ``v+``, ``v/n``, etc.) does the same: they aren't in-place
> operations, so every call allocates memory.
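A small sketch of the allocation point, assuming ``math.vectors`` is loaded: ``v+`` returns a freshly allocated array and leaves its inputs untouched.

```factor
USING: kernel math.vectors prettyprint ;
IN: scratchpad

{ 1.0 2.0 } dup { 3.0 4.0 } v+   ! stack: original-pair result-pair
[ . ] bi@                        ! prints { 1.0 2.0 } (unchanged), then { 4.0 6.0 }

{ 1.0 2.0 } dup { 3.0 4.0 } v+ eq? .   ! f: the result is a new array
```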
>
> In addition to ``time``, you can also ``profile`` your program:
>
>     IN: scratchpad gc [ bench1 ] profile
>     Running time: 2.656065607 seconds
>
>     IN: scratchpad flat profile.
>     depth   time ms  GC %  JIT %  FFI %   FT %
>        0    2657.0   5.91   0.00  17.69   0.00 T{ thread f "Listener" ~curry~ ~quotation~ 39 ~box~ f t f H{ } f...
>        0    2656.0   5.87   0.00  17.66   0.00   bench1
>        0    2655.0   5.88   0.00  17.66   0.00   step
>        0     433.0  18.01   0.00  18.24   0.00   *
>        0     430.0  18.14   0.00  90.70   0.00   <array>
>        0     363.0   0.00   0.00   0.00   0.00   M\ array nth-unsafe
>        0     275.0   0.00   0.00   0.00   0.00   /
>        0     232.0   0.00   0.00   0.00   0.00   <
>        0     194.0   0.00   0.00   0.00   0.00   +
>        0     178.0   0.00   0.00   0.00   0.00   M\ array length
>        0     141.0   0.00   0.00   0.00   0.00   M\ array set-nth-unsafe
>        0     140.0   0.00   0.00   0.00   0.00   M\ sequence nth
>        0     113.0   0.00   0.00   0.00   0.00   M\ integer bounds-check?
>        0     104.0   0.00   0.00   0.00   0.00   M\ fixnum integer>fixnum
>
> Here are a couple of ideas for speeding it up.
>
> You can "inline" all the math, so that you operate on ``x`` and ``y``, not
> ``{ x y }``, avoiding all the array accesses and mallocs.
>
>     : lin-osc2 ( x y -- x1 y1 )
>         2dup                     ! x y x y
>         swap                     ! x y y x
>         -1.0 *                   ! x y y -x
>         [ 0.01 * ] bi@           ! x y dx*dt dy*dt
>         [ + ] bi-curry@ bi*      ! x1 y1
>         2dup [ absq ] bi@ + sqrt ! x1 y1 norm
>         [ / ] curry bi@ ; inline
>
>     : bench2 ( -- x y )
>         1.0 0.0 10,000,000 [ lin-osc2 ] times ;
>
> Note: ``bench2`` has to return something, or too much of the loop gets
> optimized away.
>
>     IN: scratchpad gc [ bench2 ] time
>     Running time: 0.213926035 seconds
>
> Sometimes it is easier to see the flow of the math with locals (this
> doesn't affect performance):
>
>     :: lin-osc3 ( x y -- x' y' )
>         0.01 :> dt
>         y :> dx
>         x neg :> dy
>
>         x dx dt * + :> x1
>         y dy dt * + :> y1
>
>         x1 y1 [ absq ] bi@ + sqrt :> norm
>
>         x1 norm /
>         y1 norm / ; inline
>
>     : bench3 ( -- {x,y} )
>         1.0 0.0 10,000,000 [ lin-osc3 ] times 2array ;
>
> You can look into the ``typed`` vocabulary, although because of the way
> we are inlining the inputs above, the compiler should already "know" that
> it is operating on floats.
>
> The ``typed`` vocabulary checks inputs against the declared types.  If you
> don't want to pay for that check, you can just declare your types.  This
> is unsafe, and lives in a private vocabulary, because if you declare the
> wrong type you can cause a hard crash, and we don't want it to be misused:
>
>     { float float } declare
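For comparison, a sketch of the checked alternative using ``TYPED:`` from the ``typed`` vocabulary. ``lin-osc4`` and ``bench4`` are hypothetical names; the body is ``lin-osc2`` from above, with ``neg`` for ``-1.0 *`` and ``sq`` in place of ``absq`` (equivalent for real floats).

```factor
USING: kernel math math.functions typed ;
IN: scratchpad

! Inputs and outputs are checked against float by TYPED:.
TYPED: lin-osc4 ( x: float y: float -- x1: float y1: float )
    2dup swap neg            ! x y y -x
    [ 0.01 * ] bi@           ! x y dx*dt dy*dt
    [ + ] bi-curry@ bi*      ! x1 y1
    2dup [ sq ] bi@ + sqrt   ! x1 y1 norm
    [ / ] curry bi@ ;        ! x1/norm y1/norm

: bench4 ( -- x y )
    1.0 0.0 10,000,000 [ lin-osc4 ] times ;
```

It can be timed the same way as ``bench2`` (``gc [ bench4 ] time``) to see what the type checks cost relative to the untyped version.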
>
> Anyway, hope that helps.  Sorry for the spam on the paste site;
> reCAPTCHA isn't what it used to be at keeping away the bad robots.
>
> Best,
> John.
>
> On Fri, Feb 6, 2015 at 4:07 AM, Marmaduke Woodman <mmwood...@gmail.com> wrote:
>
>> Hi,
>>
>> I've attempted to write a (perhaps naive) ODE integration loop in Factor:
>>
>>   http://paste.factorcode.org/paste?id=3428
>>
>> but it seems quite slow: the `bench1` word reports a running time of ~3 s,
>> an order of magnitude slower than equivalent OCaml and Haskell code.  I
>> imagine that, due to my lack of Factor experience, there's boxing,
>> unboxing and mixing of types leading to poor performance.
>>
>> Are there some general principles for writing performant numerical code
>> in Factor? Do the generic sequence words get optimized, or is explicit use
>> of the unsafe words required?
>>
>> A specific question about `dup` on container types: are the underlying
>> data duplicated?
>>
>> If I've missed any reading material on this, references would be much
>> appreciated.
>>
>> Cheers,
>> Marmaduke
>>
>>
>> ------------------------------------------------------------------------------
>> Dive into the World of Parallel Programming. The Go Parallel Website,
>> sponsored by Intel and developed in partnership with Slashdot Media, is
>> your
>> hub for all things parallel software development, from weekly thought
>> leadership blogs to news, videos, case studies, tutorials and more. Take a
>> look and join the conversation now. http://goparallel.sourceforge.net/
>> _______________________________________________
>> Factor-talk mailing list
>> Factor-talk@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/factor-talk
>>
>>
