Mystery solved. In #3 I was missing the `[i]` index, so I was adding a constant to the whole array instead of just incrementing that one element each time.
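For anyone hitting the same thing, the bug boils down to something like the snippet below (a minimal sketch; `buf`, `x`, and `i` are illustrative names, not the actual code from the gist):

```julia
buf = zeros(Float64, 16)   # preallocated work buffer
x = 0.5                    # value to accumulate
i = 3

# Without the index: the right-hand side builds a brand-new array with x added
# to *every* element, so doing this in a hot loop allocates (and later GCs)
# a fresh array on each iteration.
buf = buf .+ x

# With the index: bump only the i-th element, in place, with no allocation.
buf[i] += x
```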
My final results were (note I'm doing way more trials now to offset the GC time):

```
Doing 100000 Iterations of each...
-------
Timing with allocation each call
elapsed time: 0.474479476 seconds (417591824 bytes allocated, 42.90% gc time)
Timing without allocation each call
elapsed time: 0.35568662 seconds (1591824 bytes allocated, 2.88% gc time)
Timing without allocation using a temp buffer each call
elapsed time: 0.18984845 seconds (1591824 bytes allocated, 5.64% gc time)
Timing passing array as a parameter
elapsed time: 0.173606393 seconds (1591824 bytes allocated, 6.03% gc time)
```

So the really good news is that assigning the object field to a temp variable before my tight loop gives about a 2x speed increase, without modifying my method's API (there's a rough sketch of the pattern after the quoted thread below).

Thanks all for the pointers!

peace,
s

On Wed, Jun 25, 2014 at 2:57 PM, Spencer Russell <[email protected]> wrote:

> Hi Tomas,
>
> Thanks for the pointer towards @time and the GC info.
>
> I also just realized I broke a golden performance rule in my test, which
> was referencing variables in global scope.
>
> Putting the whole test inside a function gives more reasonable results in
> the sense that #2 and #4 do the exact same amount of allocation, and #2 is
> a bit faster than #1, but not as fast as #4.
>
> ```
> Timing with allocation each call
> elapsed time: 0.043889071 seconds (41751824 bytes allocated)
> Timing without allocation each call
> elapsed time: 0.026565517 seconds (151824 bytes allocated)
> Timing without allocation using a temp buffer each call
> elapsed time: 29.461950105 seconds (42762391824 bytes allocated, 59.40% gc time)
> Timing passing array as a parameter
> elapsed time: 0.01580412 seconds (151824 bytes allocated)
> ```
>
> I'm still a bit surprised that #2 is that much slower than #4, as it seems
> like it's just another pointer dereference, and that #3 isn't a fix for
> that.
>
> peace,
> s
>
>
> On Wed, Jun 25, 2014 at 2:39 PM, Tomas Lycken <[email protected]>
> wrote:
>
>> If you measure time using the @time macro instead of with tic()/toc(),
>> you also get information about memory allocation and garbage collection.
>> Doing that, I find
>>
>> Timing with allocation each call
>> elapsed time: 0.004325641 seconds (4167824 bytes allocated)
>> Timing without allocation each call
>> elapsed time: 0.53675596 seconds (98399824 bytes allocated, 7.60% gc time)
>> Timing without allocation using a temp buffer each call
>> elapsed time: 2.165323004 seconds (4309087824 bytes allocated, 54.22% gc time)
>> Timing passing array as a parameter
>> elapsed time: 0.001356721 seconds (7824 bytes allocated)
>>
>> so you see that the third method is terribly memory-inefficient, both
>> allocating and garbage collecting way more than any other method. The last
>> method is much faster since it barely allocates any new memory.
>>
>> // T
>>
>> On Wednesday, June 25, 2014 7:57:18 PM UTC+2, Spencer Russell wrote:
>>>
>>> I'm having some trouble understanding some performance issues. I put
>>> together a minimal example here:
>>>
>>> https://gist.github.com/ssfrr/8934c14d8d2703a3d203
>>>
>>> I had some methods that were allocating arrays on each call, which I
>>> figured wasn't very efficient.
>>>
>>> My first attempt to improve things was to allocate an array in my main
>>> container type on initialization, and then share that between function
>>> calls.
>>>
>>> Surprisingly (to me) this slowed things down by about 60x. 
>>> >>> I wondered if maybe this was because of the extra dereference to get the >>> array (though the slowdown seemed too dramatic for that) so I saved the >>> reference to the array in a temp variable before my tight loop. >>> >>> This slowed down by an additional 7x (more surprises!). >>> >>> Passing the array as a parameter directly to each function invocation >>> was by far the fastest, and was about 2x faster than my original that >>> allocated each time. This approach complicates my interface somewhat >>> though, as now the caller needs to know how many work buffers the function >>> might need, instead of baking that information into the type. I could >>> probably solve this with a wrapper function, but I'd like to understand >>> what's going on and if there's some sort of type-inference thing I should >>> clean up. >>> >>> Specifically my questions are: >>> >>> 1. Why is accessing the array as a parameter so much faster than >>> accessing the array through an object passed as a parameter? As far as I >>> can tell the same type information is there. >>> 2. Why does it slow things down so much to store the reference to >>> the array in the beginning of the function and then access that in the >>> tight loop? >>> >>> >>> peace, >>> s >>> >> >
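For reference, the two fast patterns discussed above look roughly like the sketch below: hoisting the buffer field into a local before the tight loop, and passing the buffer in as an argument. The `Processor`/`process!` names and the loop body are invented for illustration (this is not the code from the gist), and it's written for current Julia (the 2014-era syntax in this thread used `type` rather than `struct`).

```julia
# Illustrative sketch only -- the type, function names, and loop body are
# made up, not the code from the gist.

struct Processor
    buf::Vector{Float64}    # work buffer allocated once, at construction
end

# Variant "temp buffer": keep the buffer in the type, but hoist the field
# access into a local before the tight loop, so the loop only indexes a
# plain Vector (the ~2x win reported above).
function process!(p::Processor, n::Integer)
    buf = p.buf
    for i in 1:n
        buf[mod1(i, length(buf))] += 1.0
    end
    return p
end

# Variant "array as a parameter": the caller owns and passes the buffer;
# fastest in the timings above, at the cost of a wider API.
function process!(buf::Vector{Float64}, n::Integer)
    for i in 1:n
        buf[mod1(i, length(buf))] += 1.0
    end
    return buf
end

# Timing as in the thread, with allocation/GC info from @time:
p = Processor(zeros(1024))
process!(p, 1); process!(p.buf, 1)   # compile both methods before timing
@time process!(p, 10_000_000)
@time process!(p.buf, 10_000_000)
```

A thin wrapper method that just forwards `p.buf` to the buffer-taking method (as suggested in the original post) would be another way to keep the simpler API.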
