Hi Tomas,

Thanks for the pointer towards @time and the GC info.
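For anyone else following along, the practical difference is that @time reports bytes allocated and garbage-collection time alongside the elapsed time, whereas tic()/toc() only reports the time. A minimal sketch, with a throwaway work() function that is not part of the gist:

```
# Hypothetical work() function that allocates a buffer, so @time has
# something to report; not taken from the gist.
function work(n)
    buf = zeros(n)
    for i in 1:n
        buf[i] = sqrt(i)
    end
    sum(buf)
end

work(1)                  # warm up so compilation isn't included in the timing

tic(); work(10^6); toc() # prints only the elapsed time
@time work(10^6)         # also prints bytes allocated (and gc time when nonzero)
```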
I also just realized I broke a golden performance rule in my test, which was referencing variables in global scope. Putting the whole test inside a function gives more reasonable results, in the sense that #2 and #4 do the exact same amount of allocation, and #2 is a bit faster than #1, but not as fast as #4.

```
Timing with allocation each call
elapsed time: 0.043889071 seconds (41751824 bytes allocated)
Timing without allocation each call
elapsed time: 0.026565517 seconds (151824 bytes allocated)
Timing without allocation using a temp buffer each call
elapsed time: 29.461950105 seconds (42762391824 bytes allocated, 59.40% gc time)
Timing passing array as a parameter
elapsed time: 0.01580412 seconds (151824 bytes allocated)
```

I'm still a bit surprised that #2 is that much slower than #4, since it seems like it's just another pointer dereference, and that #3 isn't a fix for that.

peace,
s

On Wed, Jun 25, 2014 at 2:39 PM, Tomas Lycken <[email protected]> wrote:

> If you measure time using the @time macro instead of tic()/toc(), you
> also get information about memory allocation and garbage collection.
> Doing that, I find
>
> Timing with allocation each call
> elapsed time: 0.004325641 seconds (4167824 bytes allocated)
> Timing without allocation each call
> elapsed time: 0.53675596 seconds (98399824 bytes allocated, 7.60% gc time)
> Timing without allocation using a temp buffer each call
> elapsed time: 2.165323004 seconds (4309087824 bytes allocated, 54.22% gc time)
> Timing passing array as a parameter
> elapsed time: 0.001356721 seconds (7824 bytes allocated)
>
> so you see that the third method is terribly memory-inefficient, both
> allocating and garbage collecting way more than any other method. The last
> method is much faster since it barely allocates any new memory.
>
> // T
>
> On Wednesday, June 25, 2014 7:57:18 PM UTC+2, Spencer Russell wrote:
>>
>> I'm having some trouble understanding some performance issues. I put
>> together a minimal example here:
>>
>> https://gist.github.com/ssfrr/8934c14d8d2703a3d203
>>
>> I had some methods that were allocating arrays on each call, which I
>> figured wasn't very efficient.
>>
>> My first attempt to improve things was to allocate an array in my main
>> container type on initialization, and then share that between function
>> calls.
>>
>> Surprisingly (to me) this slowed things down by about 60x.
>>
>> I wondered if maybe this was because of the extra dereference to get the
>> array (though the slowdown seemed too dramatic for that), so I saved the
>> reference to the array in a temp variable before my tight loop.
>>
>> This slowed things down by an additional 7x (more surprises!).
>>
>> Passing the array as a parameter directly to each function invocation
>> was by far the fastest, and was about 2x faster than my original version
>> that allocated each time. This approach complicates my interface somewhat,
>> though, as now the caller needs to know how many work buffers the function
>> might need, instead of baking that information into the type. I could
>> probably solve this with a wrapper function, but I'd like to understand
>> what's going on and whether there's some sort of type-inference thing I
>> should clean up.
>>
>> Specifically, my questions are:
>>
>> 1. Why is accessing the array as a parameter so much faster than
>> accessing the array through an object passed as a parameter? As far as I
>> can tell, the same type information is there.
>> 2. Why does it slow things down so much to store the reference to the
>> array at the beginning of the function and then access that in the tight
>> loop?
>>
>> peace,
>> s
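For concreteness, here is a minimal sketch of the kind of test under discussion: the four access patterns, wrapped in a function so nothing runs at global scope and timed with @time. The Processor type and the process_* names are illustrative assumptions, not the code from the linked gist, and variant 3 below simply binds the field to a local variable, so it does not reproduce the heavy allocation reported for the temp-buffer case above.

```
# Illustrative sketch only -- not the code from the gist linked above.
# Written in Julia 0.3-era syntax to match the rest of the thread.

type Processor
    buf::Vector{Float64}    # preallocated work buffer shared between calls
end
Processor(n::Int) = Processor(zeros(n))

# 1: allocate a fresh buffer on every call
function process_alloc(n)
    buf = zeros(n)
    for i in 1:n; buf[i] = sqrt(i); end
    sum(buf)
end

# 2: reuse the preallocated buffer, reaching it through the field each time
function process_field(p::Processor)
    for i in 1:length(p.buf); p.buf[i] = sqrt(i); end
    sum(p.buf)
end

# 3: bind the field to a local variable before the tight loop
function process_local(p::Processor)
    buf = p.buf
    for i in 1:length(buf); buf[i] = sqrt(i); end
    sum(buf)
end

# 4: take the work buffer directly as a parameter
function process_param(buf::Vector{Float64})
    for i in 1:length(buf); buf[i] = sqrt(i); end
    sum(buf)
end

function runtest(n, iters)
    p = Processor(n)
    buf = zeros(n)
    # warm up so compilation isn't counted in the timings
    process_alloc(n); process_field(p); process_local(p); process_param(buf)

    println("Timing with allocation each call")
    @time for i in 1:iters; process_alloc(n); end
    println("Timing without allocation each call")
    @time for i in 1:iters; process_field(p); end
    println("Timing without allocation using a temp buffer each call")
    @time for i in 1:iters; process_local(p); end
    println("Timing passing array as a parameter")
    @time for i in 1:iters; process_param(buf); end
end

runtest(1000, 10000)
```

With the buffer field concretely annotated as Vector{Float64}, variants 2-4 should all run without per-call allocation. If the real type leaves that field untyped or abstractly typed, every access to it inside the loop becomes type-unstable and can allocate, which would be one explanation consistent with the slowdowns reported earlier in the thread.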
