Hi Tomas,

Thanks for the pointer towards @time and the GC info.

I also just realized I broke a golden performance rule in my test: I was
referencing variables in global scope.

Putting the whole test inside a function gives more reasonable results in
the sense that #2 and #4 do the exact same amount of allocation, and #2 is
a bit faster than #1, but not as fast as #4.

```
Timing with allocation each call
elapsed time: 0.043889071 seconds (41751824 bytes allocated)
Timing without allocation each call
elapsed time: 0.026565517 seconds (151824 bytes allocated)
Timing without allocation using a temp buffer each call
elapsed time: 29.461950105 seconds (42762391824 bytes allocated, 59.40% gc time)
Timing passing array as a parameter
elapsed time: 0.01580412 seconds (151824 bytes allocated)
```
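
For reference, the shape of the fix is just moving the timed loop out of
global scope and into a function. A simplified sketch (made-up names, not the
actual gist code) looks like:

```
# Everything the timed loop touches is local to the function, so nothing
# is read from global scope while timing.
function runtest(n, iters)
    x = rand(n)
    acc = 0.0
    @time for k in 1:iters
        acc += sum(x)   # stand-in for the real per-call work
    end
    return acc
end

runtest(1000, 10000)
```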

I'm still a bit surprised that #2 is that much slower than #4, since the
field access seems like just one extra pointer dereference, and that #3
doesn't fix it.
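
To be concrete about what I mean by #2 vs #4, the only difference is where
the work buffer comes from. Stripped down (hypothetical names, not the actual
gist code), it's something like:

```
# Case #2: the preallocated buffer is read through the container's field.
type Container
    buf::Vector{Float64}
end

function fill_via_field!(c::Container, x::Vector{Float64})
    for i in 1:length(x)
        c.buf[i] = 2 * x[i]
    end
    return c.buf
end

# Case #4: the caller passes the preallocated buffer in directly.
function fill_via_arg!(buf::Vector{Float64}, x::Vector{Float64})
    for i in 1:length(x)
        buf[i] = 2 * x[i]
    end
    return buf
end
```

(Case #3 would just be `buf = c.buf` at the top of `fill_via_field!`, then
indexing into `buf` inside the loop.)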

peace,
s


On Wed, Jun 25, 2014 at 2:39 PM, Tomas Lycken <[email protected]>
wrote:

> If you measure time using the @time macro instead of with tic()/toc(), you
> also get information about memory allocation and garbage collection. Doing
> that, I find
>
> Timing with allocation each call
> elapsed time: 0.004325641 seconds (4167824 bytes allocated)
> Timing without allocation each call
> elapsed time: 0.53675596 seconds (98399824 bytes allocated, 7.60% gc time)
> Timing without allocation using a temp buffer each call
> elapsed time: 2.165323004 seconds (4309087824 bytes allocated, 54.22% gc time)
> Timing passing array as a parameter
> elapsed time: 0.001356721 seconds (7824 bytes allocated)
>
> so you see that the third method is terribly memory-inefficient, both
> allocating and garbage collecting way more than any other method. The last
> method is much faster since it barely allocates any new memory.
>
> // T
>
> On Wednesday, June 25, 2014 7:57:18 PM UTC+2, Spencer Russell wrote:
>>
>> I'm having some trouble understanding some performance issues. I put
>> together a minimal example here:
>>
>> https://gist.github.com/ssfrr/8934c14d8d2703a3d203
>>
>> I had some methods that were allocating arrays on each call, which I
>> figured wasn't very efficient.
>>
>> My first attempt to improve things was to allocate an array in my main
>> container type on initialization, and then share that between function
>> calls.
>>
>> Surprisingly (to me) this slowed things down by about 60x.
>>
>> I wondered if maybe this was because of the extra dereference to get the
>> array (though the slowdown seemed too dramatic for that) so I saved the
>> reference to the array in a temp variable before my tight loop.
>>
>> This slowed things down by an additional 7x (more surprises!).
>>
>> Passing the array as a parameter directly to each function invocation was
>> by far the fastest, and was about 2x faster than my original that allocated
>> each time. This approach complicates my interface somewhat though, as now
>> the caller needs to know how many work buffers the function might need,
>> instead of baking that information into the type. I could probably solve
>> this with a wrapper function, but I'd like to understand what's going on
>> and if there's some sort of type-inference thing I should clean up.
>>
>> Specifically my questions are:
>>
>>    1. Why is accessing the array as a parameter so much faster than
>>    accessing the array through an object passed as a parameter? As far as I
>>    can tell the same type information is there.
>>    2. Why does it slow things down so much to store the reference to the
>>    array in the beginning of the function and then access that in the tight
>>    loop?
>>
>>
>> peace,
>> s
>>
>
