Mystery solved. In #3 I was missing the indexing `[i]` so I was adding a
constant to the whole array instead of just incrementing that item each
time.
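For anyone skimming the thread, the bug reduces to something like this. The names here are hypothetical (not from the gist), and it's written in current Julia syntax, where adding a scalar to an array uses broadcast dots:

```julia
# Buggy version: no [i] index, so every iteration adds a constant
# to the *whole* array rather than incrementing one element.
buf = zeros(4)
for i in 1:length(buf)
    buf .+= 1.0      # touches all 4 elements each iteration
end

# Fixed version: index into the array so only element i changes.
buf2 = zeros(4)
for i in 1:length(buf2)
    buf2[i] += 1.0   # increments just this element
end
```

After the loops, `buf` is `[4.0, 4.0, 4.0, 4.0]` while `buf2` is `[1.0, 1.0, 1.0, 1.0]`, which also explains the extra allocation and work the buggy version was doing per call.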

My final results were (note I'm doing way more trials now to offset the GC
time):

```
Doing 100000 Iterations of each...
-------
Timing with allocation each call
elapsed time: 0.474479476 seconds (417591824 bytes allocated, 42.90% gc time)
Timing without allocation each call
elapsed time: 0.35568662 seconds (1591824 bytes allocated, 2.88% gc time)
Timing without allocation using a temp buffer each call
elapsed time: 0.18984845 seconds (1591824 bytes allocated, 5.64% gc time)
Timing passing array as a parameter
elapsed time: 0.173606393 seconds (1591824 bytes allocated, 6.03% gc time)
```

So the really good news is that assigning the object field to a temp
variable before my tight loop gives about a 2x speed increase, without
modifying my method's API.
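The hoisting pattern looks roughly like this. `Processor` and `process!` are made-up names for illustration, and this is modern `struct` syntax rather than the `type` keyword the 2014-era thread would have used:

```julia
# Hypothetical sketch: hoist an object field into a local variable
# before a hot loop, so the field lookup happens once instead of
# on every iteration, and the local has a type the compiler can
# specialize on inside the loop.
struct Processor
    buf::Vector{Float64}
end

function process!(p::Processor, n::Int)
    buf = p.buf                        # one field access, hoisted out
    for i in 1:n
        buf[i % length(buf) + 1] += 1.0
    end
    return p
end

p = Processor(zeros(8))
process!(p, 1000)
```

The win is biggest when the field's type isn't concrete (or the compiler otherwise can't prove it stays fixed); the local gives the loop body a stable, concretely-typed binding without changing the method's API.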

Thanks all for the pointers!


peace,
s


On Wed, Jun 25, 2014 at 2:57 PM, Spencer Russell <[email protected]> wrote:

> Hi Thomas,
>
> Thanks for the pointer towards @time and the GC info.
>
> I also just realized I broke a golden performance rule in my test, which
> was referencing variables in global scope.
>
> Putting the whole test inside a function gives more reasonable results in
> the sense that #2 and #4 do the exact same amount of allocation, and #2 is
> a bit faster than #1, but not as fast as #4.
>
> ```
> Timing with allocation each call
> elapsed time: 0.043889071 seconds (41751824 bytes allocated)
> Timing without allocation each call
> elapsed time: 0.026565517 seconds (151824 bytes allocated)
> Timing without allocation using a temp buffer each call
> elapsed time: 29.461950105 seconds (42762391824 bytes allocated, 59.40% gc time)
> Timing passing array as a parameter
> elapsed time: 0.01580412 seconds (151824 bytes allocated)
> ```
>
> I'm still a bit surprised that #2 is that much slower than #4, as it seems
> like it's just another pointer dereference, and that #3 isn't a fix for
> that.
>
> peace,
> s
>
>
> On Wed, Jun 25, 2014 at 2:39 PM, Tomas Lycken <[email protected]>
> wrote:
>
>> If you measure time using the @time macro instead of with tic()/toc(),
>> you also get information about memory allocation and garbage collection.
>> Doing that, I find
>>
>> Timing with allocation each call
>> elapsed time: 0.004325641 seconds (4167824 bytes allocated)
>> Timing without allocation each call
>> elapsed time: 0.53675596 seconds (98399824 bytes allocated, 7.60% gc time)
>> Timing without allocation using a temp buffer each call
>> elapsed time: 2.165323004 seconds (4309087824 bytes allocated, 54.22% gc time)
>> Timing passing array as a parameter
>> elapsed time: 0.001356721 seconds (7824 bytes allocated)
>>
>> so you see that the third method is terribly memory-inefficient, both
>> allocating and garbage collecting way more than any other method. The last
>> method is much faster since it barely allocates any new memory.
>>
>> // T
>>
>> On Wednesday, June 25, 2014 7:57:18 PM UTC+2, Spencer Russell wrote:
>>>
>>> I'm having some trouble understanding some performance issues. I put
>>> together a minimal example here:
>>>
>>> https://gist.github.com/ssfrr/8934c14d8d2703a3d203
>>>
>>> I had some methods that were allocating arrays on each call, which I
>>> figured wasn't very efficient.
>>>
>>> My first attempt to improve things was to allocate an array in my main
>>> container type on initialization, and then share that between function
>>> calls.
>>>
>>> Surprisingly (to me) this slowed things down by about 60x.
>>>
>>> I wondered if maybe this was because of the extra dereference to get the
>>> array (though the slowdown seemed too dramatic for that) so I saved the
>>> reference to the array in a temp variable before my tight loop.
>>>
>>> This slowed down by an additional 7x (more surprises!).
>>>
>>> Passing the array as a parameter directly to each function invocation
>>> was by far the fastest, and was about 2x faster than my original that
>>> allocated each time. This approach complicates my interface somewhat
>>> though, as now the caller needs to know how many work buffers the function
>>> might need, instead of baking that information into the type. I could
>>> probably solve this with a wrapper function, but I'd like to understand
>>> what's going on and if there's some sort of type-inference thing I should
>>> clean up.
>>>
>>> Specifically my questions are:
>>>
>>>    1. Why is accessing the array as a parameter so much faster than
>>>    accessing the array through an object passed as a parameter? As far as I
>>>    can tell the same type information is there.
>>>    2. Why does it slow things down so much to store the reference to
>>>    the array in the beginning of the function and then access that in the
>>>    tight loop?
>>>
>>>
>>> peace,
>>> s
>>>
>>
>
