There shouldn't be any allocation in the last 3 calls. There is only 
allocation because, at the top of buftest(), you define:

```
_ = with_alloc(cont)
_ = without_alloc(cont)
_ = without_alloc_with_temp(cont)
_ = using_parameter(cont, buf)
```

This doesn't actually introduce uncertainty about the types, but it does 
deoptimize things a bit: it causes Julia to box _ when you use it as an 
iteration variable later in buftest(), which is the source of the 
allocation. Simply deleting the LHS here removes the allocation in your 
tests of those three functions, although it doesn't make much of a 
difference in performance.
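
To make that concrete, the problematic shape is roughly this (I'm reconstructing 
buftest() from the calls above; the loop body is only a guess, not copied from 
your gist):

```
function buftest(cont, buf, n)
    # results bound to `_` up here ...
    _ = with_alloc(cont)
    _ = without_alloc(cont)
    _ = without_alloc_with_temp(cont)
    _ = using_parameter(cont, buf)

    # ... and `_` reused as an iteration variable further down.  Because
    # `_` has now held values of more than one type, Julia boxes it, and
    # each iteration of the loop allocates.  Deleting the `_ =` on the four
    # lines above (calling the functions bare) removes that allocation.
    for _ = 1:n
        without_alloc(cont)
    end
end
```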

Also note that you put gc() within the @time block, which may not be what 
you want. With the change above, there isn't really a point to calling gc() 
at all, since only with_alloc allocates.
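
If you do want a collection before a timing run, doing it outside the timed 
block keeps the pause out of the measurement. For instance (summing a random 
vector is just a stand-in for the real work):

```
x = rand(10^6)

gc()                    # collect first, outside the measured expression
@time sum(x)

@time (gc(); sum(x))    # here the gc() pause is charged to the elapsed time
```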

Simon

On Wednesday, June 25, 2014 3:24:38 PM UTC-4, Spencer Russell wrote:
>
> Mystery solved. In #3 I was missing the indexing `[i]`, so I was adding a 
> constant to the whole array instead of just incrementing that item each 
> time.
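>
> In code, the slip looked roughly like this (`buf` is just a stand-in name, 
> not the actual variable from the gist):
>
> ```
> buf = zeros(16)
>
> buf[3] += 1.0   # intended: increment a single element in place
>
> buf += 1.0      # the bug: this adds the constant to every element, and
>                 # since it's really `buf = buf + 1.0` it allocates a
>                 # brand-new array on every call
> ```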
>
> My final results were (note I'm doing way more trials now to offset the GC 
> time):
>
> ```
> Doing 100000 Iterations of each...
> -------
> Timing with allocation each call
> elapsed time: 0.474479476 seconds (417591824 bytes allocated, 42.90% gc time)
> Timing without allocation each call
> elapsed time: 0.35568662 seconds (1591824 bytes allocated, 2.88% gc time)
> Timing without allocation using a temp buffer each call
> elapsed time: 0.18984845 seconds (1591824 bytes allocated, 5.64% gc time)
> Timing passing array as a parameter
> elapsed time: 0.173606393 seconds (1591824 bytes allocated, 6.03% gc time)
> ```
>
> So the really good news is that assigning the object field to a temp 
> variable before my tight loop gives about a 2x speed increase, without 
> modifying my method's API.
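>
> In case it helps anyone else, the pattern is roughly this (the type and 
> names here are a sketch, not the actual code from the gist):
>
> ```
> type Container
>     buf::Vector{Float64}
> end
>
> function process!(c::Container, n)
>     buf = c.buf          # hoist the field access out of the tight loop
>     for i = 1:n
>         buf[1] += 1.0    # the loop only touches the local binding
>     end
> end
> ```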
>
> Thanks all for the pointers!
>
>
> peace,
> s
>
>
> On Wed, Jun 25, 2014 at 2:57 PM, Spencer Russell <[email protected]> wrote:
>
>> Hi Thomas,
>>
>> Thanks for the pointer towards @time and the GC info.
>>
>> I also just realized I broke a golden performance rule in my test: I was 
>> referencing variables in global scope.
>>
>> Putting the whole test inside a function gives more reasonable results in 
>> the sense that #2 and #4 do the exact same amount of allocation, and #2 is 
>> a bit faster than #1, but not as fast as #4.
>>
>> ```
>> Timing with allocation each call
>> elapsed time: 0.043889071 seconds (41751824 bytes allocated)
>> Timing without allocation each call
>> elapsed time: 0.026565517 seconds (151824 bytes allocated)
>> Timing without allocation using a temp buffer each call
>> elapsed time: 29.461950105 seconds (42762391824 bytes allocated, 59.40% gc time)
>> Timing passing array as a parameter
>> elapsed time: 0.01580412 seconds (151824 bytes allocated)
>> ```
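>>
>> For completeness, the fix is just wrapping the timed work in a function, 
>> so everything is a local with an inferable type rather than an untyped 
>> global (sketch only):
>>
>> ```
>> function runtests(n)
>>     buf = zeros(16)
>>     @time for i = 1:n
>>         buf[1] += 1.0
>>     end
>> end
>>
>> runtests(100000)
>> ```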
>>
>> I'm still a bit surprised that #2 is that much slower than #4, as it 
>> seems like it's just another pointer dereference, and that #3 isn't a fix 
>> for that.
>>  
>> peace,
>> s
>>
>>
>> On Wed, Jun 25, 2014 at 2:39 PM, Tomas Lycken <[email protected]> wrote:
>>
>>> If you measure time using the @time macro instead of with tic()/toc(), 
>>> you also get information about memory allocation and garbage collection. 
>>> Doing that, I find
>>>
>>> ```
>>> Timing with allocation each call
>>> elapsed time: 0.004325641 seconds (4167824 bytes allocated)
>>> Timing without allocation each call
>>> elapsed time: 0.53675596 seconds (98399824 bytes allocated, 7.60% gc time)
>>> Timing without allocation using a temp buffer each call
>>> elapsed time: 2.165323004 seconds (4309087824 bytes allocated, 54.22% gc time)
>>> Timing passing array as a parameter
>>> elapsed time: 0.001356721 seconds (7824 bytes allocated)
>>> ```
>>>
>>> so you see that the third method is terribly memory-inefficient, both 
>>> allocating and garbage collecting way more than any other method. The last 
>>> method is much faster since it barely allocates any new memory.
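>>>
>>> To make the measurement difference concrete (summing a random vector is 
>>> just a stand-in for the real work):
>>>
>>> ```
>>> x = rand(10^6)
>>>
>>> tic(); sum(x); toc()   # reports elapsed time only
>>> @time sum(x)           # also reports bytes allocated and % gc time
>>> ```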
>>>
>>> // T
>>>
>>> On Wednesday, June 25, 2014 7:57:18 PM UTC+2, Spencer Russell wrote:
>>>>
>>>> I'm having some trouble understanding some performance issues. I put 
>>>> together a minimal example here:
>>>>
>>>> https://gist.github.com/ssfrr/8934c14d8d2703a3d203
>>>>
>>>> I had some methods that were allocating arrays on each call, which I 
>>>> figured wasn't very efficient.
>>>>
>>>> My first attempt to improve things was to allocate an array in my main 
>>>> container type on initialization, and then share that between function 
>>>> calls.
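>>>>
>>>> Roughly the idea was this (a sketch, not the exact code from the gist):
>>>>
>>>> ```
>>>> type Processor
>>>>     workbuf::Vector{Float64}   # allocated once, at construction
>>>> end
>>>> Processor(n::Int) = Processor(zeros(n))
>>>>
>>>> function process!(p::Processor)
>>>>     for i = 1:length(p.workbuf)
>>>>         p.workbuf[i] += 1.0    # field access inside the hot loop
>>>>     end
>>>> end
>>>> ```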
>>>>
>>>> Surprisingly (to me) this slowed things down by about 60x.
>>>>
>>>> I wondered if maybe this was because of the extra dereference to get 
>>>> the array (though the slowdown seemed too dramatic for that), so I saved 
>>>> the reference to the array in a temp variable before my tight loop.
>>>>
>>>> This slowed down by an additional 7x (more surprises!).
>>>>
>>>> Passing the array as a parameter directly to each function invocation 
>>>> was by far the fastest, and was about 2x faster than my original that 
>>>> allocated each time. This approach complicates my interface somewhat 
>>>> though, as now the caller needs to know how many work buffers the function 
>>>> might need, instead of baking that information into the type. I could 
>>>> probably solve this with a wrapper function, but I'd like to understand 
>>>> what's going on and if there's some sort of type-inference thing I should 
>>>> clean up.
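>>>>
>>>> The wrapper I have in mind would be something like this (again just a 
>>>> sketch, reusing the Processor type from the sketch above):
>>>>
>>>> ```
>>>> # the kernel takes its work buffer as an argument (the fast case) ...
>>>> function process_kernel!(buf::Vector{Float64})
>>>>     for i = 1:length(buf)
>>>>         buf[i] += 1.0
>>>>     end
>>>> end
>>>>
>>>> # ... and a thin wrapper keeps the buffer-free API
>>>> process!(p::Processor) = process_kernel!(p.workbuf)
>>>> ```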
>>>>
>>>> Specifically my questions are:
>>>>
>>>>    1. Why is accessing the array as a parameter so much faster than 
>>>>    accessing the array through an object passed as a parameter? As far 
>>>>    as I can tell, the same type information is there. 
>>>>    2. Why does it slow things down so much to store the reference to 
>>>>    the array at the beginning of the function and then access that in 
>>>>    the tight loop?
>>>>
>>>>
>>>> peace,
>>>> s
>>>>  
>>>
>>
>
