Ah, thanks for that!

I did put the gc() inside the @time intentionally, so that each test would
include the time to clean up after itself.

I still find it a bit tricky to reason about what can and can't be optimized,
but I'm getting better at writing performance tests, so if I later make
changes that cause a bunch of new allocation, they should get flagged when I
run my unit tests.

peace,
s


On Wed, Jun 25, 2014 at 3:53 PM, Simon Kornblith <si...@simonster.com>
wrote:

> There shouldn't be any allocation in the last 3 calls. There is only
> allocation because, at the top of buftest(), you define:
>
> _ = with_alloc(cont)
> _ = without_alloc(cont)
> _ = without_alloc_with_temp(cont)
> _ = using_parameter(cont, buf)
>
> This doesn't actually introduce uncertainty about the types, but it does
> deoptimize things a bit: it causes Julia to box _ when you use it as an
> iteration variable later in buftest(), which is the source of the
> allocation. Simply deleting the LHS here removes the allocation in your
> tests of those three functions, although it doesn't make much of a
> difference in performance.
>
> Also note that you put gc() within the @time block, which may not be what
> you want. With the change above, there isn't really a point to calling gc()
> at all, since only with_alloc allocates.
>
> Simon
>
>
> On Wednesday, June 25, 2014 3:24:38 PM UTC-4, Spencer Russell wrote:
>
>> Mystery solved. In #3 I was missing the indexing `[i]` so I was adding a
>> constant to the whole array instead of just incrementing that item each
>> time.
>>
>> My final results were (note I'm doing way more trials now to offset the
>> GC time):
>>
>> ```
>> Doing 100000 Iterations of each...
>> -------
>> Timing with allocation each call
>> elapsed time: 0.474479476 seconds (417591824 bytes allocated, 42.90% gc time)
>> Timing without allocation each call
>> elapsed time: 0.35568662 seconds (1591824 bytes allocated, 2.88% gc time)
>> Timing without allocation using a temp buffer each call
>> elapsed time: 0.18984845 seconds (1591824 bytes allocated, 5.64% gc time)
>> Timing passing array as a parameter
>> elapsed time: 0.173606393 seconds (1591824 bytes allocated, 6.03% gc time)
>> ```
>>
>> So the really good news is that assigning the object field to a temp
>> variable before my tight loop gives about a 2x speed increase, without
>> modifying my method's API.
>>
>> Thanks all for the pointers!
>>
>>
>> peace,
>> s
>>
>>
>> On Wed, Jun 25, 2014 at 2:57 PM, Spencer Russell <s...@mit.edu> wrote:
>>
>>> Hi Tomas,
>>>
>>> Thanks for the pointer towards @time and the GC info.
>>>
>>> I also just realized I broke a golden performance rule in my test, which
>>> was referencing variables in global scope.
>>>
>>> Putting the whole test inside a function gives more reasonable results
>>> in the sense that #2 and #4 do the exact same amount of allocation, and #2
>>> is a bit faster than #1, but not as fast as #4.
>>>
>>> ```
>>> Timing with allocation each call
>>> elapsed time: 0.043889071 seconds (41751824 bytes allocated)
>>> Timing without allocation each call
>>> elapsed time: 0.026565517 seconds (151824 bytes allocated)
>>> Timing without allocation using a temp buffer each call
>>> elapsed time: 29.461950105 seconds (42762391824 bytes allocated, 59.40% gc time)
>>> Timing passing array as a parameter
>>> elapsed time: 0.01580412 seconds (151824 bytes allocated)
>>> ```
>>>
>>> I'm still a bit surprised that #2 is that much slower than #4, as it
>>> seems like it's just another pointer dereference, and that #3 isn't a fix
>>> for that.
>>>
>>> peace,
>>> s
>>>
>>>
>>> On Wed, Jun 25, 2014 at 2:39 PM, Tomas Lycken <tomas....@gmail.com>
>>> wrote:
>>>
>>>> If you measure time using the @time macro instead of with tic()/toc(),
>>>> you also get information about memory allocation and garbage collection.
>>>> Doing that, I find
>>>>
>>>> Timing with allocation each call
>>>> elapsed time: 0.004325641 seconds (4167824 bytes allocated)
>>>> Timing without allocation each call
>>>> elapsed time: 0.53675596 seconds (98399824 bytes allocated, 7.60% gc time)
>>>> Timing without allocation using a temp buffer each call
>>>> elapsed time: 2.165323004 seconds (4309087824 bytes allocated, 54.22% gc time)
>>>> Timing passing array as a parameter
>>>> elapsed time: 0.001356721 seconds (7824 bytes allocated)
>>>>
>>>> so you see that the third method is terribly memory-inefficient, both
>>>> allocating and garbage collecting way more than any other method. The last
>>>> method is much faster since it barely allocates any new memory.
>>>>
>>>> // T
>>>>
>>>> On Wednesday, June 25, 2014 7:57:18 PM UTC+2, Spencer Russell wrote:
>>>>>
>>>>> I'm having some trouble understanding some performance issues. I put
>>>>> together a minimal example here:
>>>>>
>>>>> https://gist.github.com/ssfrr/8934c14d8d2703a3d203
>>>>>
>>>>> I had some methods that were allocating arrays on each call, which I
>>>>> figured wasn't very efficient.
>>>>>
>>>>> My first attempt to improve things was to allocate an array in my main
>>>>> container type on initialization, and then share that between function
>>>>> calls.
>>>>>
>>>>> Surprisingly (to me), this slowed things down by about 60x.
>>>>>
>>>>> I wondered if maybe this was because of the extra dereference to get
>>>>> the array (though the slowdown seemed too dramatic for that), so I saved
>>>>> the reference to the array in a temp variable before my tight loop.
>>>>>
>>>>> This slowed things down by an additional 7x (more surprises!).
>>>>>
>>>>> Passing the array as a parameter directly to each function invocation
>>>>> was by far the fastest, and was about 2x faster than my original that
>>>>> allocated each time. This approach complicates my interface somewhat
>>>>> though, as now the caller needs to know how many work buffers the function
>>>>> might need, instead of baking that information into the type. I could
>>>>> probably solve this with a wrapper function, but I'd like to understand
>>>>> what's going on and if there's some sort of type-inference thing I should
>>>>> clean up.
>>>>>
>>>>> Specifically my questions are:
>>>>>
>>>>>    1. Why is accessing the array as a parameter so much faster than
>>>>>    accessing the array through an object passed as a parameter? As far
>>>>>    as I can tell the same type information is there.
>>>>>    2. Why does it slow things down so much to store the reference to
>>>>>    the array at the beginning of the function and then access that in
>>>>>    the tight loop?
>>>>>
>>>>>
>>>>> peace,
>>>>> s
>>>>>
>>>>
>>>
>>
