Great write-up! After some experiments I was able to reduce GC time from
65% to only 15%, and I see opportunities to do even better. The most
important things for me were:

 1. Some BLAS functions (especially "gemm!", which is pretty flexible).
 2. Manual devectorization (@devec didn't work for my case).
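
For illustration, the two techniques might look roughly like the sketch
below. It is not code from the application discussed here: the helper names
`scaled_product!` and `sub_scaled!` are hypothetical, and it assumes Julia
1.x, where the BLAS wrappers are reached through `LinearAlgebra.BLAS` (older
versions used a different module path).

```julia
using LinearAlgebra  # assumption: Julia 1.x, BLAS wrappers in LinearAlgebra.BLAS

# Hypothetical helper: C = alpha*A*B + beta*C computed in place by BLAS,
# so no temporary matrix is allocated for the product or the sum.
# alpha and beta must have the same element type as the matrices.
function scaled_product!(C, A, B, alpha, beta)
    BLAS.gemm!('N', 'N', alpha, A, B, beta, C)
    return C
end

# Hypothetical helper: manually devectorized out = A - s*B.
# A single loop writes into a preallocated buffer instead of creating
# one temporary for s*B and another for the difference.
function sub_scaled!(out, A, B, s)
    @assert size(out) == size(A) == size(B)
    @inbounds for i in eachindex(out, A, B)
        out[i] = A[i] - s * B[i]
    end
    return out
end

# Usage: allocate the buffers once, then reuse them.
A = rand(4, 3); B = rand(3, 5); C = ones(4, 5)
expected = 2.0 .* (A * B) .+ 0.5 .* ones(4, 5)
scaled_product!(C, A, B, 2.0, 0.5)

X = rand(4, 5); Y = rand(4, 5); out = similar(X)
sub_scaled!(out, X, Y, 3.0)
```

Both helpers only pay off when the output buffers are created once outside
the hot loop and reused across iterations.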

I see one disadvantage of using these tools, however: the resulting code is
much harder to read. Are there any plans for automatic code optimization at
the compiler level?



On Sun, Jul 20, 2014 at 8:12 PM, Keith Campbell <[email protected]>
wrote:

> Dahua Lin's post at http://julialang.org/blog/2013/09/fast-numeric/
> might be helpful.
>
>
> On Sunday, July 20, 2014 11:41:19 AM UTC-4, Andrei Zh wrote:
>>
>> Recently I found that my application spends ~65% of its time in the
>> garbage collector. I'm looking for ways to reduce the amount of memory
>> allocated for intermediate results.
>> For example, I found that "A * B" may be changed to "A_mul_B!(out, A, B)",
>> which uses a preallocated "out" buffer and thus almost eliminates additional
>> memory allocation. But my application still produces lots of garbage on
>> operations like matrix addition/subtraction, multiplication by a scalar, etc.
>>
>> Are there any other tricks that help decrease memory usage?
>>
>
