Great write-up! After some experiments I was able to reduce GC time from 65% to only 15%, and I see opportunities to do even better. The most important things for me were:
1. Some BLAS functions (especially "gemm!", which is pretty flexible).
2. Manual devectorization (@devec didn't work for my case).

I see one disadvantage of using these tools, however: the resulting code is much harder to read. Are there any plans for automatic code optimization at the compiler level?

On Sun, Jul 20, 2014 at 8:12 PM, Keith Campbell <[email protected]> wrote:

> Dahua Lin's post at http://julialang.org/blog/2013/09/fast-numeric/
> might be helpful.
>
>
> On Sunday, July 20, 2014 11:41:19 AM UTC-4, Andrei Zh wrote:
>>
>> Recently I found that my application spends ~65% of time in garbage
>> collector. I'm looking for ways to reduce amount of memory produced by
>> intermediate results.
>>
>> For example, I found that "A * B" may be changed to "A_mul_B!(out, A, B)"
>> that uses preallocated "out" buffer and thus almost eliminates additional
>> memory allocation. But my application still produces lots of garbage on
>> operations like matrix addition/subtraction, multiplication by scalar, etc.
>>
>> Are there any other tricks that allow to decrease memory usage?
>>
>
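
For anyone finding this thread later, here is a rough sketch of the two techniques listed at the top of this message (in-place BLAS calls and manual devectorization). It is only an illustration under some assumptions: it uses Julia 1.x names, where "A_mul_B!" has been renamed "mul!" and lives in LinearAlgebra; the matrix sizes, the variable names, and the helper "sub_scaled!" are made up for the example and are not from my application.

```julia
using LinearAlgebra  # provides mul! and the BLAS submodule (Julia 1.x)

# Hypothetical sizes and names, chosen only for illustration.
n = 200
A = rand(n, n)
B = rand(n, n)
C = rand(n, n)
out = zeros(n, n)

# 1. In-place multiply: writes A*B into the preallocated `out`.
#    (This was A_mul_B!(out, A, B) in 2014-era Julia.)
mul!(out, A, B)

# 2. gemm! computes C = alpha*op(A)*op(B) + beta*C in place, so a scaled
#    product and an accumulation happen in one call without temporaries.
#    'N' means "do not transpose"; 'T' would transpose that argument.
C0 = copy(C)                              # kept only to check the result
BLAS.gemm!('N', 'N', 2.0, A, B, 1.0, C)   # C = 2*A*B + C

# 3. Manual devectorization: one explicit loop instead of `A - s*B`,
#    which would allocate one temporary for s*B and another for the result.
function sub_scaled!(out, A, B, s)
    @assert size(out) == size(A) == size(B)
    @inbounds for i in eachindex(out, A, B)
        out[i] = A[i] - s * B[i]
    end
    return out
end

sub_scaled!(out, A, B, 0.5)   # overwrites `out` with A - 0.5*B
```

On the readability concern: in Julia 0.6 and later, dotted operations fuse into a single loop, so "out .= A .- 0.5 .* B" should do the same work as the hand-written loop above while writing into the existing buffer. I would still measure with "@allocated" or "@time" before relying on it in a hot path.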
