One theory: it might be triggering garbage collection (gc), which can dominate 
profiling results. Sometimes you get into a "periodic cycle" where gc always 
happens on one particular line. That line might be doing only a tiny amount of 
work and allocating only a tiny amount of memory, but it ends up getting 
"blamed" big-time for the cleanup work from a whole lot of other allocations.
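
For this particular line, the allocation is easy to check. Here's a rough
sketch (the 100x100 size is made up; use your own N) of measuring what the
one-liner hands to the gc on every call:

```julia
# Each evaluation of -G*C allocates two temporaries: one for the
# product G*C and one for its negation. Called O(N) times, that is a
# steady stream of garbage for gc to collect.
G = rand(100, 100)
C = rand(100, 100)
f(G, C) = -G * C

f(G, C)                       # run once so compilation isn't counted
bytes = @allocated f(G, C)
println("allocates $bytes bytes per call")
```

Two 100x100 Float64 temporaries are 160000 bytes before any array headers,
so `bytes` should come out at least that large.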

Gc events are normally marked in red, but this depends a bit on the quality 
of your backtraces, which in turn seems to depend on your particular CPU, 
platform, etc.

If you need to boost the performance of that line, consider writing 
non-allocating versions:

    Gtmp = similar(G)            # preallocate once, outside the O(N) loop
    A_mul_B!(Gtmp, G, C)         # Gtmp = G*C, no allocation
    my_inplace_negate!(G, Gtmp)  # G = -Gtmp elementwise; a helper you write yourself
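
Such an in-place helper is just a few lines of loops. A sketch (the name
`my_inplace_negate!` is illustrative, not a Base function):

```julia
# Illustrative in-place negation helper (user-defined, not in Base):
# writes -src into dest elementwise, allocating nothing.
function my_inplace_negate!(dest::AbstractMatrix, src::AbstractMatrix)
    size(dest) == size(src) || throw(DimensionMismatch("sizes differ"))
    @inbounds for i in 1:length(src)
        dest[i] = -src[i]
    end
    return dest
end
```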

Finally, for small arrays it's often faster to do the multiplication in a 
hand-written loop in Julia. You'll probably have to run your own performance 
test to find out where the tradeoff is on your machine.
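
A sketch of the kind of measurement to run (the sizes and the `naive_mul!`
helper are just for illustration; the crossover depends on your machine and
BLAS build):

```julia
# Compare a hand-written triple loop against BLAS (`*`) across sizes.
function naive_mul!(out, A, B)
    fill!(out, zero(eltype(out)))
    @inbounds for j in 1:size(B, 2), k in 1:size(A, 2), i in 1:size(A, 1)
        out[i, j] += A[i, k] * B[k, j]
    end
    return out
end

for n in (4, 8, 16, 32, 64)
    A, B = rand(n, n), rand(n, n)
    out = similar(A)
    naive_mul!(out, A, B); A * B           # warm up / compile first
    t_loop = @elapsed naive_mul!(out, A, B)
    t_blas = @elapsed A * B
    println("n = $n: loops $(t_loop * 1e6) us, BLAS $(t_blas * 1e6) us")
end
```

Run it a few times (or wrap the timings in a loop and take the minimum) 
before trusting the numbers; single `@elapsed` samples are noisy.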

--Tim

On Wednesday, January 07, 2015 03:35:29 PM Christoph Ortner wrote:
> The following figure is part of the results of profiling a
> performance-critical piece of code:
> 
> <https://lh6.googleusercontent.com/-3dfRngVQZGo/VK3BTDrwPaI/AAAAAAAABuc/2zMSWI12e9w/s1600/profile.png>
>  * The pink section corresponds to a single line:
>                    G = - G * C
> where G, C are moderate size N x N matrices; this line is called O(N) times.
>  * the brown block is `gemm_wrapper!`
>  * the green block is the `-` operation in array.jl
> 
> QUESTIONS:
>  - Does the `-` operation really cost more than the matrix
> multiplication? Why?
>  - What is happening in part of the pink block that has no block above it?
>  - closely related: from which matrix-size onwards are matrix-vector
> multiplications best performed in BLAS as opposed to for-loops?
> 
> Many thanks,
>     Christoph
