accoujulia, see the "performance tips" section of the manual, which describes several tools.
--Tim On Wednesday, October 22, 2014 07:56:11 AM [email protected] wrote: > Please consider this implementation of a naive nearest neighbors distance > search: > > function mindists_sq(pos, dists_min, Acp, t) > for i in 1:size(pos, 2) > dists_min[i] = Inf > for j in 1:size(Acp, 2) > sum!(t, (sub(pos, :, i) - sub(Acp, :, j)).^2) > if t[1] < dists_min[i] > dists_min[i] = t[1] > end > end > end > return dists_min > end > > > And this: > > function test() > const pos = rand(3, 64) > const Acp = rand(3, 1200) > const dists_min = zeros(64) > const tmp = zeros(typeof(Acp[1]), 1) > @time mindists_sq(pos, dists_min, Acp, tmp) > end > test() > > > I get: > > elapsed time: 1.075699957 seconds (130182144 bytes allocated, 10.60% gc time > ) > > > There are two very surprising things here: > > - First I am not sure where the allocation comes from. > (sub(pos, :, i) - sub(Acp, :, j)).^2 > most certainly generate a new array in the process. t[1] and > dists_min[i] might too (but these are immutable bit types). I tried to > unroll the loop, but I got even more allocation and it's twice slower. > - Second, the execution time is insane. I tried to put @inbounds > wherever possible but without success. More surprising, the following is > 10 times faster and allocates 90MB less: > function mindists_sq(pos, dists_min, Acp) > for i in 1:size(pos, 2) > dists_min[i] = Inf > for j in 1:size(Acp, 2) > t = sum((pos[:, i]-Acp[:, j]).^2) > if t .< dists_min[i] > dists_min[i] = t > end > end > end > return dists_min > end > > elapsed time: 0.11141394 seconds (37555212 bytes allocated, 48.96% gc time) > > > Is there anything that can help me map new allocations with line numbers? > Is there any way to get the allocation number to zero? Can you reproduce > the described behavior? > > Julia Version 0.3.1 > Commit c03f413 (2014-09-21 21:30 UTC) > Platform Info: > System: Linux (x86_64-unknown-linux-gnu) > CPU: Intel(R) Core(TM)2 Duo CPU P8700 @ 2.53GHz > WORD_SIZE: 64 > BLAS: libopenblas (NO_AFFINITY PENRYN) > LAPACK: liblapack > LIBM: libm > LLVM: libLLVM-3.3 > > Best regards
