This works for me:
julia> function mindists_sq(pos, dists_min, Acp)
           for i in 1:size(pos, 2)
               dists_min[i] = Inf
               for j in 1:size(Acp, 2)
                   t = 0.0
                   for k in 1:size(pos, 1)
                       t += (pos[k, i] - Acp[k, j])^2
                   end
                   if t < dists_min[i]
                       dists_min[i] = t
                   end
               end
           end
           return dists_min
       end
mindists_sq (generic function with 1 method)
julia> function test()
           const pos = rand(3, 64)
           const Acp = rand(3, 1200)
           const dists_min = zeros(64)
           const tmp = zeros(typeof(Acp[1]), 1)
           @time mindists_sq(pos, dists_min, Acp)
       end
test (generic function with 1 method)
julia> test();
elapsed time: 0.001279041 seconds (0 bytes allocated)
Is this how you unrolled the innermost loop too?
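
For comparison, here is roughly what I would mean by unrolling it by hand.
This is only a sketch: the name mindists_sq_unrolled is made up, it assumes
pos and Acp always have exactly 3 rows, and I have not timed it.

```julia
# Hypothetical hand-unrolled variant of mindists_sq.
# Assumption: pos and Acp both have exactly 3 rows (3-D points stored as columns).
function mindists_sq_unrolled(pos, dists_min, Acp)
    for i in 1:size(pos, 2)
        # Load the i-th point once instead of indexing pos in the inner loop.
        x = pos[1, i]
        y = pos[2, i]
        z = pos[3, i]
        best = Inf
        for j in 1:size(Acp, 2)
            # The k loop over the 3 coordinates, written out explicitly.
            t = (x - Acp[1, j])^2 + (y - Acp[2, j])^2 + (z - Acp[3, j])^2
            if t < best
                best = t
            end
        end
        dists_min[i] = best
    end
    return dists_min
end
```

It takes the same arguments as mindists_sq above, so it can be dropped into
test() in its place.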
On Wed, 2014-10-22 at 17:38, [email protected] wrote:
> As a complement, I just tried with Julia master and in the first case, I got an
> exec time around 0.7s. The second case is virtually the same.
>
> Julia Version 0.4.0-dev+1177
> Commit 16c3222* (2014-10-22 12:49 UTC)
> Platform Info:
> System: Linux (x86_64-unknown-linux-gnu)
> CPU: Intel(R) Core(TM)2 Duo CPU P8700 @ 2.53GHz
> WORD_SIZE: 64
> BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Penryn)
> LAPACK: libopenblas
> LIBM: libopenlibm
> LLVM: libLLVM-3.3
>
> And as a reference, the corresponding Python code is this:
>
> def mindists_sq(B, Acp):
>     return ((Acp - B[:, None])**2).sum(2).min(1)
>
> It executes in 4-8 ms.
>
>
> To Julia's credit, I can get down to 4-8 ms with it too, but I need to emulate
> the Python behavior with broadcasts and large preallocated temporaries:
>
> function mindists_sq2(pos, dists_min, Acp, tmp, ntmp)
>     for j in 1:size(pos, 2)
>         broadcast!(.-, tmp, Acp, sub(pos, :, j))
>         for i in 1:length(tmp[:])
>             tmp[i] *= tmp[i]
>         end
>         ntmp[:] = sum(tmp, 1)
>         dists_min[j] = minimum(ntmp)
>     end
>     return dists_min
> end
>
>
> function test(pos, Acp)
>     const dists_min = zeros(typeof(Acp[1]), length(pos))
>     const tmp = similar(Acp)
>     const ntmp = similar(Acp[1,:,:])
>     @time mindists_sq2(pos, dists_min, Acp, tmp, ntmp)
> end
> test(A, B)
>
> elapsed time: 0.004768065 seconds (3624960 bytes allocated)
>
> but it's a bit more verbose and convoluted.