As a complement, I just tried with Julia master: in the first case, I
got an execution time of around 0.7 s. The second case is virtually the same.
Julia Version 0.4.0-dev+1177
Commit 16c3222* (2014-10-22 12:49 UTC)
Platform Info:
  System: Linux (x86_64-unknown-linux-gnu)
  CPU: Intel(R) Core(TM)2 Duo CPU P8700 @ 2.53GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Penryn)
  LAPACK: libopenblas
  LIBM: libopenlibm
  LLVM: libLLVM-3.3
For reference, the corresponding Python code is:
def mindists_sq(B, Acp):
    return ((Acp - B[:, None])**2).sum(2).min(1)
It executes in 4-8 ms.
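To make explicit what that one-liner computes, here is a minimal loop-based sketch in plain Python (no NumPy). It assumes B holds n points and Acp holds m points, each with d coordinates; the post does not state the actual array shapes, so that layout is an assumption, and the name mindists_sq_loops is only illustrative.

```python
def mindists_sq_loops(B, Acp):
    """Loop-based equivalent of ((Acp - B[:, None])**2).sum(2).min(1).

    Assumption: B is a sequence of n points and Acp a sequence of m points,
    each point being a sequence of d coordinates.
    """
    result = []
    for b in B:
        # Squared Euclidean distance from b to every candidate point in Acp.
        sq_dists = [
            sum((a_k - b_k) ** 2 for a_k, b_k in zip(a, b))
            for a in Acp
        ]
        # Keep only the smallest squared distance for this point.
        result.append(min(sq_dists))
    return result


B = [[0.0, 0.0], [3.0, 4.0]]
Acp = [[1.0, 0.0], [3.0, 3.0], [10.0, 10.0]]
print(mindists_sq_loops(B, Acp))  # [1.0, 1.0]
```

The NumPy version does the same work in one shot by materializing an (n, m, d) temporary, which is the behavior the preallocated tmp buffer imitates in Julia.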
To Julia's credit, I can get down to 4-8 ms with it as well, but I need to
emulate Python's behavior with broadcasts and large preallocated temporaries:
function mindists_sq2(pos, dists_min, Acp, tmp, ntmp)
    for j in 1:size(pos, 2)
        broadcast!(.-, tmp, Acp, sub(pos, :, j))
        for i in 1:length(tmp[:])
            tmp[i] *= tmp[i]
        end
        ntmp[:] = sum(tmp, 1)
        dists_min[j] = minimum(ntmp)
    end
    return dists_min
end
function test(pos, Acp)
    const dists_min = zeros(typeof(Acp[1]), length(pos))
    const tmp = similar(Acp)
    const ntmp = similar(Acp[1,:,:])
    @time mindists_sq2(pos, dists_min, Acp, tmp, ntmp)
end
test(A, B)
elapsed time: 0.004768065 seconds (3624960 bytes allocated)
but it's a bit more verbose and convoluted.