As a follow-up, I just tried with Julia master: in the first case, I 
got an execution time of around 0.7 s. The second case is virtually the same.

Julia Version 0.4.0-dev+1177
Commit 16c3222* (2014-10-22 12:49 UTC)
Platform Info:
  System: Linux (x86_64-unknown-linux-gnu)
  CPU: Intel(R) Core(TM)2 Duo CPU     P8700  @ 2.53GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Penryn)
  LAPACK: libopenblas
  LIBM: libopenlibm
  LLVM: libLLVM-3.3

And as a reference, here is the corresponding Python code:

def mindists_sq(B, Acp):
    return ((Acp - B[:, None])**2).sum(2).min(1)

It executes in 4-8 ms.
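
For readers less used to NumPy broadcasting, here is a rough sketch of what that one-liner computes, written with explicit loops in plain Python. The shapes are my assumption (B holding n points and Acp holding m points, each with d coordinates), and the name mindists_sq_loops is only for illustration; it is not meant to be fast.

```python
def mindists_sq_loops(B, Acp):
    """Explicit-loop spelling of ((Acp - B[:, None])**2).sum(2).min(1).

    Assumed shapes: B is (n, d) and Acp is (m, d), given here as
    nested lists. Returns n values: for each point of B, the smallest
    squared Euclidean distance to any point of Acp.
    """
    result = []
    for b in B:
        # Squared distance from b to every point of Acp; keep the smallest.
        best = min(
            sum((a_k - b_k) ** 2 for a_k, b_k in zip(a, b))
            for a in Acp
        )
        result.append(best)
    return result


# Tiny made-up data set, only to show the call.
B = [[0.0, 0.0], [3.0, 5.0]]
Acp = [[1.0, 0.0], [3.0, 3.0], [10.0, 10.0]]
dists = mindists_sq_loops(B, Acp)
print(dists)  # [1.0, 4.0]
```

The NumPy version builds the whole (n, m, d) array of differences at once, which is where its large temporary comes from.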


To Julia's credit, I can get down to 4-8 ms with it too, but I need to emulate 
Python's behavior with broadcasts and large preallocated temporaries:

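function mindists_sq2(pos, dists_min, Acp, tmp, ntmp)
    for j in 1:size(pos, 2)
        # tmp = Acp .- pos[:, j], written into the preallocated buffer
        broadcast!(.-, tmp, Acp, sub(pos, :, j))
        # square every element in place
        for i in 1:length(tmp[:])
            tmp[i] *= tmp[i]
        end
        # sum over the first dimension, then keep the smallest value
        ntmp[:] = sum(tmp, 1)
        dists_min[j] = minimum(ntmp)
    end
    return dists_min
end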


function test(pos, Acp)
    # one entry per column of pos, matching the loop in mindists_sq2
    const dists_min = zeros(typeof(Acp[1]), size(pos, 2))
    const tmp = similar(Acp)
    const ntmp = similar(Acp[1,:,:])
    @time mindists_sq2(pos, dists_min, Acp, tmp, ntmp)
end
test(A, B)

elapsed time: 0.004768065 seconds (3624960 bytes allocated)

but it's a bit more verbose and convoluted.
