The code in Distances.jl is heavily optimized and uses BLAS calls where
possible (which is the case for the Euclidean metric). Your code allocates
a lot, for example in x = x' and in norm(x[:,i] - x[:,j]), where each slice
and the difference create a temporary array on every iteration.
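
To illustrate the allocation point, here is a minimal sketch of a loop-based version that creates no temporaries inside the loop. The name dist_noalloc is hypothetical, the code assumes a recent Julia 1.x and a floating-point matrix with one point per row, and it is not how Distances.jl does it (that package goes through BLAS):

```julia
# Sketch only: pairwise Euclidean distances between the rows of x,
# stored as a vector of the strict upper-triangle entries in the same
# (i, j) order as the original dist function.
function dist_noalloc(x::AbstractMatrix{T}) where {T<:AbstractFloat}
    n, p = size(x)                      # n points (rows), p coordinates (columns)
    d = zeros(T, div(n * (n - 1), 2))   # the only allocation: the result vector
    ij = 0
    @inbounds for i in 1:n
        for j in (i+1):n
            s = zero(T)
            for k in 1:p                # accumulate squared differences scalar by scalar
                s += abs2(x[i, k] - x[j, k])
            end
            ij += 1
            d[ij] = sqrt(s)
        end
    end
    return d
end

x = rand(100, 2)
d = dist_noalloc(x)                     # length(d) == 100 * 99 / 2
```

Because the inner loop works on scalars, no slices or difference vectors are created. Julia arrays are column-major, so with many coordinates per point it would likely be more cache-friendly to store one point per column instead.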

On Wednesday, September 7, 2016 at 1:43:11 PM UTC+2, Weicheng Zhu wrote:
>
> Hi there,
> I wrote a function to calculate the distance between each pair of rows of
> a two-dimensional array and compared it with the `pairwise` function in
> the Distances module.
> Can anyone help me find out why my function is slower than the pairwise
> function? I only keep the triangular elements of the distance matrix,
> which I thought should be faster. Thanks in advance for any help :)
>
> Here is the code:
>
> module Tmp
>
> import DataFrames: DataFrame
>
> # Pairwise Euclidean distances between the rows of x,
> # keeping only the strict upper-triangle entries.
> function dist(x::Matrix)
>     x = x'
>     n = size(x, 2)
>     ij::UInt = 0
>     d = zeros(div((n-1)*n, 2))
>     for i in 1:n
>         for j in (i+1):n
>             ij += 1
>             d[ij] = norm(x[:,i] - x[:,j])
>         end
>     end
>     return d
> end
>
> # Convert the DataFrame to an array and reuse the Matrix method.
> function dist(x::DataFrame)
>     dist(convert(Array, x))
> end
>
> export dist
>
> end
>
> using Tmp
> using Distances
>
> x = rand(100,2)
>
> @time dist(x)
> # 0.001581 seconds (29.71 k allocations: 1.399 MB)
>
> @time pairwise(Euclidean(), x')
> # 0.000318 seconds (310 allocations: 91.984 KB)