It seems weird to me that you guys want to call Jaccard distance with float
arrays. AFAIK the Jaccard distance measures the dissimilarity between two
sets, so in practice between two Vector{Bool} (set-membership indicators);
see:
http://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.jaccard.html

"Computes the Jaccard-Needham dissimilarity between two boolean 1-D arrays."
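For reference, here is a minimal Julia sketch of that boolean definition. The SciPy page gives it as (c_TF + c_FT) / (c_TT + c_TF + c_FT), where c_TT counts positions where both vectors are true and c_TF / c_FT count positions where exactly one is. The name `jaccard_bool` is hypothetical (it is not a Distances.jl function), and returning 0.0 for two all-false vectors is an assumed convention.

```julia
# Hypothetical helper (not a Distances.jl API): Jaccard-Needham dissimilarity
# between two boolean vectors, (c_TF + c_FT) / (c_TT + c_TF + c_FT).
function jaccard_bool(u::AbstractVector{Bool}, v::AbstractVector{Bool})
    length(u) == length(v) || throw(DimensionMismatch("vectors must have the same length"))
    both = 0    # c_TT: positions where both entries are true
    either = 0  # c_TT + c_TF + c_FT: positions where at least one entry is true
    for i in eachindex(u, v)
        both += u[i] & v[i]
        either += u[i] | v[i]
    end
    # Assumed convention: two all-false vectors are at distance 0
    return either == 0 ? 0.0 : (either - both) / either
end
```

For example, `jaccard_bool([true, true, false, false], [true, false, true, false])` has one shared true position out of three positions that are true in either vector, giving 2/3.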

Is there some more general formulation of it that extends to vectors in a 
continuous vector space?

Also note that Jaccard in Distances.jl is type stable for Vector{Bool}
inputs.

On Monday, June 13, 2016 at 3:53:14 AM UTC+2, jean-pierre both wrote:
>
> In my application, I found Distances.Jaccard to be very slow compared with
> Distances.Euclidean.
>
> For example, with two Float64 vectors v1 and v2 of length 11520,
> I get the following:
> julia> D=Euclidean()
> Distances.Euclidean()
> julia> @time for i in 1:500
>        evaluate(D,v1,v2)
>        end
>   0.002553 seconds (500 allocations: 7.813 KB)
>
> and with Jaccard:
>
> julia> D=Jaccard()
> Distances.Jaccard()
> julia> @time for i in 1:500
>        evaluate(D,v1,v2)
>        end
>   1.995046 seconds (40.32 M allocations: 703.156 MB, 9.68% gc time)
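A side note on measurement: both loops above run at global scope, and the Julia performance tips warn that code touching untyped global variables carries overhead of its own. A minimal sketch of a hypothetical helper, `repeat_eval` (not part of Distances.jl), that moves the loop into a function so mostly the distance computation itself is timed; it assumes `f` returns a Float64.

```julia
# Hypothetical helper (not part of Distances.jl): calls f(v1, v2) n times inside
# a function, so the loop does not access untyped global variables.
# Assumes f returns a Float64; accumulating the results gives the loop a value
# to return, which makes it harder for the compiler to drop the calls.
function repeat_eval(f, v1, v2, n::Integer)
    acc = 0.0
    for _ in 1:n
        acc += f(v1, v2)
    end
    return acc
end
```

Usage, with `v1` and `v2` as above: `@time repeat_eval((x, y) -> evaluate(D, x, y), v1, v2, 500)`. Run it twice; the first call also includes compilation time.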
>
> With a simple loop computing Jaccard:
>
>
> function myjaccard2(a::Array{Float64,1}, b::Array{Float64,1})
>            num = 0
>            den = 0
>            for i in 1:length(a)
>                    num = num + min(a[i],b[i])
>                    den = den + max(a[i],b[i])      
>            end
>                1. - num/den
>        end
> myjaccard2 (generic function with 1 method)
>
> julia> @time for i in 1:500
>               myjaccard2(v1,v2)
>               end
>   0.451582 seconds (23.04 M allocations: 351.592 MB, 20.04% gc time)
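The allocation count for `myjaccard2` is consistent with a type instability: `num` and `den` start as the integer `0` and become Float64 after the first addition, so they cannot be given a single concrete type. A minimal sketch of a type-stable variant, written for Julia 1.x and assuming nonnegative floating-point entries with at least one nonzero entry; `myjaccard3` is a hypothetical name. Like the quoted loop, it computes 1 - Σ min(a_i, b_i) / Σ max(a_i, b_i) (sometimes called the weighted Jaccard or Ruzicka distance), which reduces to the boolean Jaccard distance when every entry is 0 or 1.

```julia
# Hypothetical type-stable variant of the quoted myjaccard2.
# The accumulators start at zero(T) (0.0 for Float64) instead of the integer
# literal 0, so num and den keep one concrete type for the whole loop.
# Assumes nonnegative entries and at least one nonzero entry (den > 0).
function myjaccard3(a::AbstractVector{T}, b::AbstractVector{T}) where {T<:AbstractFloat}
    num = zero(T)
    den = zero(T)
    # eachindex(a, b) throws a DimensionMismatch if the vectors' indices differ
    for i in eachindex(a, b)
        num += min(a[i], b[i])
        den += max(a[i], b[i])
    end
    return one(T) - num / den
end
```

For example, `myjaccard3([0.5, 2.0], [1.0, 1.0])` gives 1 - 1.5/3.0 = 0.5.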
>
> I do not see what the problem is in the Jaccard distance implementation in
> the Distances package.
>
