It seems weird to me that you guys want to call the Jaccard distance with float arrays. AFAIK Jaccard distance measures the distance between two sets, so for vectors it is basically defined between two Vector{Bool}; see: http://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.jaccard.html
"Computes the Jaccard-Needham dissimilarity between two boolean 1-D arrays."

Is there some more general formulation of it that extends to vectors in a continuous vector space? Also note that Jaccard in Distances.jl is type stable for Vector{Bool} inputs.

On Monday, June 13, 2016 at 3:53:14 AM UTC+2, jean-pierre both wrote:
>
> In my application, Distances.Jaccard turned out to be very slow compared
> with Distances.Euclidean.
>
> For example, with two Float64 vectors of size 11520, I get the following:
>
>     julia> D=Euclidean()
>     Distances.Euclidean()
>     julia> @time for i in 1:500
>                evaluate(D,v1,v2)
>            end
>       0.002553 seconds (500 allocations: 7.813 KB)
>
> and with Jaccard:
>
>     julia> D=Jaccard()
>     Distances.Jaccard()
>     @time for i in 1:500
>         evaluate(D,v1,v2)
>     end
>       1.995046 seconds (40.32 M allocations: 703.156 MB, 9.68% gc time)
>
> With a simple loop for computing Jaccard:
>
>     function myjaccard2(a::Array{Float64,1}, b::Array{Float64,1})
>         num = 0
>         den = 0
>         for i in 1:length(a)
>             num = num + min(a[i],b[i])
>             den = den + max(a[i],b[i])
>         end
>         1. - num/den
>     end
>     myjaccard2 (generic function with 1 method)
>
>     julia> @time for i in 1:500
>                myjaccard2(v1,v2)
>            end
>       0.451582 seconds (23.04 M allocations: 351.592 MB, 20.04% gc time)
>
> I do not see what the problem is in the Jaccard distance implementation
> in the Distances package.
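
The following is a minimal sketch in plain Julia (Base only, no Distances.jl) that makes the boolean definition from the scipy page concrete. It counts the positions where exactly one of the two vectors is true and divides that by the number of positions where at least one is true. For the sets A and B that the vectors encode, this equals 1 - |A ∩ B| / |A ∪ B|. The name `booljaccard` is hypothetical. Returning 0.0 when both vectors are all false is a convention chosen here, not taken from either library.

```julia
# Boolean Jaccard(-Needham) dissimilarity, following the scipy definition:
#   (# positions where u and v differ) / (# positions where u or v is true)
# `booljaccard` is a hypothetical name used for illustration only.
function booljaccard(u::AbstractVector{Bool}, v::AbstractVector{Bool})
    nunion = 0  # count of i with u[i] | v[i]
    ndiff = 0   # count of i with u[i] != v[i]
    # eachindex(u, v) throws a DimensionMismatch if the lengths differ
    for i in eachindex(u, v)
        nunion += u[i] | v[i]
        ndiff += u[i] != v[i]
    end
    # Convention chosen here: two all-false vectors are at distance 0.0
    return nunion == 0 ? 0.0 : ndiff / nunion
end

u = [true, false, true]
v = [true, true, false]
d = booljaccard(u, v)  # 2 differing positions out of 3 in the union

# Same value computed from explicit sets of "true" positions
A, B = Set(findall(u)), Set(findall(v))
d_sets = 1 - length(intersect(A, B)) / length(union(A, B))
println(d, " ", d_sets)
```

The two results agree up to floating-point rounding. The counters are plain Int throughout, so the function stays type stable.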
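
The quoted `myjaccard2` loop has a type problem. `num = 0` and `den = 0` start out as `Int` and become `Float64` after the first addition, so neither variable has a single concrete type. Julia's performance tips describe this as "changing the type of a variable". It is a likely cause of the 23.04 M allocations in the quoted timing.

The sketch below computes the same formula, 1 - sum(min(a[i], b[i])) / sum(max(a[i], b[i])), but starts both accumulators from a zero of a fixed floating-point type. The name `myjaccard_stable` is hypothetical, and no timing is claimed for it.

```julia
# The quoted myjaccard2 loop, with accumulators that keep one concrete type.
# `myjaccard_stable` is a hypothetical name used for illustration only.
function myjaccard_stable(a::AbstractVector{T}, b::AbstractVector{T}) where {T<:Real}
    # float(T) is Float64 for Int or Bool inputs and T itself for Float64/Float32,
    # so adding the min/max values below never changes the accumulators' type
    num = zero(float(T))  # running sum of min(a[i], b[i])
    den = zero(float(T))  # running sum of max(a[i], b[i])
    for i in eachindex(a, b)
        num += min(a[i], b[i])
        den += max(a[i], b[i])
    end
    # As in the original loop, two all-zero vectors give 0/0, i.e. NaN
    return 1 - num / den
end

v1 = rand(11520)
v2 = rand(11520)
d = myjaccard_stable(v1, v2)
```

On 0/1 inputs this formula gives the same value as the boolean definition. For example, the pair [1.0, 0.0, 1.0] and [1.0, 1.0, 0.0] gives 1 - 1/3 = 2/3.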