Please try https://github.com/JuliaStats/Distances.jl/pull/44
On Monday, June 13, 2016 at 8:14:01 PM UTC+2, Mauro wrote:
>
> > function myjaccard2(a::Array{Float64,1}, b::Array{Float64,1})
> >     num = 0.
> >     den = 0.
> >     for I in 1:length(a)
> >         @inbounds ai = a[I]
> >         @inbounds bi = b[I]
> >         num = num + min(ai,bi)
> >         den = den + max(ai,bi)
> >     end
> >     1. - num/den
> > end
> >
> > function testDistances2(v1::Array{Float64,1}, v2::Array{Float64,1})
> >     for i in 1:50000
> >         myjaccard2(v1,v2)
> >     end
> > end
>
> I recommend using the returned values for something; otherwise the
> compiler sometimes eliminates the loop (but not here):
>
> julia> function testDistances2(v1::Array{Float64,1}, v2::Array{Float64,1})
>            out = 0.0
>            for i in 1:50000
>                out += myjaccard2(v1,v2)
>            end
>            out
>        end
>
> > @time testDistances2(v1,v2)
> > machine 3.217329 seconds (200.01 M allocations: 2.981 GB, 19.91% gc time)
>
> I cannot reproduce this; when I run it I get no allocations:
>
> julia> v2 = rand(10^4);
>
> # warm-up
> julia> @time testDistances2(v1,v2)
>   3.604478 seconds (8.15 k allocations: 401.797 KB, 0.42% gc time)
> 24999.00112162811
>
> julia> @time testDistances2(v1,v2)
>   3.647563 seconds (5 allocations: 176 bytes)
> 24999.00112162811
>
> What version of Julia are you running? I am on 0.4.5.
>
> > function myjaccard5(a::Array{Float64,1}, b::Array{Float64,1})
> >     num = 0.
> >     den = 0.
> >     for I in 1:length(a)
> >         @inbounds ai = a[I]
> >         @inbounds bi = b[I]
> >         abs_m = abs(ai-bi)
> >         abs_p = abs(ai+bi)
> >         num += abs_p - abs_m
> >         den += abs_p + abs_m
> >     end
> >     1. - num/den
> > end
> >
> > function testDistances5(a::Array{Float64,1}, b::Array{Float64,1})
> >     for i in 1:5000
> >         myjaccard5(a,b)
> >     end
> > end
> >
> > end
> >
> > julia> @time testDistances5(v1,v2)
> >   0.166979 seconds (4 allocations: 160 bytes)
> >
> > We see that using abs is faster.
> >
> > I am not opening a pull request because I would expect a good
> > implementation to be 2 or 3 times slower than Euclidean, and I have not
> > achieved that yet.
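For reference, a short derivation of why the abs-based loop in myjaccard5 computes the same quantity as the min/max loop in myjaccard2. It relies on an assumption the thread does not state: every a_i + b_i must be non-negative (true for the rand inputs used here), because the code takes abs(ai+bi) rather than ai+bi. The common factor of 2 cancels in the ratio.

```latex
\begin{aligned}
a + b - |a - b| &= 2\min(a, b), \\
a + b + |a - b| &= 2\max(a, b), \\
1 - \frac{\sum_i \bigl(|a_i + b_i| - |a_i - b_i|\bigr)}{\sum_i \bigl(|a_i + b_i| + |a_i - b_i|\bigr)}
  &= 1 - \frac{\sum_i \min(a_i, b_i)}{\sum_i \max(a_i, b_i)}
  \quad \text{when } a_i + b_i \ge 0 \text{ for all } i.
\end{aligned}
```

For inputs with negative entries the two functions can differ, so the abs variant is not a drop-in replacement in general.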
> >
> > On Monday, June 13, 2016 at 1:43:00 PM UTC+2, Kristoffer Carlsson wrote:
> >>
> >> It seems weird to me that you guys want to call Jaccard distance with
> >> float arrays. AFAIK Jaccard distance measures the distance between two
> >> distinct samples from a pair of sets, so basically between two Vector{Bool},
> >> see:
> >> http://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.jaccard.html
> >>
> >> "Computes the Jaccard-Needham dissimilarity between two boolean 1-D
> >> arrays."
> >>
> >> Is there some more general formulation of it that extends to vectors in a
> >> continuous vector space?
> >>
> >> And, to note, Jaccard is type stable for inputs of Vector{Bool} in
> >> Distances.jl.
> >>
> >> On Monday, June 13, 2016 at 3:53:14 AM UTC+2, jean-pierre both wrote:
> >>>
> >>> In my application I found Distances.Jaccard to be very slow compared
> >>> with Distances.Euclidean.
> >>>
> >>> For example, with 2 Float64 vectors of size 11520
> >>>
> >>> I get the following
> >>> julia> D=Euclidean()
> >>> Distances.Euclidean()
> >>> julia> @time for i in 1:500
> >>>            evaluate(D,v1,v2)
> >>>        end
> >>>   0.002553 seconds (500 allocations: 7.813 KB)
> >>>
> >>> and with Jaccard
> >>>
> >>> julia> D=Jaccard()
> >>> Distances.Jaccard()
> >>> @time for i in 1:500
> >>>     evaluate(D,v1,v2)
> >>> end
> >>>   1.995046 seconds (40.32 M allocations: 703.156 MB, 9.68% gc time)
> >>>
> >>> With a simple loop for computing jaccard:
> >>>
> >>> function myjaccard2(a::Array{Float64,1}, b::Array{Float64,1})
> >>>     num = 0
> >>>     den = 0
> >>>     for i in 1:length(a)
> >>>         num = num + min(a[i],b[i])
> >>>         den = den + max(a[i],b[i])
> >>>     end
> >>>     1. - num/den
> >>> end
> >>> myjaccard2 (generic function with 1 method)
> >>>
> >>> julia> @time for i in 1:500
> >>>            myjaccard2(v1,v2)
> >>>        end
> >>>   0.451582 seconds (23.04 M allocations: 351.592 MB, 20.04% gc time)
> >>>
> >>> I do not see where the problem is in the Jaccard distance
> >>> implementation in the Distances package.
> >>>
> >>
>
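A minimal sketch of the type-stable loop discussed in this thread, for anyone comparing timings. It assumes the allocations reported for the first myjaccard2 come from num and den starting as the integer 0 and turning into Float64 inside the loop; starting them at 0.0 keeps the accumulators at one concrete type. The names weighted_jaccard and sum_distances are hypothetical helpers for illustration and are not part of Distances.jl.

```julia
# Hypothetical helper, not the Distances.jl implementation.
# Computes 1 - sum(min(a_i, b_i)) / sum(max(a_i, b_i)) for two Float64 vectors.
function weighted_jaccard(a::Vector{Float64}, b::Vector{Float64})
    # The explicit length check is what makes @inbounds below safe.
    length(a) == length(b) || throw(DimensionMismatch("a and b must have the same length"))
    num = 0.0   # Float64 from the start, so the accumulator never changes type
    den = 0.0
    @inbounds for i in 1:length(a)
        ai = a[i]
        bi = b[i]
        num += min(ai, bi)
        den += max(ai, bi)
    end
    return 1.0 - num / den
end

# Hypothetical benchmark driver: accumulate and return the results so the
# loop does useful work and cannot be discarded by the compiler.
function sum_distances(a::Vector{Float64}, b::Vector{Float64}, n::Int)
    out = 0.0
    for _ in 1:n
        out += weighted_jaccard(a, b)
    end
    return out
end
```

Usage would look like `v1 = rand(11520); v2 = rand(11520); @time sum_distances(v1, v2, 500)`, run twice so that the first call absorbs compilation time. No timings are claimed here; they depend on the Julia version and machine.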