> function myjaccard2(a::Array{Float64,1}, b::Array{Float64,1})
>     num = 0.
>     den = 0.
>     for I in 1:length(a)
>         @inbounds ai = a[I]
>         @inbounds bi = b[I]
>         num = num + min(ai,bi)
>         den = den + max(ai,bi)
>     end
>     1. - num/den
> end
>
>
>
> function testDistances2(v1::Array{Float64,1}, v2::Array{Float64,1})
>     for i in 1:50000
>         myjaccard2(v1,v2)
>     end
> end

I recommend using the returned values for something; otherwise the compiler
sometimes eliminates the loop entirely (though it did not do so here):

julia> function testDistances2(v1::Array{Float64,1}, v2::Array{Float64,1})
           out = 0.0
           for i in 1:50000
               out += myjaccard2(v1,v2)
           end
           out
       end

> @time testDistances2(v1,v2)
>   3.217329 seconds (200.01 M allocations: 2.981 GB, 19.91% gc time)

I cannot reproduce this; when I run it, I get no allocations:

julia> v2 = rand(10^4);

# warm-up
julia> @time testDistances2(v1,v2)
  3.604478 seconds (8.15 k allocations: 401.797 KB, 0.42% gc time)
24999.00112162811

julia> @time testDistances2(v1,v2)
  3.647563 seconds (5 allocations: 176 bytes)
24999.00112162811

What version of Julia are you running? I am on 0.4.5.

> function myjaccard5(a::Array{Float64,1}, b::Array{Float64,1})
>     num = 0.
>     den = 0.
>     for I in 1:length(a)
>         @inbounds ai = a[I]
>         @inbounds bi = b[I]
>         abs_m = abs(ai-bi)
>         abs_p = abs(ai+bi)
>         num += abs_p - abs_m
>         den += abs_p + abs_m
>     end
>     1. - num/den
> end
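
For anyone following along: the abs form gives the same result because
min(x, y) = ((x + y) - |x - y|) / 2 and max(x, y) = ((x + y) + |x - y|) / 2,
and for non-negative entries |x + y| = x + y; the common factor of 2 cancels
in num/den, so myjaccard5 returns the same ratio as myjaccard2 (up to
rounding). A quick sanity check, with throwaway names just for illustration:

    x = rand(1000); y = rand(1000)   # rand() gives non-negative entries
    isapprox(myjaccard2(x, y), myjaccard5(x, y))   # expected: true
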
>
>
> function testDistances5(a::Array{Float64,1}, b::Array{Float64,1})
>     for i in 1:5000
>         myjaccard5(a,b)
>     end
> end
>
>
>
> julia> @time testDistances5(v1,v2)
>   0.166979 seconds (4 allocations: 160 bytes)
>
>
>
> We see that using abs is faster.
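
A caveat on that comparison, if I am reading the two test harnesses right:
testDistances5 runs 5000 iterations while testDistances2 runs 50000, so the
raw timings are not directly comparable. Normalizing per call with the
numbers quoted above:

    3.217329 / 50000   # ≈ 6.4e-5 seconds per call, min/max version
    0.166979 / 5000    # ≈ 3.3e-5 seconds per call, abs version

So the abs version still wins, but by roughly 2x rather than the ~20x the raw
numbers suggest; and the first figure includes the 200 M allocations that I
could not reproduce, so the real gap may be smaller still.
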
>
> I am not making a pull request because I would expect a good implementation
> to be only 2 or 3 times slower than Euclidean, and I have not achieved that
> yet.
>
> On Monday, June 13, 2016 at 1:43:00 PM UTC+2, Kristoffer Carlsson wrote:
>>
>> It seems weird to me that you guys want to call the Jaccard distance with
>> float arrays. AFAIK the Jaccard distance measures the dissimilarity between
>> two distinct samples from a pair of sets, so basically between two
>> Vector{Bool}; see:
>> http://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.jaccard.html
>>
>> "Computes the Jaccard-Needham dissimilarity between two boolean 1-D
>> arrays."
>>
>> Is there some more general formulation of it that extends to vectors in a
>> continuous vector space?
>>
>> And, to note, Jaccard is type stable for inputs of Vector{Bool} in
>> Distances.jl.
>>
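
Regarding a more general formulation: the loops in this thread compute the
complement of what is sometimes called the weighted Jaccard (or Ružička)
similarity, i.e. 1 - Σᵢ min(a[i], b[i]) / Σᵢ max(a[i], b[i]). On 0/1 vectors,
elementwise min is AND and max is OR, so it reduces to the usual set-based
Jaccard distance. A small illustrative check, with made-up vectors:

    x = [true, true, false, false, true]
    y = [true, false, true, false, true]
    num = 0; den = 0
    for i in 1:length(x)
        num += min(x[i], y[i])   # counts positions where both are true
        den += max(x[i], y[i])   # counts positions where at least one is true
    end
    1 - num/den   # 1 - 2/4 = 0.5, same as the set-based Jaccard distance
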
>> On Monday, June 13, 2016 at 3:53:14 AM UTC+2, jean-pierre both wrote:
>>>
>>>
>>>
>>> In my application I compared Distances.Jaccard with Distances.Euclidean,
>>> and Jaccard was very slow.
>>>
>>> For example, with two Float64 vectors of size 11520,
>>> I get the following:
>>> julia> D=Euclidean()
>>> Distances.Euclidean()
>>> julia> @time for i in 1:500
>>>        evaluate(D,v1,v2)
>>>        end
>>>   0.002553 seconds (500 allocations: 7.813 KB)
>>>
>>> and with Jaccard
>>>
>>> julia> D=Jaccard()
>>> Distances.Jaccard()
>>> @time for i in 1:500
>>>               evaluate(D,v1,v2)
>>>               end
>>>   1.995046 seconds (40.32 M allocations: 703.156 MB, 9.68% gc time)
>>>
>>> With a simple loop for computing Jaccard:
>>>
>>>
>>> function myjaccard2(a::Array{Float64,1}, b::Array{Float64,1})
>>>            num = 0
>>>            den = 0
>>>            for i in 1:length(a)
>>>                    num = num + min(a[i],b[i])
>>>                    den = den + max(a[i],b[i])
>>>            end
>>>                1. - num/den
>>>        end
>>> myjaccard2 (generic function with 1 method)
>>>
>>> julia> @time for i in 1:500
>>>               myjaccard2(v1,v2)
>>>               end
>>>   0.451582 seconds (23.04 M allocations: 351.592 MB, 20.04% gc time)
>>>
>>> I do not see the problem in the Jaccard distance implementation in the
>>> Distances package.
>>>
>>
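
One last note on the 23.04 M allocations reported for that first myjaccard2:
num and den start as Int and are rebound to Float64 inside the loop, so the
accumulators are boxed on every iteration (500 calls over 11520 elements is
consistent with that count), on top of the timing loop running at global
scope. Starting the accumulators as floats, as the definition at the top of
this message already does, should be enough to make the function itself
allocation-free. A minimal sketch, with a hypothetical name:

    function myjaccard_stable(a::Array{Float64,1}, b::Array{Float64,1})
        num = 0.0   # Float64 from the start, so the loop stays type stable
        den = 0.0
        @inbounds for i in 1:length(a)
            num += min(a[i], b[i])
            den += max(a[i], b[i])
        end
        1.0 - num/den
    end
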
