The performance of the sort algorithm varies largely with the element type 
of an Array, in an unexpected (at least for me) way. Sorting time is 
ordered like this: Vector{(Any,Any)} < Vector{Any} < 
Vector{(Float64,Int64}). Is this a bug or is this behavior intended? Here's 
the code:

---

julia> function sort_floatint_tuple()
         temp = rand(500_000)
         temp2 = (Float64,Int64)[(temp[i],i) for i=1:length(temp)]
         sort!(temp2)
       end
sort_floatint_tuple (generic function with 1 method)

julia> gc(); @time sort_floatint_tuple();
elapsed time: 5.371570706 seconds (2187115768 bytes allocated, 44.77% gc 
time)

julia> gc(); @time sort_floatint_tuple();
elapsed time: 7.522759215 seconds (2184593304 bytes allocated, 57.48% gc 
time)

julia> function sort_any_tuple()
         temp = rand(500_000)
         temp2 = (Any,Any)[(temp[i],i) for i=1:length(temp)]
         sort!(temp2)
       end
sort_any_tuple (generic function with 1 method)

julia> gc(); @time sort_any_tuple();
elapsed time: 2.705974449 seconds (526135400 bytes allocated, 23.32% gc 
time)

julia> gc(); @time sort_any_tuple();
elapsed time: 3.215082563 seconds (523241560 bytes allocated, 37.72% gc 
time)

julia> function sort_any()
         temp = rand(500_000)
         temp2 = Any[(temp[i],i) for i=1:length(temp)]
         sort!(temp2)
       end
sort_any (generic function with 1 method)

julia> gc(); @time sort_any();
elapsed time: 3.591829528 seconds (231327824 bytes allocated, 6.01% gc time)

julia> gc(); @time sort_any();
elapsed time: 3.932805966 seconds (231203560 bytes allocated, 12.54% gc 
time)

---


Python is much faster here:
---
%timeit  temp=np.random.rand(500000); temp2=zip(temp,range(len(temp))); 
temp2.sort()
1 loops, best of 3: 1.33 s per loop

---

PS: Type inference gives Vector{(Float64,Int64)} if I write the for 
comprehension in a function and Vector{(Any,Any)} if I run it outside of a 
function.

PPS: After looking at this in detail, I found out that "sortperm" would 
have been a better option for this. But it is still surprising to me that 
losing type information ((Any,Any) instead of (Float64,Int64)) speeds up 
the code.

Reply via email to