For arrays of numbers, it's assumed that an unstable sort is ok, so
quicksort is used and no temporary array needs to be allocated. For other
kinds of objects, we default to mergesort, which needs a temporary array.
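For reference, the algorithm can also be selected explicitly with the alg
keyword to sort! (a minimal illustration, not tied to the benchmarks below):

v = rand(500_000)
sort!(v, alg=QuickSort)   # in-place and unstable; the default for numeric arrays
sort!(v, alg=MergeSort)   # stable, but allocates a temporary array; the default otherwise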

On Fri, Jan 9, 2015 at 7:23 PM, Páll Haraldsson <[email protected]>
wrote:

>
> I'm not sure why it should matter, but I made Pair a subtype of Number:
> type Pair <: Number
>   x
>   y
> end
>
> I got a little lower time and 0 bytes allocated:
>
> julia> @time sort_Pair()
> elapsed time: 0.006614108 seconds (4000088 bytes allocated)
> elapsed time: 0.041584405 seconds (31991872 bytes allocated)
> elapsed time: 1.732269097 seconds (0 bytes allocated)
> Array{Pair,1}
> elapsed time: 1.951577918 seconds (35993216 bytes allocated, 8.71% gc time)
>
> --
> Palli.
>
>
> On Saturday, January 10, 2015 at 12:05:03 AM UTC, Páll Haraldsson wrote:
>>
>>
>> Hi,
>>
>> This is my first answer here, and I'm not an expert on Julia. Tuples seem
>> to be slow here; they are more general than a dedicated type.
>>
>> On my machine this beats Python:
>>
>> function sort_Pair()
>>   gc(); @time temp = rand(500_000)
>>   gc(); @time temp2 = [Pair(temp[i],i) for i=1:length(temp)]
>>   gc(); @time sort!(temp2); println(typeof(temp2))
>> end
>>
>> julia> @time sort_Pair()
>> elapsed time: 0.006823427 seconds (4000088 bytes allocated)
>> elapsed time: 0.035921609 seconds (31991872 bytes allocated)
>> elapsed time: 1.975542283 seconds (4000048 bytes allocated)
>> Array{Pair,1}
>> elapsed time: 2.20708157 seconds (40892612 bytes allocated, 7.46% gc time)
>>
>> compared to sort_any (or to about 2.7 seconds total in Python):
>>
>> julia> @time sort_any()
>> elapsed time: 0.006287604 seconds (4000088 bytes allocated)
>> elapsed time: 0.047933539 seconds (35991872 bytes allocated)
>> elapsed time: 5.989875809 seconds (191154832 bytes allocated)
>> Array{Any,1}
>> elapsed time: 6.247331383 seconds (231803748 bytes allocated, 2.76% gc
>> time)
>>
>>
>> I used:
>>
>> type Pair
>>   x
>>   y
>> end
>>
>> import Base.isless
>>
>> function isless(a::Pair, b::Pair)
>>   if a.x < b.x
>>     true
>>   elseif a.x == b.x && a.y < b.y
>>     true
>>   else
>>     false
>>   end
>> end
>>
>> You can maybe think of a tuple as a type, but I'm not sure where to look
>> for its code, and I haven't looked into the profiler (use @profile, not
>> @time). A tuple can of course be a 2-tuple (pair), a 3-tuple, etc., and I
>> could adapt the function for each case; I'm just not sure how I would make
>> my own type that handles n-tuples. Julia's tuples do handle that, and maybe
>> they just need to be optimized. Maybe in 0.4 they are.
>>
>> Types in Julia are supposed to be abstraction-free, but the built-in tuples
>> seem to have 231803748/40892612 ≈ 5.7 times the overhead of my Pair,
>> judging by the memory allocations.
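A minimal sketch of how the pair could instead be given concrete, parameterized
field types, so the elements can (in principle) be stored unboxed (FastPair and
sort_fastpair are just illustrative names; Julia 0.3-era syntax assumed):

immutable FastPair{X,Y}
  x::X
  y::Y
end

import Base.isless
# same ordering as the isless for Pair above, but restricted to FastPair
isless(a::FastPair, b::FastPair) = a.x < b.x || (a.x == b.x && a.y < b.y)

function sort_fastpair()
  temp = rand(500_000)
  temp2 = [FastPair(temp[i], i) for i = 1:length(temp)]
  sort!(temp2)
end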
>>
>> --
>> Palli.
>>
>> On Friday, January 9, 2015 at 10:06:30 AM UTC, Andras Niedermayer wrote:
>>>
>>>
>>>
>>> The performance of the sort algorithm varies widely with the element
>>> type of an Array, in a way that is unexpected (at least to me). Sorting
>>> time is ordered like this: Vector{(Any,Any)} < Vector{Any} <
>>> Vector{(Float64,Int64)}. Is this a bug, or is this behavior intended?
>>> Here's the code:
>>>
>>> ---
>>>
>>> julia> function sort_floatint_tuple()
>>>          temp = rand(500_000)
>>>          temp2 = (Float64,Int64)[(temp[i],i) for i=1:length(temp)]
>>>          sort!(temp2)
>>>        end
>>> sort_floatint_tuple (generic function with 1 method)
>>>
>>> julia> gc(); @time sort_floatint_tuple();
>>> elapsed time: 5.371570706 seconds (2187115768 bytes allocated, 44.77%
>>> gc time)
>>>
>>> julia> gc(); @time sort_floatint_tuple();
>>> elapsed time: 7.522759215 seconds (2184593304 bytes allocated, 57.48%
>>> gc time)
>>>
>>> julia> function sort_any_tuple()
>>>          temp = rand(500_000)
>>>          temp2 = (Any,Any)[(temp[i],i) for i=1:length(temp)]
>>>          sort!(temp2)
>>>        end
>>> sort_any_tuple (generic function with 1 method)
>>>
>>> julia> gc(); @time sort_any_tuple();
>>> elapsed time: 2.705974449 seconds (526135400 bytes allocated, 23.32% gc
>>> time)
>>>
>>> julia> gc(); @time sort_any_tuple();
>>> elapsed time: 3.215082563 seconds (523241560 bytes allocated, 37.72% gc
>>> time)
>>>
>>> julia> function sort_any()
>>>          temp = rand(500_000)
>>>          temp2 = Any[(temp[i],i) for i=1:length(temp)]
>>>          sort!(temp2)
>>>        end
>>> sort_any (generic function with 1 method)
>>>
>>> julia> gc(); @time sort_any();
>>> elapsed time: 3.591829528 seconds (231327824 bytes allocated, 6.01% gc
>>> time)
>>>
>>> julia> gc(); @time sort_any();
>>> elapsed time: 3.932805966 seconds (231203560 bytes allocated, 12.54% gc
>>> time)
>>>
>>> ---
>>>
>>>
>>> Python is much faster here:
>>> ---
>>> %timeit  temp=np.random.rand(500000); temp2=zip(temp,range(len(temp)));
>>> temp2.sort()
>>> 1 loops, best of 3: 1.33 s per loop
>>>
>>> ---
>>>
>>> PS: Type inference gives Vector{(Float64,Int64)} if I write the for
>>> comprehension in a function and Vector{(Any,Any)} if I run it outside of a
>>> function.
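A rough sketch of that difference (f is just an illustrative name; the element
types in the comments are the ones reported above):

temp = rand(500_000)
typeof([(temp[i],i) for i=1:length(temp)])    # at global scope: Array{(Any,Any),1}
f(temp) = [(temp[i],i) for i=1:length(temp)]
typeof(f(temp))                               # inside a function: Array{(Float64,Int64),1}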
>>>
>>> PPS: After looking at this in detail, I found out that "sortperm" would
>>> have been a better option for this. But it is still surprising to me that
>>> losing type information ((Any,Any) instead of (Float64,Int64)) speeds up
>>> the code.
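A minimal sketch of the sortperm approach mentioned above (same data as in the
benchmarks):

temp = rand(500_000)
p = sortperm(temp)       # permutation of indices that sorts temp
sorted_vals = temp[p]    # values in ascending order; p[i] gives the original index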
>>>
>>
