Thanks for all the answer, they were really helpful.
After going through a number of variations of how to define a pair, I found
that the fastest solution involves immutable parametric types that are a
subtype of Number. This brings computational time down to 0.09 seconds from
6 seconds for tuples, which is faster than Python (1.3 seconds),
see https://gist.github.com/afniedermayer/f9cea59ceba24858ac0a
It's still not entirely satisfactory that sorting arrays of tuples is so
much slower in Julia than in Python, especially since I'm used to working
with tuples all the time in Python. Maybe the lesson to be learned is that
I should give up the habit of using tuples, but tuples are more convenient
than defining my own Pair/Triple/Quadruple/etc types and overloading the
Base.isless function for them.
@StefanKarpinski Stable vs unstable sort explains the performance
difference between `type Pair` and `type Pair <: Number`, but it would
suggest the opposite of what one sees for sorting `Vector{(Any,Any)}`
(faster) vs `Vector{(Float64,Int64)}` (slower) (see
https://gist.github.com/afniedermayer/f9cea59ceba24858ac0a#file-sorting_tuples-jl
).
It's strange that the presence of type information makes the sorting of
tuples slower by a factor 3. Maybe there's some problem with the
implementation of sort?
On Saturday, January 10, 2015 at 12:01:18 PM UTC+1, Tim Holy wrote:
>
> To make it fast, you probably want
>
> type Pair{X,Y}
> x::X
> y::Y
> end
>
> and use a concretely-typed array of Pair{X,Y}.
>
> --Tim
>
> On Friday, January 09, 2015 04:23:39 PM Páll Haraldsson wrote:
> > I'm not sure why it matters if I added Numbers:
> > type Pair <: Number
> > x
> > y
> > end
> >
> > I got a little lower time and 0 bytes allocated:
> >
> > julia> @time sort_Pair()
> > elapsed time: 0.006614108 seconds (4000088 bytes allocated)
> > elapsed time: 0.041584405 seconds (31991872 bytes allocated)
> > elapsed time: 1.732269097 seconds (0 bytes allocated)
> > Array{Pair,1}
> > elapsed time: 1.951577918 seconds (35993216 bytes allocated, 8.71% gc
> time)
> >
> > > Hi,
> > >
> > > This is my first answer here, I'm not an expert on Julia.. Seems tuple
> is
> > > slow. It's more general,
> > >
> > > On my machine this beats Python:
> > >
> > > function sort_Pair()
> > >
> > > gc(); @time temp = rand(500_000)
> > > gc(); @time temp2 = [Pair(temp[i],i) for i=1:length(temp)]
> > > gc(); @time sort!(temp2); println(typeof(temp2))
> > >
> > > end
> > >
> > > julia> @time sort_pair()
> > > elapsed time: 0.006823427 seconds (4000088 bytes allocated)
> > > elapsed time: 0.035921609 seconds (31991872 bytes allocated)
> > > elapsed time: 1.975542283 seconds (4000048 bytes allocated)
> > > Array{Pair,1}
> > > elapsed time: 2.20708157 seconds (40892612 bytes allocated, 7.46% gc
> time)
> > >
> > > compared to (or 2.7 sec total in Python):
> > >
> > > julia> @time sort_any()
> > > elapsed time: 0.006287604 seconds (4000088 bytes allocated)
> > > elapsed time: 0.047933539 seconds (35991872 bytes allocated)
> > > elapsed time: 5.989875809 seconds (191154832 bytes allocated)
> > > Array{Any,1}
> > > elapsed time: 6.247331383 seconds (231803748 bytes allocated, 2.76% gc
> > > time)
> > >
> > >
> > > I used:
> > >
> > > type Pair
> > >
> > > x
> > > y
> > >
> > > end
> > >
> > > import Base.isless
> > >
> > > function isless(a, b)
> > >
> > > if a.x < b.x
> > >
> > > true
> > >
> > > elseif a.x == b.x && a.y < b.y
> > >
> > > true
> > >
> > > else
> > >
> > > false
> > >
> > > end
> > >
> > > end
> > >
> > > You can maybe think of a tuple as a type, but I'm not sure where to
> look
> > > for code for it nor looked into the profiler (use @profile, not
> @time). A
> > > tuple can of course be a 2-tupel (pair), 3-tuple, etc. And I could
> change
> > > the function for each case, just not sure how I would make my own type
> > > that
> > > would handle n-tuples.. Julia does that and maybe just has to be
> > > optimized.
> > > Maybe in 0.4 it is..
> > >
> > > Types in Julia are supposed to be abstraction-free, but included
> tuples
> > > seem to have a 231803748/40892612 = 5.6 times overhead compared to my
> Pair
> > > judging by the memory allocations.
> > >
> > > On Friday, January 9, 2015 at 10:06:30 AM UTC, Andras Niedermayer
> wrote:
> > >> The performance of the sort algorithm varies largely with the element
> > >> type of an Array, in an unexpected (at least for me) way. Sorting
> time is
> > >> ordered like this: Vector{(Any,Any)} < Vector{Any} <
> > >> Vector{(Float64,Int64}). Is this a bug or is this behavior intended?
> > >> Here's
> > >> the code:
> > >>
> > >> ---
> > >>
> > >> julia> function sort_floatint_tuple()
> > >>
> > >> temp = rand(500_000)
> > >> temp2 = (Float64,Int64)[(temp[i],i) for i=1:length(temp)]
> > >> sort!(temp2)
> > >>
> > >> end
> > >>
> > >> sort_floatint_tuple (generic function with 1 method)
> > >>
> > >> julia> gc(); @time sort_floatint_tuple();
> > >> elapsed time: 5.371570706 seconds (2187115768 bytes allocated, 44.77%
> gc
> > >> time)
> > >>
> > >> julia> gc(); @time sort_floatint_tuple();
> > >> elapsed time: 7.522759215 seconds (2184593304 bytes allocated, 57.48%
> gc
> > >> time)
> > >>
> > >> julia> function sort_any_tuple()
> > >>
> > >> temp = rand(500_000)
> > >> temp2 = (Any,Any)[(temp[i],i) for i=1:length(temp)]
> > >> sort!(temp2)
> > >>
> > >> end
> > >>
> > >> sort_any_tuple (generic function with 1 method)
> > >>
> > >> julia> gc(); @time sort_any_tuple();
> > >> elapsed time: 2.705974449 seconds (526135400 bytes allocated, 23.32%
> gc
> > >> time)
> > >>
> > >> julia> gc(); @time sort_any_tuple();
> > >> elapsed time: 3.215082563 seconds (523241560 bytes allocated, 37.72%
> gc
> > >> time)
> > >>
> > >> julia> function sort_any()
> > >>
> > >> temp = rand(500_000)
> > >> temp2 = Any[(temp[i],i) for i=1:length(temp)]
> > >> sort!(temp2)
> > >>
> > >> end
> > >>
> > >> sort_any (generic function with 1 method)
> > >>
> > >> julia> gc(); @time sort_any();
> > >> elapsed time: 3.591829528 seconds (231327824 bytes allocated, 6.01%
> gc
> > >> time)
> > >>
> > >> julia> gc(); @time sort_any();
> > >> elapsed time: 3.932805966 seconds (231203560 bytes allocated, 12.54%
> gc
> > >> time)
> > >>
> > >> ---
> > >>
> > >>
> > >> Python is much faster here:
> > >> ---
> > >> %timeit temp=np.random.rand(500000);
> temp2=zip(temp,range(len(temp)));
> > >> temp2.sort()
> > >> 1 loops, best of 3: 1.33 s per loop
> > >>
> > >> ---
> > >>
> > >> PS: Type inference gives Vector{(Float64,Int64)} if I write the for
> > >> comprehension in a function and Vector{(Any,Any)} if I run it outside
> of
> > >> a
> > >> function.
> > >>
> > >> PPS: After looking at this in detail, I found out that "sortperm"
> would
> > >> have been a better option for this. But it is still surprising to me
> that
> > >> losing type information ((Any,Any) instead of (Float64,Int64)) speeds
> up
> > >> the code.
>
>