To make it fast, you probably want
type Pair{X,Y}
x::X
y::Y
end
and use a concretely-typed array of Pair{X,Y}.
--Tim
On Friday, January 09, 2015 04:23:39 PM Páll Haraldsson wrote:
> I'm not sure why it matters if I added Numbers:
> type Pair <: Number
> x
> y
> end
>
> I got a little lower time and 0 bytes allocated:
>
> julia> @time sort_Pair()
> elapsed time: 0.006614108 seconds (4000088 bytes allocated)
> elapsed time: 0.041584405 seconds (31991872 bytes allocated)
> elapsed time: 1.732269097 seconds (0 bytes allocated)
> Array{Pair,1}
> elapsed time: 1.951577918 seconds (35993216 bytes allocated, 8.71% gc time)
>
> > Hi,
> >
> > This is my first answer here, I'm not an expert on Julia.. Seems tuple is
> > slow. It's more general,
> >
> > On my machine this beats Python:
> >
> > function sort_Pair()
> >
> > gc(); @time temp = rand(500_000)
> > gc(); @time temp2 = [Pair(temp[i],i) for i=1:length(temp)]
> > gc(); @time sort!(temp2); println(typeof(temp2))
> >
> > end
> >
> > julia> @time sort_pair()
> > elapsed time: 0.006823427 seconds (4000088 bytes allocated)
> > elapsed time: 0.035921609 seconds (31991872 bytes allocated)
> > elapsed time: 1.975542283 seconds (4000048 bytes allocated)
> > Array{Pair,1}
> > elapsed time: 2.20708157 seconds (40892612 bytes allocated, 7.46% gc time)
> >
> > compared to (or 2.7 sec total in Python):
> >
> > julia> @time sort_any()
> > elapsed time: 0.006287604 seconds (4000088 bytes allocated)
> > elapsed time: 0.047933539 seconds (35991872 bytes allocated)
> > elapsed time: 5.989875809 seconds (191154832 bytes allocated)
> > Array{Any,1}
> > elapsed time: 6.247331383 seconds (231803748 bytes allocated, 2.76% gc
> > time)
> >
> >
> > I used:
> >
> > type Pair
> >
> > x
> > y
> >
> > end
> >
> > import Base.isless
> >
> > function isless(a, b)
> >
> > if a.x < b.x
> >
> > true
> >
> > elseif a.x == b.x && a.y < b.y
> >
> > true
> >
> > else
> >
> > false
> >
> > end
> >
> > end
> >
> > You can maybe think of a tuple as a type, but I'm not sure where to look
> > for code for it nor looked into the profiler (use @profile, not @time). A
> > tuple can of course be a 2-tupel (pair), 3-tuple, etc. And I could change
> > the function for each case, just not sure how I would make my own type
> > that
> > would handle n-tuples.. Julia does that and maybe just has to be
> > optimized.
> > Maybe in 0.4 it is..
> >
> > Types in Julia are supposed to be abstraction-free, but included tuples
> > seem to have a 231803748/40892612 = 5.6 times overhead compared to my Pair
> > judging by the memory allocations.
> >
> > On Friday, January 9, 2015 at 10:06:30 AM UTC, Andras Niedermayer wrote:
> >> The performance of the sort algorithm varies largely with the element
> >> type of an Array, in an unexpected (at least for me) way. Sorting time is
> >> ordered like this: Vector{(Any,Any)} < Vector{Any} <
> >> Vector{(Float64,Int64}). Is this a bug or is this behavior intended?
> >> Here's
> >> the code:
> >>
> >> ---
> >>
> >> julia> function sort_floatint_tuple()
> >>
> >> temp = rand(500_000)
> >> temp2 = (Float64,Int64)[(temp[i],i) for i=1:length(temp)]
> >> sort!(temp2)
> >>
> >> end
> >>
> >> sort_floatint_tuple (generic function with 1 method)
> >>
> >> julia> gc(); @time sort_floatint_tuple();
> >> elapsed time: 5.371570706 seconds (2187115768 bytes allocated, 44.77% gc
> >> time)
> >>
> >> julia> gc(); @time sort_floatint_tuple();
> >> elapsed time: 7.522759215 seconds (2184593304 bytes allocated, 57.48% gc
> >> time)
> >>
> >> julia> function sort_any_tuple()
> >>
> >> temp = rand(500_000)
> >> temp2 = (Any,Any)[(temp[i],i) for i=1:length(temp)]
> >> sort!(temp2)
> >>
> >> end
> >>
> >> sort_any_tuple (generic function with 1 method)
> >>
> >> julia> gc(); @time sort_any_tuple();
> >> elapsed time: 2.705974449 seconds (526135400 bytes allocated, 23.32% gc
> >> time)
> >>
> >> julia> gc(); @time sort_any_tuple();
> >> elapsed time: 3.215082563 seconds (523241560 bytes allocated, 37.72% gc
> >> time)
> >>
> >> julia> function sort_any()
> >>
> >> temp = rand(500_000)
> >> temp2 = Any[(temp[i],i) for i=1:length(temp)]
> >> sort!(temp2)
> >>
> >> end
> >>
> >> sort_any (generic function with 1 method)
> >>
> >> julia> gc(); @time sort_any();
> >> elapsed time: 3.591829528 seconds (231327824 bytes allocated, 6.01% gc
> >> time)
> >>
> >> julia> gc(); @time sort_any();
> >> elapsed time: 3.932805966 seconds (231203560 bytes allocated, 12.54% gc
> >> time)
> >>
> >> ---
> >>
> >>
> >> Python is much faster here:
> >> ---
> >> %timeit temp=np.random.rand(500000); temp2=zip(temp,range(len(temp)));
> >> temp2.sort()
> >> 1 loops, best of 3: 1.33 s per loop
> >>
> >> ---
> >>
> >> PS: Type inference gives Vector{(Float64,Int64)} if I write the for
> >> comprehension in a function and Vector{(Any,Any)} if I run it outside of
> >> a
> >> function.
> >>
> >> PPS: After looking at this in detail, I found out that "sortperm" would
> >> have been a better option for this. But it is still surprising to me that
> >> losing type information ((Any,Any) instead of (Float64,Int64)) speeds up
> >> the code.