It should be quite simple if you have an additional 2 GB of memory. Divide the data into X parts, where X is a power of 2 and no greater than the number of available cores, e.g. 8 parts of 250 MB each for 2000 MB. Sort the parts in parallel using the built-in sort, then merge the arrays pairwise by scanning them and picking the smallest element as you move forward: 250 MB x 8 => 500 MB x 4 => 1000 MB x 2 => 2000 MB.
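For what it's worth, here is a minimal sketch of that split/sort/merge idea for an []int slice already in memory. The names parallelSort and mergeSorted are mine, and it assumes the chunk count is a power of two no greater than your core count:

package main

import (
	"fmt"
	"sort"
	"sync"
)

// mergeSorted merges two already-sorted slices into one sorted slice by
// repeatedly taking the smaller front element.
func mergeSorted(a, b []int) []int {
	out := make([]int, 0, len(a)+len(b))
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		if a[i] <= b[j] {
			out = append(out, a[i])
			i++
		} else {
			out = append(out, b[j])
			j++
		}
	}
	out = append(out, a[i:]...)
	return append(out, b[j:]...)
}

// parallelSort splits data into `chunks` parts (chunks should be a power of
// two), sorts each part concurrently with the standard sort package, then
// merges the parts pairwise: 8 -> 4 -> 2 -> 1.
func parallelSort(data []int, chunks int) []int {
	parts := make([][]int, chunks)
	size := (len(data) + chunks - 1) / chunks
	var wg sync.WaitGroup
	for c := 0; c < chunks; c++ {
		lo, hi := c*size, (c+1)*size
		if lo > len(data) {
			lo = len(data)
		}
		if hi > len(data) {
			hi = len(data)
		}
		parts[c] = data[lo:hi] // each chunk is sorted in place
		wg.Add(1)
		go func(p []int) {
			defer wg.Done()
			sort.Ints(p)
		}(parts[c])
	}
	wg.Wait()

	// Pairwise merge rounds; each round halves the number of slices and
	// allocates fresh output slices (hence the extra memory).
	for len(parts) > 1 {
		next := make([][]int, len(parts)/2)
		var mwg sync.WaitGroup
		for i := 0; i < len(parts); i += 2 {
			mwg.Add(1)
			go func(dst int, a, b []int) {
				defer mwg.Done()
				next[dst] = mergeSorted(a, b)
			}(i/2, parts[i], parts[i+1])
		}
		mwg.Wait()
		parts = next
	}
	return parts[0]
}

func main() {
	data := []int{5, 3, 9, 1, 7, 2, 8, 6, 4, 0, 11, 10}
	fmt.Println(parallelSort(data, 4))
}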
It should take about 15-20 minutes to write, and the sorting would probably take 5-6 minutes instead of half an hour with 8 threads.

In the merge phase you can parallelize further: take the average of the two middle elements, binary-search each array for the position of the value closest to that average, and break each array at that point into two separate arrays. E.g. if you're doing the last step and the element at position 500000 in Arr1 is 91821 while the element at the same position in Arr2 is 1782713, the average is 937267; use binary search to find the position of 937267 (or the closest value) in both arrays, then break them into Arr11 + Arr12 and Arr21 + Arr22, and merge Arr11 with Arr21 and Arr12 with Arr22 concurrently. But that would probably take more time to write and isn't necessarily worth the effort. A rough sketch of this split appears after the quoted message below.

On Wednesday, 29 November 2017 at 15:19:13 UTC+1, Subramanian K wrote:
>
> Hi
>
> I am using native sort provided part of the package and to process 48MB of
> slice data, it takes ~45sec.
> To run 2GB of data it takes really long time, I am trying to split these
> to buckets and make it run concurrently, finally need to collate results of
> all these small sorted buckets.
>
> Do we have any sort package which can sort huge data swiftly?
>
> Regards,
> Subu. K
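And here is a rough sketch of the split-the-merge idea, building on the mergeSorted helper (and the sort/sync imports) from the sketch above. The function name mergeSplit is mine, and the pivot choice follows the average-of-the-middle-elements heuristic described earlier:

// mergeSplit merges two sorted slices with two goroutines: pick a pivot (the
// average of the two middle elements), binary-search its position in each
// slice, merge the two low halves and the two high halves concurrently, and
// concatenate the results.
func mergeSplit(a, b []int) []int {
	if len(a) == 0 || len(b) == 0 {
		return mergeSorted(a, b)
	}
	pivot := (a[len(a)/2] + b[len(b)/2]) / 2
	// sort.SearchInts returns the first index where pivot could be inserted
	// while keeping the slice sorted, so everything before it is < pivot.
	sa := sort.SearchInts(a, pivot)
	sb := sort.SearchInts(b, pivot)

	var low, high []int
	var wg sync.WaitGroup
	wg.Add(2)
	go func() { defer wg.Done(); low = mergeSorted(a[:sa], b[:sb]) }()
	go func() { defer wg.Done(); high = mergeSorted(a[sa:], b[sb:]) }()
	wg.Wait()
	return append(low, high...)
}

Since everything before the split points is smaller than the pivot and everything after is at least the pivot, concatenating the two merged halves gives a fully sorted result.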