Thanks all. I did some metrics on the comparison function "Less": the data 
is JSON records in one huge file, so Less had to unmarshal the JSON and 
then compare based on the decoded values. Unmarshalling took 98% of the 
time in this comparison. I am now looking for a way to avoid unmarshalling 
inside Less.
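
One idea I am exploring: unmarshal each record once up front and cache the 
sort key alongside the raw bytes, so Less becomes a plain value comparison. 
A rough, untested sketch; the "key" field name is made up for illustration:

package main

import (
    "encoding/json"
    "fmt"
    "sort"
)

// decoded keeps the raw record together with its pre-extracted sort
// key, so the Less function never has to unmarshal.
type decoded struct {
    raw []byte
    key int64
}

func main() {
    records := [][]byte{
        []byte(`{"key": 42, "payload": "b"}`),
        []byte(`{"key": 7, "payload": "a"}`),
    }

    // Unmarshal each record exactly once, before sorting.
    recs := make([]decoded, len(records))
    for i, r := range records {
        var v struct {
            Key int64 `json:"key"`
        }
        if err := json.Unmarshal(r, &v); err != nil {
            panic(err)
        }
        recs[i] = decoded{raw: r, key: v.Key}
    }

    // Less is now a cheap integer comparison.
    sort.Slice(recs, func(i, j int) bool { return recs[i].key < recs[j].key })

    for _, r := range recs {
        fmt.Println(string(r.raw))
    }
}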

Thanks all for your time and suggestions.

Regards,
Subramanian. K

On Thursday, 30 November 2017 19:53:01 UTC+5:30, Slawomir Pryczek wrote:
>
> It should be very simple if you have an additional 2GB of memory. You 
> divide the data into X parts, where X is a power of 2 and no more than 
> the number of cores available, e.g. for 2000MB it can be 250MBx8. Then 
> you sort the parts in parallel using the built-in sorting function, and 
> at the end you merge the arrays by scanning them and picking the smallest 
> element as you move forward: 250MBx8 => 500MBx4 => 1000MBx2 => 2000MB.
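>
> Something like this, roughly (an untested sketch for a []int, with the 
> chunk count passed in and a simple two-way merge helper):
>
> package main
>
> import (
>     "fmt"
>     "sort"
>     "sync"
> )
>
> // sortParallel splits data into n chunks (n ideally a power of 2 and
> // no more than the number of cores), sorts the chunks concurrently,
> // then merges pairs of sorted runs until one remains: 8 => 4 => 2 => 1.
> func sortParallel(data []int, n int) []int {
>     if len(data) == 0 {
>         return data
>     }
>     chunk := (len(data) + n - 1) / n
>     runs := make([][]int, 0, n)
>     for i := 0; i < len(data); i += chunk {
>         end := i + chunk
>         if end > len(data) {
>             end = len(data)
>         }
>         runs = append(runs, data[i:end])
>     }
>
>     var wg sync.WaitGroup
>     for _, r := range runs {
>         wg.Add(1)
>         go func(r []int) {
>             defer wg.Done()
>             sort.Ints(r) // built-in sort on each part
>         }(r)
>     }
>     wg.Wait()
>
>     // Pairwise merge passes: 250MBx8 => 500MBx4 => 1000MBx2 => 2000MB.
>     for len(runs) > 1 {
>         next := make([][]int, 0, (len(runs)+1)/2)
>         for i := 0; i+1 < len(runs); i += 2 {
>             next = append(next, merge(runs[i], runs[i+1]))
>         }
>         if len(runs)%2 == 1 {
>             next = append(next, runs[len(runs)-1])
>         }
>         runs = next
>     }
>     return runs[0]
> }
>
> // merge combines two sorted slices by repeatedly taking the smaller
> // head element.
> func merge(a, b []int) []int {
>     out := make([]int, 0, len(a)+len(b))
>     for len(a) > 0 && len(b) > 0 {
>         if a[0] <= b[0] {
>             out = append(out, a[0])
>             a = a[1:]
>         } else {
>             out = append(out, b[0])
>             b = b[1:]
>         }
>     }
>     out = append(out, a...)
>     return append(out, b...)
> }
>
> func main() {
>     fmt.Println(sortParallel([]int{5, 3, 8, 1, 9, 2, 7, 4}, 4))
> }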
>
> It should take about 15-20 minutes to write, and the sorting would 
> probably take 5-6 minutes instead of half an hour with 8 threads.
>
> In the merge phase you can parallelize further: take the average of the 
> two middle elements, find the position in each array where the elements 
> reach that value, and break each array at that point into 2 separate 
> arrays. E.g. if you're doing the last step and the element at position 
> 500000 in Arr1 is 91821 while at the same position in Arr2 it is 1782713, 
> the average is 937267; you use binary search to find the position of 
> 937267 (or whatever is closest) in both arrays. Then you can break the 
> arrays into Arr11 + Arr12 / Arr21 + Arr22 and merge Arr11 with Arr21 and 
> Arr12 with Arr22 concurrently. But that would probably take more time to 
> write and is not necessarily worth the effort.
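>
> Roughly, building on the sketch above (it reuses that merge helper; 
> pivot choice and edge cases are simplified):
>
> // splitMerge merges two sorted slices in two halves concurrently:
> // take the average of the middle elements as a pivot, binary-search
> // it in both slices, then merge the low and high halves in parallel.
> func splitMerge(a, b []int) []int {
>     if len(a) == 0 || len(b) == 0 {
>         return merge(a, b)
>     }
>     pivot := (a[len(a)/2] + b[len(b)/2]) / 2
>
>     // sort.SearchInts finds where each sorted slice first reaches
>     // the pivot, i.e. where to break it into two separate arrays.
>     i := sort.SearchInts(a, pivot)
>     j := sort.SearchInts(b, pivot)
>
>     var lo, hi []int
>     var wg sync.WaitGroup
>     wg.Add(2)
>     go func() { defer wg.Done(); lo = merge(a[:i], b[:j]) }()
>     go func() { defer wg.Done(); hi = merge(a[i:], b[j:]) }()
>     wg.Wait()
>
>     return append(lo, hi...)
> }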
>
>
>
> On Wednesday, 29 November 2017 15:19:13 UTC+1, Subramanian K 
> wrote:
>>
>> Hi
>>
>> I am using the native sort provided as part of the sort package, and 
>> processing 48MB of slice data takes ~45 sec. Running 2GB of data takes a 
>> really long time, so I am trying to split it into buckets, sort them 
>> concurrently, and finally collate the results of all these small sorted 
>> buckets.
>>
>> Do we have any sort package which can sort huge data swiftly?
>>
>> Regards,
>> Subu. K
>>
>
