It should be quite simple if you have an additional 2 GB of memory. Divide the data into X parts, where X is a power of 2 and no greater than the number of available cores, e.g. 8 parts of 250 MB each for 2000 MB. Sort the parts in parallel using the built-in sort, then merge the arrays pairwise by scanning them and picking the smallest element as you move forward: 250 MB x 8 => 500 MB x 4 => 1000 MB x 2 => 2000 MB.
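For what it's worth, here is a minimal sketch of that split/sort/merge idea for an []int slice already in memory. The names parallelSort and mergeSorted are mine, and it assumes the chunk count is a power of two no greater than your core count:

package main

import (
	"fmt"
	"sort"
	"sync"
)

// mergeSorted merges two already-sorted slices into one sorted slice by
// repeatedly taking the smaller front element.
func mergeSorted(a, b []int) []int {
	out := make([]int, 0, len(a)+len(b))
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		if a[i] <= b[j] {
			out = append(out, a[i])
			i++
		} else {
			out = append(out, b[j])
			j++
		}
	}
	out = append(out, a[i:]...)
	return append(out, b[j:]...)
}

// parallelSort splits data into `chunks` parts (chunks should be a power of
// two), sorts each part concurrently with the standard sort package, then
// merges the parts pairwise: 8 -> 4 -> 2 -> 1.
func parallelSort(data []int, chunks int) []int {
	parts := make([][]int, chunks)
	size := (len(data) + chunks - 1) / chunks
	var wg sync.WaitGroup
	for c := 0; c < chunks; c++ {
		lo, hi := c*size, (c+1)*size
		if lo > len(data) {
			lo = len(data)
		}
		if hi > len(data) {
			hi = len(data)
		}
		parts[c] = data[lo:hi] // each chunk is sorted in place
		wg.Add(1)
		go func(p []int) {
			defer wg.Done()
			sort.Ints(p)
		}(parts[c])
	}
	wg.Wait()

	// Pairwise merge rounds; each round halves the number of slices and
	// allocates fresh output slices (hence the extra memory).
	for len(parts) > 1 {
		next := make([][]int, len(parts)/2)
		var mwg sync.WaitGroup
		for i := 0; i < len(parts); i += 2 {
			mwg.Add(1)
			go func(dst int, a, b []int) {
				defer mwg.Done()
				next[dst] = mergeSorted(a, b)
			}(i/2, parts[i], parts[i+1])
		}
		mwg.Wait()
		parts = next
	}
	return parts[0]
}

func main() {
	data := []int{5, 3, 9, 1, 7, 2, 8, 6, 4, 0, 11, 10}
	fmt.Println(parallelSort(data, 4))
}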
It should take about 15-20 minutes to write, and the sorting would probably take 5-6 minutes instead of half an hour with 8 threads.

In the merge phase you can parallelize further: take the average of the two middle elements, binary-search each array for the position of the value closest to that average, and break each array at that point into two separate arrays. E.g. if you're doing the last step and the element at position 500000 in Arr1 is 91821 while the element at the same position in Arr2 is 1782713, the average is 937267; use binary search to find the position of 937267 (or the closest value) in both arrays, then break them into Arr11 + Arr12 and Arr21 + Arr22, and merge Arr11 with Arr21 and Arr12 with Arr22 concurrently. But that would probably take more time to write and isn't necessarily worth the effort. A rough sketch of this split appears after the quoted message below.

On Wednesday, 29 November 2017 at 15:19:13 UTC+1, Subramanian K wrote:
>
> Hi
>
> I am using native sort provided part of the package and to process 48MB of
> slice data, it takes ~45sec.
> To run 2GB of data it takes really long time, I am trying to split these
> to buckets and make it run concurrently, finally need to collate results of
> all these small sorted buckets.
>
> Do we have any sort package which can sort huge data swiftly?
>
> Regards,
> Subu. K
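And here is a rough sketch of the split-the-merge idea, building on the mergeSorted helper (and the sort/sync imports) from the sketch above. The function name mergeSplit is mine, and the pivot choice follows the average-of-the-middle-elements heuristic described earlier:

// mergeSplit merges two sorted slices with two goroutines: pick a pivot (the
// average of the two middle elements), binary-search its position in each
// slice, merge the two low halves and the two high halves concurrently, and
// concatenate the results.
func mergeSplit(a, b []int) []int {
	if len(a) == 0 || len(b) == 0 {
		return mergeSorted(a, b)
	}
	pivot := (a[len(a)/2] + b[len(b)/2]) / 2
	// sort.SearchInts returns the first index where pivot could be inserted
	// while keeping the slice sorted, so everything before it is < pivot.
	sa := sort.SearchInts(a, pivot)
	sb := sort.SearchInts(b, pivot)

	var low, high []int
	var wg sync.WaitGroup
	wg.Add(2)
	go func() { defer wg.Done(); low = mergeSorted(a[:sa], b[:sb]) }()
	go func() { defer wg.Done(); high = mergeSorted(a[sa:], b[sb:]) }()
	wg.Wait()
	return append(low, high...)
}

Since everything before the split points is smaller than the pivot and everything after is at least the pivot, concatenating the two merged halves gives a fully sorted result.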