Thanks all. I ran some metrics on the comparison function "Less": the data is JSON in this huge file, and each comparison had to unmarshal the JSON and then compare the decoded values. Unmarshalling took 98% of the time in the comparison, so I'm looking for a way to avoid the unmarshalling and do this differently.
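One common way to avoid unmarshalling inside Less is to decode the sort key once per record up front and sort on that, so Less only compares plain values. A minimal sketch, assuming each record is a JSON object with a numeric field; the record type, the "id" field, and sortByKey are made up for illustration:

package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// record pairs the raw JSON with its pre-decoded sort key, so
// unmarshalling happens once per element instead of once per
// comparison inside Less.
type record struct {
	raw []byte
	key int64 // hypothetical numeric field we sort on
}

func sortByKey(lines [][]byte) ([]record, error) {
	recs := make([]record, 0, len(lines))
	for _, raw := range lines {
		// Decode only the field we need; "id" is a placeholder
		// for the real field name.
		var v struct {
			ID int64 `json:"id"`
		}
		if err := json.Unmarshal(raw, &v); err != nil {
			return nil, err
		}
		recs = append(recs, record{raw: raw, key: v.ID})
	}
	// Less now compares plain int64s -- no JSON work at all.
	sort.Slice(recs, func(i, j int) bool { return recs[i].key < recs[j].key })
	return recs, nil
}

func main() {
	lines := [][]byte{
		[]byte(`{"id": 42, "name": "b"}`),
		[]byte(`{"id": 7, "name": "a"}`),
	}
	recs, err := sortByKey(lines)
	if err != nil {
		panic(err)
	}
	for _, r := range recs {
		fmt.Println(string(r.raw))
	}
}

With n records this does n unmarshal calls instead of O(n log n) of them inside the sort, which is where the 98% was going.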
Thanks all for your time and suggestions.

Regards,
Subramanian. K

On Thursday, 30 November 2017 19:53:01 UTC+5:30, Slawomir Pryczek wrote:
>
> It should be very simple if you have an additional 2GB of memory. You
> divide the data into X parts, where X is a power of 2 and X is no greater
> than the number of cores available. E.g. 2000MB can be 8 parts of 250MB
> each. Then you sort the parts in parallel using the built-in sorting
> function, and at the end you concatenate the arrays by scanning them and
> picking the smallest element as you move forward. You turn
> 250x8 => 500x4 => 1000x2 => 2000.
>
> It should take about 15-20 minutes to write, and the sorting would
> probably take 5-6 minutes instead of half an hour for 8 threads.
>
> In the merge phase you can parallelize further by taking the average of
> the middle elements, then using binary search to find the position of that
> value and breaking each array at that point into 2 separate arrays. E.g.
> if you're doing the last step and the element in Arr1 at position 500000
> is 91821, and at the same position in Arr2 you have 1782713, the average
> is 937267; you use binary search to find the position of 937267 (or
> whatever is closest) in both arrays, then you can break the arrays into
> Arr11 + Arr12 and Arr21 + Arr22, and merge Arr11 with Arr21 and Arr12
> with Arr22 in parallel. But that would probably take more time and is not
> necessarily worth the effort.
>
> On Wednesday, 29 November 2017 at 15:19:13 UTC+1, Subramanian K wrote:
>>
>> Hi
>>
>> I am using the native sort provided as part of the package, and to
>> process 48MB of slice data it takes ~45 sec.
>> Running 2GB of data takes a really long time, so I am trying to split
>> it into buckets, sort them concurrently, and finally collate the results
>> of all these small sorted buckets.
>>
>> Do we have any sort package which can sort huge data swiftly?
>>
>> Regards,
>> Subu. K
>>
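The split-sort-merge approach above maps naturally onto goroutines. Below is a minimal sketch of that idea, assuming plain ints rather than my actual records; parallelSort, mergeTwo, and the chunk count of 8 are my own names for illustration, not code from the thread. Each chunk is sorted with the built-in sort in its own goroutine, then the results are merged pairwise (8 => 4 => 2 => 1):

package main

import (
	"fmt"
	"math/rand"
	"sort"
	"sync"
)

// mergeTwo merges two sorted slices by repeatedly picking the
// smaller head element -- the "scan and pick smallest" step.
func mergeTwo(a, b []int) []int {
	out := make([]int, 0, len(a)+len(b))
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		if a[i] <= b[j] {
			out = append(out, a[i])
			i++
		} else {
			out = append(out, b[j])
			j++
		}
	}
	out = append(out, a[i:]...)
	return append(out, b[j:]...)
}

// parallelSort splits data into parts (a power of 2), sorts each
// part in its own goroutine, then merges pairwise until one
// slice remains.
func parallelSort(data []int, parts int) []int {
	size := (len(data) + parts - 1) / parts
	chunks := make([][]int, parts)

	var wg sync.WaitGroup
	for i := 0; i < parts; i++ {
		lo, hi := i*size, (i+1)*size
		if lo > len(data) {
			lo = len(data)
		}
		if hi > len(data) {
			hi = len(data)
		}
		chunks[i] = data[lo:hi]
		wg.Add(1)
		go func(c []int) {
			defer wg.Done()
			sort.Ints(c) // built-in sort on each chunk
		}(chunks[i])
	}
	wg.Wait()

	// Merge pairwise: with parts a power of 2 the count stays
	// even at every level. Done serially here for simplicity.
	for len(chunks) > 1 {
		next := make([][]int, 0, len(chunks)/2)
		for i := 0; i < len(chunks); i += 2 {
			next = append(next, mergeTwo(chunks[i], chunks[i+1]))
		}
		chunks = next
	}
	return chunks[0]
}

func main() {
	data := rand.Perm(1 << 20)
	sorted := parallelSort(data, 8)
	fmt.Println(sort.IntsAreSorted(sorted)) // true
}

The pairwise merges at each level are independent, so they could also each run in a goroutine; the binary-search split described above goes further still and parallelizes the inside of a single merge.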