[ http://issues.apache.org/jira/browse/HADOOP-287?page=comments#action_12423847 ]

Benjamin Reed commented on HADOOP-287:
--------------------------------------
I have improved it a bit more. It is now guaranteed to take only O(log N) stack space, and I eked out a bit more performance.

Unfortunately, the 30% improvement is for the in-memory sort. For your slow disks you need my other patch, which reduces the number of times the data hits the disk. Unfortunately, that patch doesn't apply anymore. I have a newer version that removes one more full disk hit, so that should work even better. I'll try to create a patch for it today. (Unrelated changes break my patches and take a long time to reconcile...)

> Speed up SequenceFile sort with memory reduction
> ------------------------------------------------
>
> Key: HADOOP-287
> URL: http://issues.apache.org/jira/browse/HADOOP-287
> Project: Hadoop
> Issue Type: Improvement
> Components: io
> Affects Versions: 0.3.2
> Reporter: Benjamin Reed
> Assigned To: Doug Cutting
> Attachments: s.patch, zoom-sort.patch, zoom-sort.patch
>
>
> I replaced the merge sort with a quick sort and it yielded approx 30%
> improvement in sort time. It also reduced the memory requirement for sorting
> because the sort is done in place.
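
For readers wondering how an in-place quick sort can be held to O(log N) stack space: one common way is to recurse only into the smaller partition and loop over the larger one. The sketch below illustrates that general technique on a plain int array. It is not the code from the attached patches; the class name BoundedStackQuickSort, the middle-element pivot, and the Lomuto-style partition are all assumptions made for illustration.

```java
// Hypothetical illustration only; not taken from s.patch or zoom-sort.patch.
final class BoundedStackQuickSort {

    /** Sorts the array in place, in ascending order. */
    static void sort(int[] a) {
        sort(a, 0, a.length - 1);
    }

    private static void sort(int[] a, int lo, int hi) {
        // Each pass partitions [lo, hi], recurses into the smaller side,
        // and continues the loop on the larger side. The recursive call
        // always covers fewer than half of the current elements, so the
        // recursion depth is at most about log2(N).
        while (lo < hi) {
            int p = partition(a, lo, hi);
            if (p - lo < hi - p) {
                sort(a, lo, p - 1);   // left side is smaller
                lo = p + 1;           // keep looping on the right side
            } else {
                sort(a, p + 1, hi);   // right side is smaller (or equal)
                hi = p - 1;           // keep looping on the left side
            }
        }
    }

    // Lomuto-style partition using the middle element as the pivot
    // (an assumed choice). Returns the pivot's final index.
    private static int partition(int[] a, int lo, int hi) {
        int mid = lo + (hi - lo) / 2;
        swap(a, mid, hi);             // park the pivot at the end
        int pivot = a[hi];
        int store = lo;
        for (int i = lo; i < hi; i++) {
            if (a[i] < pivot) {
                swap(a, i, store);
                store++;
            }
        }
        swap(a, store, hi);           // move the pivot into place
        return store;
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i];
        a[i] = a[j];
        a[j] = t;
    }
}
```

Calling BoundedStackQuickSort.sort(values) sorts values without allocating a second buffer, which is the memory saving the issue description attributes to sorting in place. The stack bound holds regardless of pivot quality; a bad pivot still costs time, but not stack depth.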
