[ https://issues.apache.org/jira/browse/HADOOP-5572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ravi Gummadi updated HADOOP-5572:
---------------------------------

    Attachment: HADOOP-5572.v1.2.patch

Made code changes as per Jothi's first four comments.

# Check if progress is being updated correctly and works fine with new Reducer API

With the new Reducer API, progress is not updated while records are being fed to the reducer, so reduce task progress stays at 66.66% and jumps to 100% when the task is done. Maybe we need to file a separate JIRA so that the new API updates progress the way the old API does.

# Merger: Remove Collections.sort() in the beginning

OK. Removed sort() at the beginning of merge() and changed the callers to pass sorted segments to merge() if there are more than ioSortFactor segments. In particular, mergeParts() now calls merge() with sorted segments in that case. Earlier, mergeParts() was sending unsorted segments to merge(), and the segments were sorted only after the first intermediate merge, so the first merge was not merging the smallest segments. Also removed the sort() call after each intermediate merge; the merged segment is now inserted into the sorted segments list instead. This could improve performance, as calling sort() with complexity O(n log n) after each intermediate merge is costly.

# Can we do better than relying on writesCounter to determine if the final merge needs to be included in the calculation or not?

I couldn't see a cleaner or better way of doing this.

Attaching patch with the above changes. Please review and provide your comments.

> The map progress value should have a separate phase for doing the final sort.
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-5572
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5572
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Owen O'Malley
>            Assignee: Ravi Gummadi
>         Attachments: HADOOP-5572.patch, HADOOP-5572.v1.1.patch, HADOOP-5572.v1.2.patch, HADOOP-5572.v1.patch
>
>
> Currently, the final spill and sort doesn't record any progress while it runs, leading to the perception that the map is done, but "stuck".
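
For reviewers, the "insertion into sorted segments list" idea from the Merger comment can be sketched roughly as follows. This is not code from the patch: Seg, SortedSegmentInsert, insertSorted() and intermediateMerge() are hypothetical names used only for illustration, and each pass is simplified to take a fixed number of the smallest segments. It assumes the list is already sorted by length when merge() receives it, as the callers are now expected to guarantee.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a merge segment; only its length matters here.
final class Seg {
    final String name;
    final long length;

    Seg(String name, long length) {
        this.name = name;
        this.length = length;
    }
}

final class SortedSegmentInsert {
    // Segments are ordered by ascending length so the smallest are merged first.
    static final Comparator<Seg> BY_LENGTH = Comparator.comparingLong(s -> s.length);

    // Insert one segment into a list that is already sorted by length.
    // Binary search finds the slot in O(log n); the list shift is O(n),
    // which avoids a full O(n log n) re-sort after every intermediate merge.
    static void insertSorted(List<Seg> sorted, Seg merged) {
        int pos = Collections.binarySearch(sorted, merged, BY_LENGTH);
        if (pos < 0) {
            pos = -(pos + 1); // convert "not found" result to insertion point
        }
        sorted.add(pos, merged);
    }

    // One simplified intermediate merge pass: remove the `factor` smallest
    // segments, combine their lengths, and put the result back in order.
    static void intermediateMerge(List<Seg> sorted, int factor) {
        int n = Math.min(factor, sorted.size());
        long total = 0;
        for (int i = 0; i < n; i++) {
            total += sorted.get(i).length;
        }
        sorted.subList(0, n).clear();
        insertSorted(sorted, new Seg("merged", total));
    }
}
```

With segments of length 10, 20, 50 and 100 and a factor of 2, one pass in this sketch merges the two smallest into a segment of length 30 and leaves the list ordered as 30, 50, 100 without calling sort() again.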