[ 
https://issues.apache.org/jira/browse/HADOOP-5572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Gummadi updated HADOOP-5572:
---------------------------------

    Attachment: HADOOP-5572.v1.2.patch

Made code changes as per Jothi's first four comments.

# Check if progress is being updated correctly and works fine with new Reducer 
API

Because the new Reducer API does not update progress while records are being 
fed to the reducer, reduce task progress stays at 66.66% and then jumps to 
100% when the task is done. Maybe we need to file a separate JIRA so that the 
new API updates progress the way the old API does.
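
To illustrate where the 66.66% figure comes from, here is a minimal sketch. 
It assumes reduce task progress is the average of three equally weighted 
phases (copy, sort, reduce), which is consistent with the number above. The 
class PhasedProgress and its methods are hypothetical names for illustration 
only and are not Hadoop's own progress-tracking code.

```java
// Hypothetical illustration: overall progress as the mean of per-phase progress.
class PhasedProgress {
    // Completion fraction (0.0 to 1.0) of each phase.
    private final float[] done;

    PhasedProgress(int phaseCount) {
        done = new float[phaseCount];
    }

    // Records the completion fraction of one phase, clamped to [0, 1].
    void set(int phase, float fraction) {
        done[phase] = Math.max(0.0f, Math.min(1.0f, fraction));
    }

    // Overall progress: every phase carries the same weight.
    float get() {
        float sum = 0.0f;
        for (float f : done) {
            sum += f;
        }
        return sum / done.length;
    }

    public static void main(String[] args) {
        PhasedProgress reduceTask = new PhasedProgress(3); // copy, sort, reduce (assumed)
        reduceTask.set(0, 1.0f); // copy finished
        reduceTask.set(1, 1.0f); // sort finished
        // The third phase never reports, so progress stays at two thirds.
        System.out.println(reduceTask.get());
        // If the third phase reported while records were fed, progress would move.
        reduceTask.set(2, 0.5f);
        System.out.println(reduceTask.get());
    }
}
```

With the third phase never reporting, the value sits at two thirds until the 
task completes, which matches the behaviour described above.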

# Merger: Remove Collections.sort() in the beginning

OK. Removed sort() at the beginning of merge() and changed the callers to pass 
sorted segments to merge() if there are more than ioSortFactor segments.
Changed mergeParts() to call merge() with sorted segments if there are more 
than ioSortFactor segments. Earlier, mergeParts() sent unsorted segments to 
merge(), and the segments were sorted only after the first intermediate merge, 
so the first merge was not merging the smallest segments.
Removed the sort() call after each intermediate merge; the merged segment is 
now inserted into the sorted segments list instead. This could improve 
performance, as calling sort() with complexity O(n log n) after each 
intermediate merge is costly.
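
The idea of inserting into the sorted list rather than re-sorting can be 
sketched as below. This is not the code in the patch: the Seg class and the 
insertSorted() helper are hypothetical stand-ins, and the sketch assumes 
segments are ordered by length. The insertion position is found with a 
binary search, so only the list shift remains linear.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SortedSegmentInsert {
    // Hypothetical stand-in for a merge segment; only its length matters here.
    static final class Seg implements Comparable<Seg> {
        final long length;

        Seg(long length) {
            this.length = length;
        }

        @Override
        public int compareTo(Seg other) {
            return Long.compare(length, other.length);
        }
    }

    // Inserts seg into an already sorted list and keeps the list sorted.
    static void insertSorted(List<Seg> sorted, Seg seg) {
        int pos = Collections.binarySearch(sorted, seg);
        if (pos < 0) {
            // Not found: binarySearch returns -(insertionPoint) - 1.
            pos = -pos - 1;
        }
        sorted.add(pos, seg);
    }

    // Builds a sorted list, inserts two more segments, and returns the lengths.
    static List<Long> demo() {
        List<Seg> segs = new ArrayList<>();
        for (long len : new long[] {10, 40, 70}) {
            segs.add(new Seg(len));
        }
        insertSorted(segs, new Seg(50)); // e.g. the result of an intermediate merge
        insertSorted(segs, new Seg(5));
        List<Long> lengths = new ArrayList<>();
        for (Seg s : segs) {
            lengths.add(s.length);
        }
        return lengths;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [5, 10, 40, 50, 70]
    }
}
```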

# Can we do better than relying on writesCounter to determine if the final 
merge needs to be included in the calculation or not?

I couldn't see a cleaner/better way of doing this.

Attaching patch with the above changes. Please review and provide your comments.

> The map progress value should have a separate phase for doing the final sort.
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-5572
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5572
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Owen O'Malley
>            Assignee: Ravi Gummadi
>         Attachments: HADOOP-5572.patch, HADOOP-5572.v1.1.patch, 
> HADOOP-5572.v1.2.patch, HADOOP-5572.v1.patch
>
>
> Currently, the final spill and sort doesn't record any progress while it 
> runs, leading to the perception that the map is done, but "stuck".

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
