[ 
https://issues.apache.org/jira/browse/HADOOP-5572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699265#action_12699265
 ] 

Ravi Gummadi commented on HADOOP-5572:
--------------------------------------

For example, if there are 6 segments (of lengths 10, 20, 30, 40, 50, 200) as 
input to merge() in the map task, then totalBytesProcessed is incremented 
whenever the position in any segment is updated.

totalBytes = (10+20)          // 1st merge
           + (30+30+40)       // 2nd merge; the first 30 is the expected output of the 1st merge
           + (100+50+200)     // 3rd merge; 100 is the expected output of the 2nd merge
           = 480   // denominator in computation of mergeProgress during 1st merge
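
The 480 estimate can be reproduced with a short sketch. It assumes a merge factor of 3 and that the first pass is shrunk to 2 segments so that every later pass is full; neither is stated above, but both are consistent with the three merges shown. The name estimate_total_bytes and its parameters are hypothetical and are not Hadoop's API. The sketch also assumes each merged output is as long as the sum of its inputs (no combiner).

```python
def estimate_total_bytes(lengths, factor=3):
    """Plan the merge passes and sum the input bytes of every pass.

    Assumes merged output length == sum of input lengths (no combiner)
    and factor >= 2. Both the pass-planning rule and the names here are
    illustrative assumptions, not Hadoop's actual implementation.
    """
    segments = sorted(lengths)
    passes = []
    first = True
    while len(segments) > factor:
        if first:
            # Assumed rule: shrink the first pass so later passes are full.
            mod = (len(segments) - 1) % (factor - 1)
            take = factor if mod == 0 else mod + 1
            first = False
        else:
            take = factor
        merged = segments[:take]
        passes.append(merged)
        # The merged segment re-enters the sorted list at its expected size.
        segments = sorted(segments[take:] + [sum(merged)])
    passes.append(segments)  # final merge over the remaining segments
    return passes, sum(sum(p) for p in passes)


passes, total = estimate_total_bytes([10, 20, 30, 40, 50, 200], factor=3)
```

Here passes lists the inputs of each merge, and total is the initial denominator.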

After the 1st merge, mergeProgress = totalBytesProcessed/totalBytes = (10+20)/480.
Let us say the length of the merged segment (of the 1st merge, with input sizes 
10 and 20) is 25 because of the combiner.
totalBytes = 480 - (30-25) = 475; // denominator in computation of mergeProgress during 2nd merge

After the 2nd merge, mergeProgress = (30+(25+30+40))/475.
Let us say the length of the merged segment (of the 2nd merge, with input sizes 
25, 30 and 40) is 85 because of the combiner.
totalBytes = 475 - (25+30+40 - 85) = 465; // denominator in computation of mergeProgress during 3rd merge
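
Both adjustments above follow one rule: after an intermediate merge, totalBytes shrinks by the difference between that merge's input bytes and the actual length of its output. A minimal sketch of that rule for the first two merges follows; adjust_total_bytes and history are hypothetical names chosen for illustration, not identifiers from the patch.

```python
def adjust_total_bytes(total_bytes, input_lengths, merged_length):
    """Shrink the progress denominator by the bytes the combiner removed."""
    return total_bytes - (sum(input_lengths) - merged_length)


total_bytes = 480   # initial estimate from the example
processed = 0       # stands in for totalBytesProcessed
history = []        # (processed, denominator) at the end of each merge

# (input segment lengths, actual merged length) for the 1st and 2nd merges
for inputs, merged_length in [([10, 20], 25), ([25, 30, 40], 85)]:
    processed += sum(inputs)
    history.append((processed, total_bytes))
    total_bytes = adjust_total_bytes(total_bytes, inputs, merged_length)
```

Each entry of history is the numerator and denominator of mergeProgress at the end of a merge, and total_bytes ends as the denominator used during the 3rd merge.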

After 3rd merge, mergeProgress = (30+85+(100+50+200))/465=1.0;

> The map progress value should have a separate phase for doing the final sort.
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-5572
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5572
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Owen O'Malley
>            Assignee: Ravi Gummadi
>
> Currently, the final spill and sort doesn't record any progress while it 
> runs, leading to the perception that the map is done, but "stuck".

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
