[ https://issues.apache.org/jira/browse/HADOOP-5985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12716812#action_12716812 ]

Aaron Kimball commented on HADOOP-5985:
---------------------------------------

Sure.

To clarify a bit first: when a reducer puts a mapper in "the penalty box," it is 
noting that the mapper has been slow, so fetches from other mappers should be 
prioritized instead. But if there are no other mappers left, this just makes the 
reducer spin in circles. If the only mappers remaining are all in the penalty 
box, then something is probably wrong.
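
To make the bookkeeping concrete, here is a minimal sketch of what a 
reducer-side penalty box amounts to. The class and method names are 
hypothetical, not the actual shuffle code -- it just illustrates the 
"defer this host until a backoff expires" behavior described above:

    import java.util.HashMap;
    import java.util.Map;

    /** Tracks hosts whose map outputs have been slow to fetch, deferring them. */
    class PenaltyBox {
      private final Map<String, Long> penalizedUntil = new HashMap<String, Long>();

      /** Put a slow host in the box for backoffMs milliseconds. */
      void penalize(String host, long backoffMs) {
        penalizedUntil.put(host, System.currentTimeMillis() + backoffMs);
      }

      /** A host may be fetched from again only once its penalty has expired. */
      boolean canFetchFrom(String host) {
        Long until = penalizedUntil.get(host);
        return until == null || System.currentTimeMillis() >= until;
      }
    }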

Call the home stretch percentage 4%. Say there are 1,000 map tasks and 20 reduce 
tasks, and the kill quorum is 10%.

A reducer retrieves 960 map task outputs -- that's 96%, so we're in the last 4%. 
There are 40 map outputs left to fetch. The 20 map tasks from one node are all 
in the penalty box; the other 20 outputs get retrieved. The 20 stalled outputs 
stay in the penalty box until the timeout triggers, meaning those 20 fetches 
have spun in the penalty box for far too long. The reducer votes to tell the 
JobTracker that the mapper node is frozen.
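
Here is a rough sketch of that reducer-side decision. Everything in it (the 
class name, the parameters, the idea of tracking how long the box has been 
"full") is a hypothetical illustration of the proposal, not an existing Hadoop 
API:

    /** Decides when a reducer should vote that a mapper node is frozen. */
    class HomeStretchMonitor {
      private final int totalMaps;              // e.g. 1000
      private final double homeStretchFraction; // e.g. 0.04
      private final long stallTimeoutMs;        // how long the box may stay full

      private long allPenalizedSince = -1;

      HomeStretchMonitor(int totalMaps, double homeStretchFraction,
                         long stallTimeoutMs) {
        this.totalMaps = totalMaps;
        this.homeStretchFraction = homeStretchFraction;
        this.stallTimeoutMs = stallTimeoutMs;
      }

      /** True once we're in the home stretch and every remaining output has
       *  been stuck in the penalty box for longer than the stall timeout. */
      boolean shouldVote(int copiedMaps, boolean allRemainingPenalized) {
        boolean inHomeStretch =
            copiedMaps >= totalMaps * (1.0 - homeStretchFraction); // 960 of 1000
        if (!inHomeStretch || !allRemainingPenalized) {
          allPenalizedSince = -1;   // not stalled; reset the clock
          return false;
        }
        long now = System.currentTimeMillis();
        if (allPenalizedSince < 0) {
          allPenalizedSince = now;  // the penalty box just became the only option
        }
        return now - allPenalizedSince >= stallTimeoutMs;
      }
    }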

Another reducer experiences the same problem. It votes the same way.

The JobTracker has now heard from 2/20 = 10% of the active shufflers that a 
particular mapper node has not been delivering its outputs. That meets the kill 
quorum, so the JobTracker blacklists that TaskTracker and marks all of its map 
tasks as failed. All 20 of those map tasks are rescheduled on other nodes and 
finish; the reducers grab their outputs, and the reduce process continues.
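
The JobTracker-side tally could look something like the sketch below. Again, the 
names are hypothetical; the real change would hook into the JobTracker's 
existing blacklist/failure machinery rather than live in a standalone class:

    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;

    /** Counts distinct reducer votes against each mapper host. */
    class ShuffleVoteTally {
      private final double killQuorum; // e.g. 0.10
      private final Map<String, Set<String>> votersByHost =
          new HashMap<String, Set<String>>();

      ShuffleVoteTally(double killQuorum) {
        this.killQuorum = killQuorum;
      }

      /** Record one reducer's vote; true means the host should be blacklisted. */
      boolean recordVote(String mapperHost, String reducerId, int activeReducers) {
        Set<String> voters = votersByHost.get(mapperHost);
        if (voters == null) {
          voters = new HashSet<String>();
          votersByHost.put(mapperHost, voters);
        }
        voters.add(reducerId);
        // With 20 active reducers and a 10% quorum, two distinct votes suffice.
        return (double) voters.size() / activeReducers >= killQuorum;
      }
    }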


> A single slow (but not dead) map TaskTracker impedes MapReduce progress
> -----------------------------------------------------------------------
>
>                 Key: HADOOP-5985
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5985
>             Project: Hadoop Core
>          Issue Type: Bug
>    Affects Versions: 0.18.3
>            Reporter: Aaron Kimball
>
> We see cases where there may be a large number of mapper nodes running many 
> tasks (e.g., a thousand). The reducers will pull 980 of the map task 
> intermediate files down, but will be unable to retrieve the final 
> intermediate shards from the last node. The TaskTracker on that node returns 
> data to reducers either slowly or not at all, but its heartbeat messages make 
> it back to the JobTracker -- so the JobTracker doesn't mark the tasks as 
> failed. Manually stopping the offending TaskTracker migrates the tasks to 
> other nodes, where the shuffling process finishes very quickly; left on its 
> own, the job can take hours to unjam itself.
> We need a mechanism for reducers to provide feedback to the JobTracker that 
> one of the mapper nodes should be regarded as lost.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
