[ 
https://issues.apache.org/jira/browse/HADOOP-3130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12583568#action_12583568
 ] 

Amar Kamat commented on HADOOP-3130:
------------------------------------

It seems that the log messages are the main cause of confusion. Based on the
logs, this is what we think happened:
1) The reducer gets the task completion events for a bunch of maps and
schedules fetches for them.
2) All the map outputs are copied successfully except one.
3) Assume that the jetty that was supposed to serve the remaining map's output
is busy.
4) After 3 minutes the fetch attempt fails, is retried, and succeeds. 3 minutes
is the timeout for a fetch attempt.
This also explains the 2-minute wait mentioned above. In the first minute,
other map outputs are fetched (i.e., the fetches overlap). In the remaining
2 minutes (before the timeout), the reducer is just waiting for the last map's
output. The '*need 1 map output*' message in the reducer's logs should also
mention how many of those outputs have a fetch in progress.
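
To make the arithmetic and the logging suggestion concrete, here is a rough
sketch in Python (not Hadoop code). The function names, the task id string
and the exact message wording are made up for illustration; only the 3-minute
fetch timeout and the 1-minute overlap come from the description above.

```python
# Timeout for a single fetch attempt, as stated in the comment (3 minutes).
FETCH_TIMEOUT_SECS = 3 * 60


def idle_wait_secs(overlap_secs, fetch_timeout_secs=FETCH_TIMEOUT_SECS):
    """Seconds the reducer spends waiting only on the stuck fetch.

    overlap_secs is the part of the timeout window during which other
    map outputs were still being copied.
    """
    return max(0, fetch_timeout_secs - overlap_secs)


def need_message(task_id, pending, in_progress):
    """Hypothetical log line: outputs still needed plus fetches in flight."""
    noun = "output" if pending == 1 else "outputs"
    return f"{task_id} Need {pending} map {noun} ({in_progress} in progress)"


# 1 minute of overlapped copying leaves 2 minutes of pure waiting.
print(idle_wait_secs(60))
print(need_message("reduce_task", 1, 1))
```

With a count like this in the message, a reader of the log could tell a
reducer that is blocked on an in-flight fetch from one that has backed off
and is not fetching at all.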

> Shuffling takes too long to get the last map output.
> ----------------------------------------------------
>
>                 Key: HADOOP-3130
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3130
>             Project: Hadoop Core
>          Issue Type: Bug
>            Reporter: Runping Qi
>         Attachments: shuffling.log
>
>
> I noticed that towards the end of shuffling, the map output fetcher of the 
> reducer backs off too aggressively.
> I attach a fraction of one reduce log of my job.
> Note that the last map output was not fetched for 2 minutes.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
