[ https://issues.apache.org/jira/browse/HADOOP-1984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12541828 ]
Runping Qi commented on HADOOP-1984:
------------------------------------
It would be really helpful to know the overall job progress and the
health of the node hosting the map output.
If there are still a lot of pending mappers (and thus each reducer gets
many map outputs copied in each pass of the shuffle), it is OK to wait
a long time for some map outputs.
However, if the map outputs in question are the only ones that the reducer
is still trying to get, then it makes more sense to re-execute the mappers
than to wait for 10 minutes.
Also, if we know that the node holding the map outputs in question has other
tasks and those tasks are making healthy progress, it should be OK to wait
a little longer.
However, if the node does not have any tasks running on it, or the tasks on
the node are not making progress, it is better to start re-execution early
rather than wait for a long period.
In particular, in the case I ran into, the node in question was KNOWN to be
in bad shape and was BLACKLISTED, so it did not have any tasks running on it.
In this case, it is clearly much better to re-execute the mappers early than
to wait for the map outputs from that node.
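The heuristic above could be sketched roughly as follows. This is only an
illustration of the idea, not code from the attached patch: the class name,
method names, and the short and medium thresholds are all hypothetical, and
only the 10-minute figure comes from the wait mentioned above.

```java
/** Hypothetical sketch; none of these names exist in the Hadoop code base. */
final class MapOutputFetchPolicy {

    /** What to do about a map output that a reducer has failed to copy so far. */
    enum Decision { KEEP_WAITING, REEXECUTE_MAP }

    // The long wait is the 10 minutes mentioned in the comment.
    // The other two values are assumptions chosen only for illustration.
    static final long LONG_WAIT_MS = 10 * 60 * 1000L;
    static final long MEDIUM_WAIT_MS = 3 * 60 * 1000L;
    static final long SHORT_WAIT_MS = 60 * 1000L;

    private MapOutputFetchPolicy() { }

    /** A node looks healthy if it is not blacklisted and has tasks that make progress. */
    static boolean nodeLooksHealthy(boolean blacklisted, int runningTasks,
                                    boolean tasksProgressing) {
        return !blacklisted && runningTasks > 0 && tasksProgressing;
    }

    /** How long a reducer should tolerate a stalled copy before giving up on it. */
    static long allowedWaitMs(int otherOutputsStillToCopy, boolean nodeHealthy) {
        if (!nodeHealthy) {
            // Blacklisted, idle, or stuck node: re-execute early.
            return SHORT_WAIT_MS;
        }
        if (otherOutputsStillToCopy > 0) {
            // The reducer still has plenty of other copying to do, so waiting is cheap.
            return LONG_WAIT_MS;
        }
        // Healthy node, but this output is all the reducer is waiting for.
        return MEDIUM_WAIT_MS;
    }

    /** Decide between waiting longer and re-executing the mapper. */
    static Decision decide(long stalledMs, int otherOutputsStillToCopy,
                           boolean nodeHealthy) {
        long limit = allowedWaitMs(otherOutputsStillToCopy, nodeHealthy);
        return stalledMs >= limit ? Decision.REEXECUTE_MAP : Decision.KEEP_WAITING;
    }
}
```

For the blacklisted node described above, nodeLooksHealthy(true, 0, false)
is false, so under these assumed thresholds the mapper would be re-executed
after one minute of stalled copying instead of ten.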
> some reducer stuck at copy phase and progress extremely slowly
> --------------------------------------------------------------
>
> Key: HADOOP-1984
> URL: https://issues.apache.org/jira/browse/HADOOP-1984
> Project: Hadoop
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.16.0
> Reporter: Runping Qi
> Assignee: Amar Kamat
> Priority: Critical
> Fix For: 0.16.0
>
> Attachments: HADOOP-1984.patch
>
>
> In many cases, some reducers get stuck in the copy phase, progressing
> extremely slowly.
> The entire cluster seems to be doing nothing. This causes a very long tail
> in otherwise well-tuned map/reduce jobs.