[ https://issues.apache.org/jira/browse/HADOOP-3478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12601680#action_12601680 ]
Runping Qi commented on HADOOP-3478:
------------------------------------
> In order to protect against early or too aggressive killing, we should
> probably maintain the strategy of waiting for notifications from multiple
> reducers for all maps. Since the map failure notifications are sent only
> after a certain number of retries, we should be okay in protecting the maps
> against temporary network glitches.
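The quoted strategy (re-execute a map only after several reducers have
reported failing to fetch its output) can be sketched as below. This is a
minimal Python illustration, not Hadoop's JobTracker code: the class name
FetchFailureTracker and the threshold min_reducers are hypothetical.

```python
from collections import defaultdict


class FetchFailureTracker:
    """Hypothetical bookkeeping: a map is marked for re-execution only after
    fetch-failure notifications arrive from several *distinct* reducers."""

    def __init__(self, min_reducers=3):
        # Hypothetical threshold of distinct reducers that must complain.
        self.min_reducers = min_reducers
        # map id -> set of reducer ids that reported a failed fetch
        self._reports = defaultdict(set)

    def report(self, map_id, reducer_id):
        """Record one notification; return True if the map should be re-run."""
        # A set ignores repeat notifications from the same reducer, so one
        # reducer with a flaky link cannot trigger re-execution on its own.
        self._reports[map_id].add(reducer_id)
        return len(self._reports[map_id]) >= self.min_reducers
```

With min_reducers=2, two reports for the same map from reducer "r1" return
False both times; a third report from "r2" returns True.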
We should differentiate based on the progress stage of the job.
If there are a lot of unfinished mappers, then we should not do aggressive
mapper re-executions.
If reducers have a lot of un-fetched map outputs, they can wait for a longer
period of time before re-fetching the map outputs that previously failed to
fetch. However, if one or more reducers are waiting for only one or a few map
outputs, then the reducers should retry aggressively, and if the failures
persist, the mappers should be re-executed aggressively.
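One possible reading of this progress-aware policy, sketched in Python under
stated assumptions: the function names, the linear backoff, and every default
value (base, cap, few, busy_fraction, aggressive, conservative) are
hypothetical choices for illustration, not values proposed in this issue.

```python
def retry_delay(pending_outputs, base=1.0, cap=60.0):
    """Seconds a reducer waits before re-fetching a map output that failed.

    Hypothetical policy: the delay grows with the reducer's backlog, so a
    reducer with many other outputs still to fetch can back off, while one
    blocked on only a few outputs retries almost immediately.
    """
    return min(base * max(pending_outputs, 1), cap)


def failures_before_reexecution(pending_outputs, unfinished_map_fraction,
                                few=2, busy_fraction=0.5,
                                aggressive=2, conservative=5):
    """Number of fetch failures tolerated before requesting map re-execution.

    Hypothetical thresholds: stay conservative while many maps are still
    unfinished; become aggressive only when this reducer is waiting on
    `few` or fewer map outputs.
    """
    if unfinished_map_fraction > busy_fraction:
        # Early in the job: avoid aggressive re-execution of mappers.
        return conservative
    if pending_outputs <= few:
        # Reducer is stalled on a handful of outputs: act quickly.
        return aggressive
    return conservative
```

For example, a reducer missing a single output late in the job
(unfinished_map_fraction=0.1) retries after 1 second and asks for
re-execution after 2 failures, while a reducer with 100 outputs pending waits
the capped 60 seconds and tolerates 5 failures.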
> The algorithm to decide map re-execution on fetch failures can be improved
> --------------------------------------------------------------------------
>
> Key: HADOOP-3478
> URL: https://issues.apache.org/jira/browse/HADOOP-3478
> Project: Hadoop Core
> Issue Type: Improvement
> Components: mapred
> Reporter: Jothi Padmanabhan
>
> The algorithm to decide map re-execution on fetch failures can be improved.