[ 
https://issues.apache.org/jira/browse/HADOOP-3478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12601680#action_12601680
 ] 

Runping Qi commented on HADOOP-3478:
------------------------------------


> In order to protect against early or too aggressive killing, we should 
> probably maintain the strategy of waiting for notifications from multiple 
> reducers for all maps. Since the map failure notifications are sent only after 
> a certain number of retries, we should be okay in protecting the maps against 
> temporary network glitches

We should differentiate between the progress stages of the job. 

If there are a lot of unfinished mappers, then we should not do aggressive 
mapper re-executions.

If reducers have a lot of un-fetched map outputs, they can wait for a longer 
period of time before re-fetching the map outputs that they failed to fetch 
previously. However, if one or more reducers are waiting for only one or a few 
map outputs, then the reducers should retry aggressively, and if the failures 
persist, the mappers should be re-executed aggressively.
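
To make the idea concrete, here is a rough sketch of such a stage-aware policy. 
It is only an illustration: the class name, the method names, and all of the 
thresholds are hypothetical and do not come from the Hadoop code base.

```java
// Hypothetical sketch, not Hadoop code. All names and thresholds are assumptions.
final class FetchFailurePolicy {
    // Assumed: the job counts as "late stage" when at most 10% of maps are unfinished.
    static final double MAX_UNFINISHED_MAP_FRACTION = 0.10;
    // Assumed: a reducer waiting on this many outputs or fewer is blocked on "a few".
    static final int FEW_PENDING_OUTPUTS = 3;
    // Assumed retry delays, in milliseconds.
    static final long AGGRESSIVE_RETRY_DELAY_MS = 1_000L;
    static final long RELAXED_RETRY_DELAY_MS = 30_000L;

    private FetchFailurePolicy() {}

    // True when the reducer is waiting on only one or a few map outputs.
    static boolean waitingOnFewOutputs(int pendingOutputs) {
        return pendingOutputs > 0 && pendingOutputs <= FEW_PENDING_OUTPUTS;
    }

    // A reducer with many un-fetched outputs can wait longer before re-fetching
    // a failed one; a reducer blocked on a few outputs retries quickly.
    static long retryDelayMs(int pendingOutputs) {
        return waitingOnFewOutputs(pendingOutputs)
                ? AGGRESSIVE_RETRY_DELAY_MS
                : RELAXED_RETRY_DELAY_MS;
    }

    // Re-execute maps aggressively only when few maps are unfinished and at
    // least one reducer is blocked on a few map outputs.
    static boolean shouldReexecuteAggressively(int totalMaps, int finishedMaps,
                                               int blockedReducers) {
        if (totalMaps <= 0) {
            return false;
        }
        double unfinishedFraction = (double) (totalMaps - finishedMaps) / totalMaps;
        return unfinishedFraction <= MAX_UNFINISHED_MAP_FRACTION && blockedReducers > 0;
    }
}
```

The two checks are kept separate on purpose: the retry delay is a per-reducer 
decision, while map re-execution depends on the state of the whole job.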



> The algorithm to decide map re-execution on fetch failures can be improved
> --------------------------------------------------------------------------
>
>                 Key: HADOOP-3478
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3478
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Jothi Padmanabhan
>
> The algorithm to decide map re-execution on fetch failures can be improved.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
