[ 
https://issues.apache.org/jira/browse/HADOOP-3478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12601670#action_12601670
 ] 

Devaraj Das commented on HADOOP-3478:
-------------------------------------

Yes, we should *randomize by the hosts*. But for a given host, we should sort
the outputs by mapID so that faults are detected early enough (see the comments
above). The knownOutputs structure is a list today. It could be replaced with
a map from locations to MapIDs (whenever we get a map completion event, we
know the location anyway).
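
A minimal sketch of the location-to-MapIDs idea, under stated assumptions: the
class name KnownOutputsByHost, its method names, and the use of plain integers
for map IDs and strings for locations are all hypothetical and not taken from
the Hadoop source. It only illustrates randomizing across hosts while keeping
the map IDs of each host sorted.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeSet;

// Hypothetical sketch; none of these names come from the Hadoop source.
class KnownOutputsByHost {
    // Host location -> IDs of the maps whose output lives there, kept sorted.
    private final Map<String, TreeSet<Integer>> outputsByHost = new HashMap<>();

    // Called when a map completion event arrives; the event carries the location.
    void addMapOutput(String host, int mapId) {
        outputsByHost.computeIfAbsent(host, h -> new TreeSet<>()).add(mapId);
    }

    // Hosts in random order, so that reducers do not all start with the same host.
    List<String> shuffledHosts(Random random) {
        List<String> hosts = new ArrayList<>(outputsByHost.keySet());
        Collections.shuffle(hosts, random);
        return hosts;
    }

    // Map IDs for one host in ascending order; empty if the host is unknown.
    List<Integer> mapIdsFor(String host) {
        TreeSet<Integer> ids = outputsByHost.get(host);
        return ids == null ? new ArrayList<>() : new ArrayList<>(ids);
    }
}
```

A reducer would walk shuffledHosts(...) and, for each host, fetch the outputs
listed by mapIdsFor(host) in order.
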
To protect against premature or overly aggressive killing, we should probably
keep the strategy of waiting for notifications from multiple reducers for
every map. Since map failure notifications are sent only after a certain
number of retries, this should be enough to protect the maps against temporary
network glitches.
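
The "wait for multiple reducers" strategy could be sketched as follows. This
is an illustration only: FetchFailureTracker, reportFailure, and the
minReporters threshold are hypothetical names, the integer IDs are a
simplification, and the actual threshold used by Hadoop is not stated here.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; names and the threshold are illustrative, not Hadoop's.
class FetchFailureTracker {
    // Number of distinct reducers that must report a map before acting on it.
    private final int minReporters;
    // Map ID -> IDs of the reducers that reported a fetch failure for it.
    private final Map<Integer, Set<Integer>> reportersByMap = new HashMap<>();

    FetchFailureTracker(int minReporters) {
        this.minReporters = minReporters;
    }

    // Records a failure notification, which a reducer sends only after its own
    // retries are exhausted. Returns true once enough distinct reducers have
    // reported the same map; repeated reports from one reducer count once.
    boolean reportFailure(int mapId, int reducerId) {
        Set<Integer> reporters =
            reportersByMap.computeIfAbsent(mapId, m -> new HashSet<>());
        reporters.add(reducerId);
        return reporters.size() >= minReporters;
    }
}
```

Counting distinct reducers rather than raw notifications means a single
reducer with a bad network link cannot by itself get a map killed.
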

> The algorithm to decide map re-execution on fetch failures can be improved
> --------------------------------------------------------------------------
>
>                 Key: HADOOP-3478
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3478
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Jothi Padmanabhan
>
> The algorithm to decide map re-execution on fetch failures can be improved.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
