[ https://issues.apache.org/jira/browse/HADOOP-3327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662240#action_12662240 ]

Amareshwari Sriramadasu commented on HADOOP-3327:
-------------------------------------------------

After discussion with Jothi and Devaraj, I propose the following approach:

1. If pendingCopies < 0.25 * numMaps,  // towards the end of the shuffle
    fetchRetries = maxFetchRetriesPerMap/2;
    // This sends the first notification to the JT in half the time of the existing algorithm.
    // The exponential back-off also happens half as many times.

2. If the failure is caused by a ReadTimeOut,
     send a notification to the JT immediately;
     towards the end of the shuffle, back off for min(maxMapRunTime/2, current backoff);
     else back off for maxMapRunTime/2.


3. At the JT,
    if freeMapSlots < 0.5 * totalMapSlots, re-execute the map after 3 notifications (current algorithm);
    else re-execute the map after 2 notifications.
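To make the three rules concrete, here is a minimal Java sketch. It is not Hadoop's ReduceTask or JobTracker code: the class and method names are hypothetical, the variable names simply mirror the ones used in the proposal, and maxMapRunTime and the back-off values are assumed to be in milliseconds. "Towards the end of the shuffle" is taken to mean pendingCopies < 0.25 * numMaps, as in rule 1.

```java
/**
 * Hypothetical sketch of the three proposed rules. Names follow the proposal
 * (pendingCopies, numMaps, maxFetchRetriesPerMap, maxMapRunTime, freeMapSlots,
 * totalMapSlots); this is an illustration, not Hadoop's actual implementation.
 */
public class ShuffleRetryPolicySketch {

    /** "Towards the end of the shuffle": fewer than 25% of map outputs still to copy. */
    static boolean nearEndOfShuffle(int pendingCopies, int numMaps) {
        return pendingCopies < 0.25 * numMaps;
    }

    /** Rule 1: halve the per-map fetch retries near the end of the shuffle. */
    static int fetchRetries(int pendingCopies, int numMaps, int maxFetchRetriesPerMap) {
        return nearEndOfShuffle(pendingCopies, numMaps)
                ? maxFetchRetriesPerMap / 2
                : maxFetchRetriesPerMap;
    }

    /**
     * Rule 2: back-off (assumed ms) after a read timeout. The JT notification
     * itself is sent immediately; only the wait before the next retry is computed here.
     */
    static long backoffAfterReadTimeout(int pendingCopies, int numMaps,
                                        long maxMapRunTime, long currentBackoff) {
        long half = maxMapRunTime / 2;
        return nearEndOfShuffle(pendingCopies, numMaps)
                ? Math.min(half, currentBackoff)
                : half;
    }

    /** Rule 3 (at the JT): fetch-failure notifications needed before re-executing a map. */
    static int notificationsBeforeReexecution(int freeMapSlots, int totalMapSlots) {
        return freeMapSlots < 0.5 * totalMapSlots ? 3 : 2;
    }

    public static void main(String[] args) {
        // 20 of 100 map outputs pending: near the end, so retries are halved (6 -> 3).
        System.out.println(fetchRetries(20, 100, 6));                       // 3
        System.out.println(fetchRetries(50, 100, 6));                       // 6
        // Read timeout with a hypothetical maxMapRunTime of 60000 ms.
        System.out.println(backoffAfterReadTimeout(20, 100, 60000, 8000));  // 8000
        System.out.println(backoffAfterReadTimeout(50, 100, 60000, 8000));  // 30000
        // Busy cluster (10 of 40 slots free) keeps the current threshold of 3.
        System.out.println(notificationsBeforeReexecution(10, 40));         // 3
        System.out.println(notificationsBeforeReexecution(30, 40));         // 2
    }
}
```

The example values (6 retries, 60000 ms, 40 slots) are illustrative only. Keeping the "end of shuffle" test in one helper ensures rules 1 and 2 agree on when the shuffle is nearly done.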


Thoughts?

> Shuffling fetchers waited too long between map output fetch re-tries
> --------------------------------------------------------------------
>
>                 Key: HADOOP-3327
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3327
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Runping Qi
>            Assignee: Amareshwari Sriramadasu
>         Attachments: hadoop-3327-v1.patch, hadoop-3327-v2.patch, 
> hadoop-3327-v3.patch, hadoop-3327.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
