[
https://issues.apache.org/jira/browse/HADOOP-3327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662240#action_12662240
]
Amareshwari Sriramadasu commented on HADOOP-3327:
-------------------------------------------------
After discussion with Jothi and Devaraj, I propose the following approach:
1. If pendingCopies < 0.25 * numMaps,  // towards the end of shuffle
       fetchRetries = maxFetchRetriesPerMap / 2;
   // This sends the first notification to the JT in half the time of the
   // existing algorithm. Exponential back-off also happens half as many times.
2. If the failure is because of ReadTimeOut,
       send a notification to the JT immediately;
       towards the end of shuffle, back off for min(maxMapRunTime/2, current backoff);
       else back off for maxMapRunTime/2.
3. At the JT,
       if freeMapSlots < 0.5 * totalMapSlots, re-execute the map after 3 notifications (current algorithm);
       else re-execute the map after 2 notifications.
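To make the three rules concrete, here is a rough Java sketch. It is illustration only and is not taken from any of the attached patches: the class and method names (ShuffleRetrySketch, isNearEndOfShuffle, fetchRetries, readTimeoutBackoffMillis, notificationsBeforeReexecution) are hypothetical, the numbers used in main are arbitrary, and rule 2 is read as "notify the JT first, then pick the back-off depending on whether the shuffle is near its end".

```java
// Hypothetical sketch of the proposed rules; names are not from the Hadoop code base.
final class ShuffleRetrySketch {

    // Rule 1 condition: fewer than 25% of the map outputs are still pending.
    static boolean isNearEndOfShuffle(int pendingCopies, int numMaps) {
        return pendingCopies < 0.25 * numMaps;
    }

    // Rule 1: halve the retry budget towards the end of shuffle
    // (integer division, as in the proposal).
    static int fetchRetries(int pendingCopies, int numMaps, int maxFetchRetriesPerMap) {
        return isNearEndOfShuffle(pendingCopies, numMaps)
                ? maxFetchRetriesPerMap / 2
                : maxFetchRetriesPerMap;
    }

    // Rule 2 (assumed reading): back-off to apply after a read timeout,
    // once the JT has already been notified.
    static long readTimeoutBackoffMillis(boolean nearEndOfShuffle,
                                         long maxMapRunTimeMillis,
                                         long currentBackoffMillis) {
        long halfMapRunTime = maxMapRunTimeMillis / 2;
        return nearEndOfShuffle
                ? Math.min(halfMapRunTime, currentBackoffMillis)
                : halfMapRunTime;
    }

    // Rule 3: number of failure notifications the JT waits for
    // before re-executing the map.
    static int notificationsBeforeReexecution(int freeMapSlots, int totalMapSlots) {
        return (freeMapSlots < 0.5 * totalMapSlots) ? 3 : 2;
    }

    public static void main(String[] args) {
        // Arbitrary example values, chosen only to exercise each branch.
        System.out.println("retries near end: " + fetchRetries(10, 100, 6));
        System.out.println("retries early:    " + fetchRetries(50, 100, 6));
        System.out.println("back-off near end (ms): "
                + readTimeoutBackoffMillis(true, 60_000L, 10_000L));
        System.out.println("back-off early (ms):    "
                + readTimeoutBackoffMillis(false, 60_000L, 10_000L));
        System.out.println("notifications, busy cluster: "
                + notificationsBeforeReexecution(10, 100));
        System.out.println("notifications, idle cluster: "
                + notificationsBeforeReexecution(80, 100));
    }
}
```

Both thresholds are strict comparisons, so a reducer with exactly 25% of its copies pending keeps the full retry budget, and a cluster with exactly half of its map slots free re-executes after 2 notifications.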
Thoughts?
> Shuffling fetchers waited too long between map output fetch re-tries
> --------------------------------------------------------------------
>
> Key: HADOOP-3327
> URL: https://issues.apache.org/jira/browse/HADOOP-3327
> Project: Hadoop Core
> Issue Type: Bug
> Components: mapred
> Reporter: Runping Qi
> Assignee: Amareshwari Sriramadasu
> Attachments: hadoop-3327-v1.patch, hadoop-3327-v2.patch,
> hadoop-3327-v3.patch, hadoop-3327.patch
>
>