[
https://issues.apache.org/jira/browse/HADOOP-3327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12665705#action_12665705
]
Jothi Padmanabhan commented on HADOOP-3327:
-------------------------------------------
Looks good. A few points:
* Some comments on the changes in the code would be good.
* Should the percentages we use to decide maxNotifications and
fetchRetriesPerMap be configurable?
* Since fetchRetriesPerMap is recomputed on every iteration from the current
copiedMapOutputs.size, a notification to the JT may be delayed by one
failure. For example, consider maxFetchRetriesPerMap = 5 and numRetries = 4.
On the next failure numRetries = 5, and suppose we cross the threshold and
reset fetchRetriesPerMap to 2 (5/2). With the existing logic we would have
sent a notification, since numRetries = maxFetchRetriesPerMap. With the new
logic we wait, since 5 % 2 != 0. This is a corner case, though, and can
probably be overlooked.
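To make the corner case concrete, here is a minimal standalone sketch of the arithmetic. It is not the patch code: the class name and the shouldNotify helper are hypothetical, and it assumes the notification check is simply "numRetries is a multiple of the current threshold".

```java
// Hypothetical illustration of the corner case; not taken from the patch.
class FetchRetryCornerCase {

    // Assumed check: notify the JT when the failure count for a map output
    // is an exact multiple of the current retry threshold.
    static boolean shouldNotify(int numRetries, int fetchRetriesPerMap) {
        return numRetries % fetchRetriesPerMap == 0;
    }

    public static void main(String[] args) {
        int maxFetchRetriesPerMap = 5;
        int numRetries = 5; // the fifth failure has just happened

        // Threshold unchanged: 5 % 5 == 0, so a notification is sent.
        System.out.println(shouldNotify(numRetries, maxFetchRetriesPerMap)); // true

        // Threshold halved on this same iteration: 5 / 2 == 2 (integer division).
        int reduced = maxFetchRetriesPerMap / 2;

        // 5 % 2 == 1, so the notification is skipped on this failure.
        System.out.println(shouldNotify(numRetries, reduced)); // false

        // The next failure gives 6 % 2 == 0, so the notification goes out
        // one failure later than it would have.
        System.out.println(shouldNotify(numRetries + 1, reduced)); // true
    }
}
```

The delay is bounded by one failure in this example, which is consistent with treating it as a corner case.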
> Shuffling fetchers waited too long between map output fetch re-tries
> --------------------------------------------------------------------
>
> Key: HADOOP-3327
> URL: https://issues.apache.org/jira/browse/HADOOP-3327
> Project: Hadoop Core
> Issue Type: Bug
> Components: mapred
> Reporter: Runping Qi
> Assignee: Amareshwari Sriramadasu
> Fix For: 0.21.0
>
> Attachments: hadoop-3327-v1.patch, hadoop-3327-v2.patch,
> hadoop-3327-v3.patch, hadoop-3327.patch, patch-3327.txt
>
>