[
https://issues.apache.org/jira/browse/HADOOP-2175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12582301#action_12582301
]
Amar Kamat commented on HADOOP-2175:
------------------------------------
After some offline discussions with folks here, this is what seems
reasonable: kill maps on a per-map basis, and tweak the logic for killing maps
due to "too many fetch failures", which currently depends on notifications from
all running reducers, so that just *one notification* suffices if the tracker in
question has been blacklisted. That way we will not be too aggressive (we don't
kill too many maps in one go), while still being strict with the map
corresponding to the failed fetch. Thoughts?
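For illustration only, here is a minimal Java sketch of the tweak described above. This is not the actual JobTracker code; the class and member names (FetchFailureHandler, FETCH_FAILURE_THRESHOLD, isBlacklisted, onFetchFailure) are hypothetical placeholders for the idea of lowering the failure threshold to one when the serving tracker is blacklisted.

{code:java}
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified sketch of the proposed fetch-failure handling;
// not the real Hadoop JobTracker implementation.
class FetchFailureHandler {

  // Current behaviour: a map output is declared lost only after several
  // reducers (roughly "all running reducers") report fetch failures for it.
  private static final int FETCH_FAILURE_THRESHOLD = 3;

  private final Map<String, Integer> fetchFailureCounts = new HashMap<>();

  /** Hypothetical hook: is the tracker serving this output blacklisted? */
  private boolean isBlacklisted(String trackerName) {
    // ... lookup in the JobTracker's blacklist (omitted in this sketch)
    return false;
  }

  /**
   * Called when a reducer reports it could not fetch the output of
   * mapTaskId from trackerName. Returns true if that one map should be
   * re-executed.
   */
  boolean onFetchFailure(String mapTaskId, String trackerName) {
    int failures = fetchFailureCounts.merge(mapTaskId, 1, Integer::sum);

    // Proposed tweak: if the tracker is already blacklisted, a single
    // notification is enough to re-run just this map, instead of waiting
    // for every running reducer to complain.
    if (isBlacklisted(trackerName)) {
      return true;
    }

    // Otherwise keep the existing, more conservative threshold.
    return failures >= FETCH_FAILURE_THRESHOLD;
  }
}
{code}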
> Blacklisted hosts may not be able to serve map outputs
> ------------------------------------------------------
>
> Key: HADOOP-2175
> URL: https://issues.apache.org/jira/browse/HADOOP-2175
> Project: Hadoop Core
> Issue Type: Bug
> Components: mapred
> Reporter: Runping Qi
> Assignee: Amar Kamat
> Fix For: 0.17.0
>
> Attachments: HADOOP-2175-v1.1.patch, HADOOP-2175-v1.patch,
> HADOOP-2175-v2.patch, HADOOP-2175-v2.patch
>
>
> After a node fails 4 map tasks, it is added to the blacklist and will
> no longer accept tasks.
> However, it will continue to serve the map outputs of any mappers that ran
> successfully there.
> But the node may not actually be able to serve those map outputs either.
> This causes the reducers to mark the corresponding map outputs as coming from
> slow hosts, yet they keep trying to fetch the outputs from that node.
> This may lead to waiting forever.