[
https://issues.apache.org/jira/browse/HADOOP-2175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12581260#action_12581260
]
Devaraj Das commented on HADOOP-2175:
-------------------------------------
It is not clear to me why the check in JobInProgress that calls
lostTaskTracker sits outside addTrackerTaskFailure. You could do the check
inside the method, right?
Also, inside lostTaskTracker you check whether the task was already
FAILED/KILLED. Do you need the check for KILLED?
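To illustrate the first point, here is a rough, self-contained sketch. It is a toy, not the patch: the class name, the isLost helper, and the body of lostTaskTracker are made up for illustration, and the threshold of 4 is simply taken from the issue description. It only shows the threshold check living inside addTrackerTaskFailure, so that callers do not have to repeat it.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the relevant part of JobInProgress.
class TrackerFailureSketch {
    // Assumed threshold, taken from the issue description ("fails 4 mappers").
    static final int MAX_FAILURES_PER_TRACKER = 4;

    private final Map<String, Integer> trackerToFailures = new HashMap<>();
    private final Set<String> lostTrackers = new HashSet<>();

    // Records one task failure for the tracker. The blacklist check is done
    // here, inside the method, rather than at every call site.
    void addTrackerTaskFailure(String trackerName) {
        int failures = trackerToFailures.merge(trackerName, 1, Integer::sum);
        if (failures == MAX_FAILURES_PER_TRACKER) {
            lostTaskTracker(trackerName);
        }
    }

    // Placeholder: the real method would reschedule the map outputs held by
    // the tracker. Here it only remembers that the tracker was declared lost.
    void lostTaskTracker(String trackerName) {
        lostTrackers.add(trackerName);
    }

    boolean isLost(String trackerName) {
        return lostTrackers.contains(trackerName);
    }

    public static void main(String[] args) {
        TrackerFailureSketch job = new TrackerFailureSketch();
        for (int i = 0; i < MAX_FAILURES_PER_TRACKER; i++) {
            job.addTrackerTaskFailure("tracker_host1");
        }
        System.out.println("tracker_host1 lost: " + job.isLost("tracker_host1"));
    }
}
```

With the check inside the method, the caller reduces to a single addTrackerTaskFailure call per failed task.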
On the change to MiniMRCluster, I am not convinced that this is the right thing
to do (waiting for 10 seconds and then giving up).
On the TestLostBlackListedTracker, I don't think you need to make it that
complicated. A simple dummy-split-based map should work. In that case you don't
have to change TestRackAwareTaskPlacement. The way you get events is also not
very reliable w.r.t. timing. In the first call to getTaskCompletionEvents, you
might get events.length = 0. Isn't this a problem? I'd say that you wait for
the job to complete and then get the events.
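A minimal sketch of what I mean by waiting first, under stated assumptions: JobHandle and FakeJob are hypothetical stand-ins so that the snippet runs on its own, and events are plain strings instead of the real event type. In the test you would use the job handle you already have.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CompletionEventsSketch {
    // Hypothetical stand-in for the job handle held by the test.
    interface JobHandle {
        boolean isComplete();
        String[] getTaskCompletionEvents(int startFrom);
    }

    // Blocks until the job reports completion, then drains all events.
    // Fetching only after completion avoids reading an empty first batch
    // and mistaking it for "no events".
    static List<String> eventsAfterCompletion(JobHandle job, long pollMillis)
            throws InterruptedException {
        while (!job.isComplete()) {
            Thread.sleep(pollMillis);
        }
        List<String> all = new ArrayList<>();
        String[] batch;
        // Events may arrive in batches, so keep asking from the current offset.
        while ((batch = job.getTaskCompletionEvents(all.size())).length > 0) {
            all.addAll(Arrays.asList(batch));
        }
        return all;
    }

    // Toy job: reports completion after three polls, returns no events until
    // then, and afterwards serves its events in batches of two.
    static class FakeJob implements JobHandle {
        private final String[] events = {"event-0", "event-1", "event-2"};
        private int pollsLeft = 3;

        public boolean isComplete() {
            if (pollsLeft > 0) {
                pollsLeft--;
                return false;
            }
            return true;
        }

        public String[] getTaskCompletionEvents(int startFrom) {
            if (pollsLeft > 0) {
                return new String[0];
            }
            int end = Math.min(events.length, startFrom + 2);
            return Arrays.copyOfRange(events, startFrom, end);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> events = eventsAfterCompletion(new FakeJob(), 10L);
        System.out.println(events);
    }
}
```

The assertions in the test can then run against the full list, independent of how quickly the job progresses.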
> Blacklisted hosts may not be able to serve map outputs
> ------------------------------------------------------
>
> Key: HADOOP-2175
> URL: https://issues.apache.org/jira/browse/HADOOP-2175
> Project: Hadoop Core
> Issue Type: Bug
> Components: mapred
> Reporter: Runping Qi
> Assignee: Amar Kamat
> Attachments: HADOOP-2175-v1.1.patch, HADOOP-2175-v1.patch
>
>
> After a node fails 4 mappers (tasks), it is added to the blacklist, so it will
> no longer accept tasks.
> But it will continue to serve the map outputs of any mappers that ran
> successfully there.
> However, the node may not be able to serve the map outputs either.
> This will cause the reducers to mark the corresponding map outputs as coming
> from slow hosts,
> but continue to try to get the map outputs from that node.
> This may lead to waiting forever.