[
https://issues.apache.org/jira/browse/HADOOP-2175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12581264#action_12581264
]
Amar Kamat commented on HADOOP-2175:
------------------------------------
{quote}
am not clear why you have the check in JobInProgress for doing lostTaskTracker
outside the addTrackerTaskFailure.
{quote}
+1
bq. Also, inside lostTaskTracker you check for whether the task was already
FAILED/KILLED
I did that because a TIP that failed or was killed before the TT got lost should
stay failed/killed. There is no need to reschedule it or change its status. Since
the task was not killed because of the lost TT, I ignore it.
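A minimal Java sketch of that check, for illustration only. TaskState, TaskSketch and LostTrackerSketch are hypothetical stand-ins and not the actual JobInProgress/TaskInProgress code from the patch. It also assumes that every task not already failed/killed is marked killed and rescheduled, which may be simpler than the real policy.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins; these are NOT the real Hadoop classes.
enum TaskState { RUNNING, SUCCEEDED, FAILED, KILLED }

class TaskSketch {
    final String id;
    TaskState state;

    TaskSketch(String id, TaskState state) {
        this.id = id;
        this.state = state;
    }
}

class LostTrackerSketch {
    // Returns the ids of the tasks that must be rescheduled once the
    // tracker they ran on is declared lost.
    static List<String> lostTaskTracker(List<TaskSketch> tasksOnTracker) {
        List<String> toReschedule = new ArrayList<>();
        for (TaskSketch task : tasksOnTracker) {
            // Failed/killed before the tracker was lost: keep the status
            // untouched and do not reschedule.
            if (task.state == TaskState.FAILED || task.state == TaskState.KILLED) {
                continue;
            }
            // Assumption: anything else is treated as lost with the tracker.
            task.state = TaskState.KILLED;
            toReschedule.add(task.id);
        }
        return toReschedule;
    }
}
```

With one FAILED, one KILLED and one RUNNING task on the lost tracker, only the RUNNING task is returned for rescheduling.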
bq. On the change to MiniMRCluster ....
I think there is a problem with MiniMRCluster w.r.t. lost TTs. It keeps waiting
for the TT to become idle, and the test eventually times out. I am still
trying to find out why MiniMRCluster gets stuck.
bq. On the TestLostBlackListedTracker, i don't think you need to make it that
complicated
+1.
{quote}
The way you get events is also not very reliable w.r.t timing. In the first
call to getTaskCompletionEvents, you might get events.length = 0
{quote}
I use launchJob, which waits for the job to complete. It's a blocking call.
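To make the timing point concrete, here is a small Java sketch. FakeJob and EventTimingSketch are hypothetical stand-ins, not the Hadoop job client API. The assumption is simply that events only become visible as tasks finish, so reading them after the job has completed cannot return an empty array for a job that ran tasks.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a submitted job; NOT the real Hadoop API.
class FakeJob {
    private final List<String> events = new ArrayList<>();
    private boolean complete = false;

    // Returns a copy of the events recorded so far, starting at startFrom.
    List<String> getTaskCompletionEvents(int startFrom) {
        int from = Math.max(0, Math.min(startFrom, events.size()));
        return new ArrayList<>(events.subList(from, events.size()));
    }

    boolean isComplete() {
        return complete;
    }

    // Simulates the job finishing and publishing its task events.
    void finish(String... finalEvents) {
        events.addAll(Arrays.asList(finalEvents));
        complete = true;
    }
}

class EventTimingSketch {
    // Reads events only once the job is known to be complete, which is
    // the guarantee a blocking launch call gives the test.
    static List<String> eventsAfterCompletion(FakeJob job) {
        if (!job.isComplete()) {
            throw new IllegalStateException("job is still running");
        }
        return job.getTaskCompletionEvents(0);
    }
}
```

Calling getTaskCompletionEvents(0) on a job that has not finished returns an empty list in this sketch, which is the race described in the quote; after finish(...) the full list is available.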
> Blacklisted hosts may not be able to serve map outputs
> ------------------------------------------------------
>
> Key: HADOOP-2175
> URL: https://issues.apache.org/jira/browse/HADOOP-2175
> Project: Hadoop Core
> Issue Type: Bug
> Components: mapred
> Reporter: Runping Qi
> Assignee: Amar Kamat
> Attachments: HADOOP-2175-v1.1.patch, HADOOP-2175-v1.patch
>
>
> After a node fails 4 mappers (tasks), it is added to the blacklist, so it will
> no longer accept tasks.
> But it will continue to serve the map outputs of any mappers that ran
> successfully there.
> However, the node may not be able to serve the map outputs either.
> This will cause the reducers to mark the corresponding map outputs as coming from
> slow hosts,
> but continue to try to get the map outputs from that node.
> This may lead to waiting forever.