[
https://issues.apache.org/jira/browse/HADOOP-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12604804#action_12604804
]
Jothi Padmanabhan commented on HADOOP-3333:
-------------------------------------------
It appears that the only way to handle the case of multiple TTs per node is to
trickle the list of unique hosts that run a TT (hostsReader.getHosts().size())
down from the JobTracker. This information needs to pass through several layers
before it can be used in findTaskFromList. We need to evaluate whether to make
this change to handle this special case, or to go ahead with the existing patch
and document this case as a known limitation.
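For reference, a minimal sketch of what using that unique-host count could look
like once it reaches the task-selection code (the class and method names below
are illustrative only, not the actual JobInProgress/findTaskFromList
signatures): a task that has already failed on every distinct host has no
choice but to be retried somewhere; otherwise we skip hosts it has failed on.
{code:java}
import java.util.Set;

class TaskSelectionSketch {

  // Supplied by the JobTracker, e.g. from hostsReader.getHosts().size().
  private final int uniqueHostCount;

  TaskSelectionSketch(int uniqueHostCount) {
    this.uniqueHostCount = uniqueHostCount;
  }

  /**
   * Decide whether a task that has already failed on the hosts in
   * failedHosts may be scheduled on candidateHost.
   */
  boolean mayScheduleOn(String candidateHost, Set<String> failedHosts) {
    if (failedHosts.size() >= uniqueHostCount) {
      // The task has failed on every distinct host in the cluster, so
      // re-assignment to a previously failed host is unavoidable.
      return true;
    }
    // Otherwise, only schedule on hosts where the task has not failed yet.
    return !failedHosts.contains(candidateHost);
  }
}
{code}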
> job failing because of reassigning same tasktracker to failing tasks
> --------------------------------------------------------------------
>
> Key: HADOOP-3333
> URL: https://issues.apache.org/jira/browse/HADOOP-3333
> Project: Hadoop Core
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.16.3
> Reporter: Christian Kunz
> Assignee: Jothi Padmanabhan
> Priority: Blocker
> Fix For: 0.18.0
>
> Attachments: HADOOP-3333_0_20080503.patch,
> HADOOP-3333_1_20080505.patch, HADOOP-3333_2_20080506.patch
>
>
> We have a long-running job in its 2nd attempt. The previous attempt failed, and
> the current job risks failing as well, because reduce tasks that fail on marginal
> TaskTrackers are repeatedly reassigned to the same TaskTrackers (probably
> because those are the only available slots), eventually running out of attempts.
> Reduce tasks should be assigned to the same TaskTracker at most twice, or
> TaskTrackers need better smarts to detect failing hardware.
> BTW, mapred.reduce.max.attempts=12, which is high, but does not help in this
> case.
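A minimal sketch of the "at most twice per TaskTracker" cap suggested in the
report, assuming the scheduler keeps a per-task count of assignments per
tracker (all names below are hypothetical, not Hadoop's actual scheduling
code):
{code:java}
import java.util.HashMap;
import java.util.Map;

class PerTrackerAttemptCap {

  // Cap suggested in the report: at most two assignments of the
  // same task to the same TaskTracker.
  private static final int MAX_ATTEMPTS_PER_TRACKER = 2;

  // Per-task count of attempts already launched on each TaskTracker.
  private final Map<String, Integer> attemptsByTracker =
      new HashMap<String, Integer>();

  /** Record that an attempt of this task was launched on the tracker. */
  void recordAssignment(String trackerName) {
    Integer n = attemptsByTracker.get(trackerName);
    attemptsByTracker.put(trackerName, n == null ? 1 : n + 1);
  }

  /** True if the tracker has not yet hit the per-tracker attempt cap. */
  boolean mayAssignTo(String trackerName) {
    Integer n = attemptsByTracker.get(trackerName);
    return n == null || n < MAX_ATTEMPTS_PER_TRACKER;
  }
}
{code}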