[jira] [Commented] (MAPREDUCE-2960) A single TT disk failure can cause the job to fail

Eli Collins (Commented) (JIRA) Tue, 29 Nov 2011 14:52:08 -0800

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13159604#comment-13159604 ]


Eli Collins commented on MAPREDUCE-2960:
----------------------------------------

You're right - the last comment is bogus (the JT was on a RO fs).

The earlier ones, however, are from runs where only the TTs were on loop-back 
mounts with faults injected, and the JT was fine. In the first one, it looks 
like the issue is that the JobClient doesn't handle errors when fetching task 
output, or when TT exceptions get plumbed back up to it. Per MAPREDUCE-3473, 
though, this may be expected behavior, given that *.failures.maxpercent 
defaults to 0.
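
To illustrate why a default of 0 would make a single task failure fatal, here 
is a minimal sketch of a percentage-based failure threshold. It is not Hadoop's 
actual code: the class name, method name, and the exact comparison are 
assumptions made for illustration only.

```java
// Hypothetical sketch, not taken from the Hadoop source tree.
public class FailureThresholdSketch {

    /**
     * Assumed rule: the job fails once the share of failed tasks
     * strictly exceeds the tolerated percentage.
     */
    public static boolean jobShouldFail(int failedTasks, int totalTasks,
                                        int maxFailuresPercent) {
        // Compare in integer arithmetic to avoid rounding:
        // failed / total > percent / 100  <=>  failed * 100 > percent * total
        return failedTasks * 100L > (long) maxFailuresPercent * totalTasks;
    }

    public static void main(String[] args) {
        // With a tolerated percentage of 0, one failed task out of 100 fails the job.
        System.out.println(jobShouldFail(1, 100, 0));
        // With a tolerated percentage of 5, the same single failure is absorbed.
        System.out.println(jobShouldFail(1, 100, 5));
    }
}
```

Under this assumed rule, any threshold above 0 would let the job survive an 
isolated task failure, which is why the default matters here.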
                
> A single TT disk failure can cause the job to fail
> --------------------------------------------------
>
>                 Key: MAPREDUCE-2960
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2960
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: tasktracker
>    Affects Versions: 0.20.204.0
>            Reporter: Eli Collins
>
> TaskInProgress#kill in the JT fails because TaskStatus#setFinishTimes fails 
> because no start time was set. There's no start time because TaskTracker#run 
> (DefaultTaskController#initializeJob) failed before it was set. The fix is to 
> have TT#launchTask set the start time before it starts the task runner, so 
> that there's a valid start time even if TT#run fails.
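
The ordering described in the issue can be sketched roughly as follows. This is 
a simplified, self-contained model and not the actual TaskTracker or TaskStatus 
code: the Status class, the Runner interface, and the launch method are all 
hypothetical stand-ins.

```java
// Hypothetical sketch; names are illustrative and do not match Hadoop's classes.
public class StartTimeSketch {

    /** Stand-in for a task status that refuses a finish time without a start time. */
    public static class Status {
        private long startTime;   // 0 means "never set"
        private long finishTime;

        public void setStartTime(long t) {
            if (t <= 0) throw new IllegalArgumentException("invalid start time: " + t);
            startTime = t;
        }

        public void setFinishTime(long t) {
            if (startTime <= 0) {
                throw new IllegalStateException("finish time set before start time");
            }
            if (t < startTime) {
                throw new IllegalArgumentException("finish time precedes start time");
            }
            finishTime = t;
        }

        public long getStartTime() { return startTime; }
        public long getFinishTime() { return finishTime; }
    }

    /** Stand-in for the task runner, whose initialization may fail. */
    public interface Runner {
        void run() throws Exception;
    }

    /** Records the start time first, so it is valid even if the runner fails. */
    public static Status launch(Runner runner, long now) {
        Status status = new Status();
        status.setStartTime(now);   // set before the runner is started
        try {
            runner.run();
        } catch (Exception e) {
            // Initialization failed (for example, a bad disk); the status
            // still carries a start time for any later kill or cleanup.
        }
        return status;
    }

    public static void main(String[] args) {
        Status s = launch(() -> {
            throw new java.io.IOException("simulated disk failure during job init");
        }, 1000L);
        s.setFinishTime(2000L);     // succeeds because the start time was recorded
        System.out.println(s.getStartTime() + " -> " + s.getFinishTime());
    }
}
```

If the start time were instead set inside the runner, the simulated failure 
would leave it at 0 and the later setFinishTime call would throw, which mirrors 
the kill failure described above.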

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
