[ https://issues.apache.org/jira/browse/MAPREDUCE-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100763#comment-13100763 ]
Eli Collins commented on MAPREDUCE-2960:
----------------------------------------
Fixing the setTimes issue alone doesn't fix the whole problem. TaskRunner#run
keeps failing to create the log dirs because the failing dir hasn't yet been
removed from the good dirs. I'm not sure why the job submission hangs; it should
try to run the task on one of the other TTs (there are two others running in my
case).
{noformat}
java.lang.Throwable: Child Error
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271)
Caused by: java.io.IOException: Creation of /faulty-disk/dir1/userlogs/job_201109081514_0002/attempt_201109081514_0002_m_000000_1 failed.
    at org.apache.hadoop.mapred.TaskLog.createTaskAttemptLogDir(TaskLog.java:102)
    at org.apache.hadoop.mapred.DefaultTaskController.createLogDir(DefaultTaskController.java:71)
    at org.apache.hadoop.mapred.TaskRunner.prepareLogFiles(TaskRunner.java:316)
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:228)
-------
java.lang.Throwable: Child Error
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271)
Caused by: java.io.IOException: Task process exit with nonzero status of -1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:258)
{noformat}
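To illustrate the good-dirs point, here is a minimal sketch of dropping a dir from the good list the first time creation under it fails, so that later attempts stop picking it. This is not Hadoop's code: the class GoodDirsSketch, the tryCreate hook (a stand-in for real directory creation), and the /disk2/dir1 path are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical model of a "good dirs" list; not Hadoop's implementation.
class GoodDirsSketch {
    private final List<String> goodDirs;
    // Assumed hook standing in for real directory creation; returns true on success.
    private final Predicate<String> tryCreate;

    GoodDirsSketch(List<String> dirs, Predicate<String> tryCreate) {
        this.goodDirs = new ArrayList<>(dirs);
        this.tryCreate = tryCreate;
    }

    /** Creates relativePath under the first dir that works, dropping dirs that fail. */
    String createLogDir(String relativePath) {
        // Iterate over a snapshot so removing from goodDirs is safe.
        for (String base : new ArrayList<>(goodDirs)) {
            String path = base + "/" + relativePath;
            if (tryCreate.test(path)) {
                return path;
            }
            // Creation failed: stop treating this dir as good.
            goodDirs.remove(base);
        }
        throw new IllegalStateException(
            "Creation of " + relativePath + " failed on every good dir");
    }

    /** Returns a copy of the dirs still considered good. */
    List<String> remainingDirs() {
        return new ArrayList<>(goodDirs);
    }

    public static void main(String[] args) {
        List<String> dirs = new ArrayList<>();
        dirs.add("/faulty-disk/dir1");
        dirs.add("/disk2/dir1"); // hypothetical healthy disk
        GoodDirsSketch sketch =
            new GoodDirsSketch(dirs, p -> !p.startsWith("/faulty-disk"));
        System.out.println(sketch.createLogDir("userlogs/job/attempt"));
        System.out.println(sketch.remainingDirs());
    }
}
```

With this ordering a second attempt never touches the faulty disk, which is the behavior the traces above suggest is missing.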
> A single TT disk failure can cause the job to fail
> --------------------------------------------------
>
> Key: MAPREDUCE-2960
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2960
> Project: Hadoop Map/Reduce
> Issue Type: Sub-task
> Components: tasktracker
> Affects Versions: 0.20.204.0
> Reporter: Eli Collins
> Fix For: 0.20.205.0
>
>
> TaskInProgress#kill in the JT fails because TaskStatus#setFinishTimes fails
> when no start time has been set. There's no start time because TaskTracker#run
> (DefaultTaskController#initializeJob) failed before it was set. The fix is to
> have TT#launchTask set the start time before it starts the task runner, so
> that there's a valid start time even if TT#run fails.
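
The ordering change described above can be sketched as follows. This is a simplified model, not the actual patch: the Status class and the launch method are hypothetical stand-ins for TaskStatus and TT#launchTask, and it assumes the finish-time setter rejects a call made when no start time exists.

```java
// Hypothetical model of the ordering fix; not Hadoop's TaskStatus or TaskTracker.
class StartTimeSketch {

    /** Minimal status record: a finish time is only accepted after a start time. */
    static class Status {
        private long startTime;  // 0 means "never set"
        private long finishTime;

        void setStartTime(long millis) { startTime = millis; }

        void setFinishTime(long millis) {
            if (startTime <= 0) {
                // Assumed failure mode: no start time was recorded.
                throw new IllegalStateException("no start time set");
            }
            finishTime = millis;
        }

        long getStartTime() { return startTime; }
        long getFinishTime() { return finishTime; }
    }

    /**
     * Launches a task. When setStartFirst is true, the start time is recorded
     * before the runner starts, so it survives a failure inside the runner.
     */
    static Status launch(Runnable runner, boolean setStartFirst) {
        Status status = new Status();
        if (setStartFirst) {
            status.setStartTime(System.currentTimeMillis());
        }
        try {
            runner.run();
            if (!setStartFirst) {
                // Old ordering: only reached if the runner did not fail.
                status.setStartTime(System.currentTimeMillis());
            }
        } catch (RuntimeException e) {
            // The runner failed during setup (for example on a bad disk).
        }
        return status;
    }

    public static void main(String[] args) {
        Runnable failing = () -> { throw new RuntimeException("log dir creation failed"); };
        Status fixed = launch(failing, true);
        fixed.setFinishTime(System.currentTimeMillis()); // accepted: start time exists
        System.out.println("start time set: " + (fixed.getStartTime() > 0));
    }
}
```

Calling launch(failing, false) instead leaves the start time unset, and the later setFinishTime call is rejected, which mirrors the kill failure in the description.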