[ https://issues.apache.org/jira/browse/MAPREDUCE-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100763#comment-13100763 ]

Eli Collins commented on MAPREDUCE-2960:
----------------------------------------

Fixing the setTimes issue alone doesn't resolve the whole problem. TaskRunner#run 
keeps failing to create the log dirs because the failing dir hasn't yet been 
removed from the good dirs. I'm not sure why the job submission hangs; it should 
try to run the task on one of the other TTs (there are two others running in my 
case).

{noformat}
java.lang.Throwable: Child Error
        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271)
Caused by: java.io.IOException: Creation of /faulty-disk/dir1/userlogs/job_201109081514_0002/attempt_201109081514_0002_m_000000_1 failed.
        at org.apache.hadoop.mapred.TaskLog.createTaskAttemptLogDir(TaskLog.java:102)
        at org.apache.hadoop.mapred.DefaultTaskController.createLogDir(DefaultTaskController.java:71)
        at org.apache.hadoop.mapred.TaskRunner.prepareLogFiles(TaskRunner.java:316)
        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:228)

-------
java.lang.Throwable: Child Error
        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271)
Caused by: java.io.IOException: Task process exit with nonzero status of -1.
        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:258)
{noformat}
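
To illustrate the first trace, here is a minimal sketch of what removing a failing dir from the good dirs could look like. It is not the TaskTracker code: SketchLogDirs, createLogDir(Predicate) and the /disk2/dir1 path in the usage note are hypothetical names made up for this example, and the Predicate stands in for the real directory creation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical model of a list of "good" local dirs; not Hadoop's implementation.
class SketchLogDirs {
    private final List<String> goodDirs;

    SketchLogDirs(List<String> dirs) {
        this.goodDirs = new ArrayList<>(dirs);
    }

    // Try each good dir in order. A dir whose creation fails is dropped from
    // the list, so the next task attempt does not pick the same bad disk again.
    String createLogDir(Predicate<String> tryCreate) {
        while (!goodDirs.isEmpty()) {
            String dir = goodDirs.get(0);
            if (tryCreate.test(dir)) {
                return dir;
            }
            goodDirs.remove(0); // assumed behaviour: forget the failing dir
        }
        throw new IllegalStateException("no usable log dirs left");
    }

    // Defensive copy so callers cannot mutate the internal list.
    List<String> goodDirs() {
        return new ArrayList<>(goodDirs);
    }
}
```

With dirs /faulty-disk/dir1 and /disk2/dir1 and a tryCreate that rejects anything under /faulty-disk, the call returns /disk2/dir1 and only that dir remains in the list. Without the remove step, every attempt would start from the faulty dir again, which matches the repeated failures in the trace.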


> A single TT disk failure can cause the job to fail
> --------------------------------------------------
>
>                 Key: MAPREDUCE-2960
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2960
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: tasktracker
>    Affects Versions: 0.20.204.0
>            Reporter: Eli Collins
>             Fix For: 0.20.205.0
>
>
> TaskInProgress#kill in the JT fails because TaskStatus#setFinishTimes fails 
> because no start time was set. There's no start time because TaskTracker#run 
> (DefaultTaskController#initializeJob) failed before it was set. The fix is to 
> have TT#launchTask set the start time before it starts the task runner; this 
> way there's a valid start time even if TT#run fails.
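
The ordering in the description above can be sketched as follows. This is a simplified model under stated assumptions, not the actual patch: SketchTaskStatus and SketchLauncher are hypothetical classes, and the check in setFinishTime only imitates the "no start time was set" failure described in the issue.

```java
// Hypothetical stand-in for a task status record; not Hadoop's TaskStatus.
class SketchTaskStatus {
    private long startTime;  // 0 means "never set"
    private long finishTime;

    void setStartTime(long t) {
        startTime = t;
    }

    // Imitates the failure mode in the description: a finish time cannot be
    // recorded unless a start time was set first.
    void setFinishTime(long t) {
        if (startTime <= 0) {
            throw new IllegalStateException("finish time set before start time");
        }
        finishTime = t;
    }

    long getStartTime() { return startTime; }
    long getFinishTime() { return finishTime; }
}

class SketchLauncher {
    // Record the start time first, then start the (possibly failing) runner.
    static SketchTaskStatus launch(Runnable runner, long now) {
        SketchTaskStatus status = new SketchTaskStatus();
        status.setStartTime(now); // set before the runner starts
        try {
            runner.run();
        } catch (RuntimeException e) {
            // The runner failed early (for example while creating log dirs).
            // The status still has a valid start time, so a later kill can
            // record a finish time without tripping the check above.
        }
        return status;
    }
}
```

If the start time were set inside the runner instead, a runner that throws before reaching that point would leave startTime at 0, and the later setFinishTime call would fail.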

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
