[ https://issues.apache.org/jira/browse/MAPREDUCE-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049336#comment-13049336 ]

Esteban Gutierrez commented on MAPREDUCE-2592:
----------------------------------------------

Once a single TaskTracker has reached that state and more jobs are submitted, the problem propagates very quickly to all the other nodes. It can bring down the whole cluster, because every TT ends up blacklisted.

A sample stack trace:

11/02/05 10:00:01 WARN mapred.JobClient: Error reading task output http://dn:50060/tasklog?plaintext=true&taskid=attempt_201102050901_1000_m_000001_0&filter=stderr
11/02/05 10:00:02 INFO mapred.JobClient: Task Id : attempt_201102050901_1000_m_000001_0, Status : FAILED
java.lang.Throwable: Child Error
        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:471)
Caused by: java.io.IOException: Task process exit with nonzero status of 1.
        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:458)



> TT should fail task immediately if userlog dir cannot be created
> ----------------------------------------------------------------
>
>                 Key: MAPREDUCE-2592
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2592
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.23.0
>            Reporter: Todd Lipcon
>             Fix For: 0.23.0
>
>
> Currently, TaskRunner will log the message "mkdirs failed. Ignoring" if it 
> fails to mkdir the userlog directory for a task. Then, it goes on to spawn 
> taskjvm.sh which tries to redirect output into the userlogs dir, thus failing 
> with exit code 1. This leads to error messages that are very hard to diagnose 
> ("task failed with exit status 1") in cases where the userlog directory has 
> either become inaccessible or has reached the maximum number of dirents 
> (32000 in ext3).
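The fail-fast behaviour the issue asks for can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual MAPREDUCE-2592 patch: `UserLogDirCheck` and `createTaskLogDir` are hypothetical names, and the real TaskRunner code is structured differently. The idea is to treat a failed mkdirs as a task failure with a descriptive message, rather than logging "mkdirs failed. Ignoring" and then launching taskjvm.sh, which dies with exit status 1.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class UserLogDirCheck {

    // Hypothetical helper (not Hadoop's API): create the per-attempt userlog
    // directory, or throw an IOException that says why the task cannot start.
    static File createTaskLogDir(File userLogRoot, String attemptId) throws IOException {
        File dir = new File(userLogRoot, attemptId);
        // File.mkdirs() also returns false when the directory already exists,
        // so only treat it as an error if the directory is still missing.
        if (!dir.mkdirs() && !dir.isDirectory()) {
            // File.list() returns null if the root is not a readable directory.
            String[] entries = userLogRoot.list();
            String count = (entries == null) ? "an unknown number of" : String.valueOf(entries.length);
            throw new IOException("Could not create userlog directory " + dir
                + "; parent " + userLogRoot + " has " + count
                + " entries (check permissions and the ext3 limit of 32000)");
        }
        return dir;
    }

    public static void main(String[] args) throws IOException {
        File root = Files.createTempDirectory("userlogs").toFile();
        File logDir = createTaskLogDir(root, "attempt_201102050901_1000_m_000001_0");
        System.out.println("created: " + logDir.isDirectory());
    }
}
```

Re-checking `isDirectory()` after `mkdirs()` keeps a retried attempt from failing just because its directory is already present, while a genuinely inaccessible or full parent directory still fails the task immediately with a message that points at the cause.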

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
