[
https://issues.apache.org/jira/browse/MAPREDUCE-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049336#comment-13049336
]
Esteban Gutierrez commented on MAPREDUCE-2592:
----------------------------------------------
The problem propagates very quickly to all nodes once a single TaskTracker
has reached that state and more jobs are submitted. It can bring down the
whole cluster, since every TaskTracker (TT) will end up blacklisted.
A sample stacktrace:
11/02/05 10:00:01 WARN mapred.JobClient: Error reading task outputhttp://dn:50060/tasklog?plaintext=true&taskid=attempt_201102050901_1000_m_000001_0&filter=stderr
11/02/05 10:00:02 INFO mapred.JobClient: Task Id : attempt_201102050901_1000_m_000001_0, Status : FAILED
java.lang.Throwable: Child Error
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:471)
Caused by: java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:458)
> TT should fail task immediately if userlog dir cannot be created
> ----------------------------------------------------------------
>
> Key: MAPREDUCE-2592
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2592
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: tasktracker
> Affects Versions: 0.23.0
> Reporter: Todd Lipcon
> Fix For: 0.23.0
>
>
> Currently, TaskRunner will log the message "mkdirs failed. Ignoring" if it
> fails to mkdir the userlog directory for a task. Then, it goes on to spawn
> taskjvm.sh which tries to redirect output into the userlogs dir, thus failing
> with exit code 1. This leads to error messages that are very hard to diagnose
> ("task failed with exit status 1") in cases where the userlog directory has
> either become inaccessible or has reached the maximum number of dirents
> (32000 in ext3)
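
As an illustration of the behaviour the issue asks for, here is a minimal
Java sketch that checks the result of mkdirs and fails immediately with a
descriptive error instead of logging and continuing. It is not the actual
TaskRunner code or the committed patch: the class name UserlogDirCheck and
the method ensureUserlogDir are hypothetical, and only java.io.File and
java.nio.file.Files from the JDK are used.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

// Hypothetical helper, not part of Hadoop: shows the fail-fast idea only.
public class UserlogDirCheck {

    // Creates parent/attemptId, or throws if the directory cannot be created.
    public static File ensureUserlogDir(File parent, String attemptId)
            throws IOException {
        File dir = new File(parent, attemptId);
        // mkdirs() returns false both on failure and when the directory
        // already exists, so also check isDirectory() before giving up.
        if (!dir.mkdirs() && !dir.isDirectory()) {
            throw new IOException("Cannot create userlog dir " + dir
                + " (parent inaccessible or directory entry limit reached?)");
        }
        return dir;
    }

    public static void main(String[] args) throws IOException {
        File base = Files.createTempDirectory("userlogs").toFile();

        // Normal case: the directory is created.
        File ok = ensureUserlogDir(base, "attempt_201102050901_1000_m_000001_0");
        System.out.println("created: " + ok.isDirectory());

        // Simulated failure: the parent is a regular file, so mkdirs fails.
        File blocker = new File(base, "not-a-dir");
        blocker.createNewFile();
        try {
            ensureUserlogDir(blocker, "attempt_x");
        } catch (IOException e) {
            System.out.println("failed fast: " + e.getMessage());
        }
    }
}
```

With a check like this, the failure surfaces as an IOException that names
the directory, rather than as an opaque exit status 1 from the child process.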