Hadoop streaming tasks hang when stream.non.zero.exit.is.failure==true and
reduce processes exit with non-zero status
-------------------------------------------------------------------------------------------------------------------------
Key: HADOOP-3068
URL: https://issues.apache.org/jira/browse/HADOOP-3068
Project: Hadoop Core
Issue Type: Bug
Components: contrib/streaming
Environment: Java(TM) SE Runtime Environment (build 1.6.0_04-b12);
Hadoop version 0.17.0-dev, r639662
Reporter: Yuri Pradkin
Fix For: 0.17.0
When I set *stream.non.zero.exit.is.failure* to true and run a streaming job
with reducers that exit with a non-zero status, those tasks fail and then hang,
apparently waiting for something.
...
2008-03-21 13:33:53,715 INFO org.apache.hadoop.streaming.PipeMapRed: R/W/S=65501/1/0 in:334=65501/196 [rec/s] out:0=1/196 [rec/s]
2008-03-21 13:33:53,719 INFO org.apache.hadoop.streaming.PipeMapRed: mapRedFinished
2008-03-21 13:34:11,228 INFO org.apache.hadoop.streaming.PipeMapRed: Records R/W=65536/2
2008-03-21 13:34:11,235 INFO org.apache.hadoop.streaming.PipeMapRed: PipeMapRed.waitOutputThreads(): subprocess exitted with code 1
2008-03-21 13:34:11,235 INFO org.apache.hadoop.streaming.PipeMapRed: MRErrorThread done
2008-03-21 13:34:11,238 INFO org.apache.hadoop.streaming.PipeMapRed: MROutputThread done
2008-03-21 13:34:11,245 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:331)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:475)
    at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:110)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:399)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2113)
After that, the task still shows up with status Running, but it just hangs
there. If all tasks get into this state, the whole cluster hangs.
BTW, may I suggest that we make *stream.non.zero.exit.is.failure* default to
true after this is fixed?
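
For anyone trying to reproduce this, a reducer of the kind described above
might look roughly like the sketch below. It is not the reducer from the
original job, which is not shown in this report; the function name
reduce_stream and the "records<TAB>count" output line are made up for
illustration. The only property that matters is that it reads all of its
input and then exits with status 1.

```python
#!/usr/bin/env python
"""Hypothetical streaming reducer that consumes its input, then exits non-zero."""
import sys


def reduce_stream(lines, out):
    """Count input records, write one summary line, and return the exit status."""
    count = 0
    for _ in lines:
        count += 1
    # Illustrative output format only; the real reducer's output is unknown.
    out.write("records\t%d\n" % count)
    # Deliberately report failure, mimicking a reducer that exits with status 1.
    return 1


if __name__ == "__main__":
    sys.exit(reduce_stream(sys.stdin, sys.stdout))
```

Running such a script as the streaming reducer with
*stream.non.zero.exit.is.failure* set to true should exercise the same code
path as the log above, assuming the hang does not depend on anything else
specific to the original job.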