On Thu, 05 Jul 2007 11:10:31 PDT, John Heidemann wrote:
>
>I'm running hadoop streaming from svn (version 552930, reasonably
>recent).  My map/reduce job maps ~1M records, but then a few reduces
>succeed and many fail, eventually terminating the job unsuccessfully.
>I'm looking for some debugging hints.
>
>
>The failures are all "broken pipe":
>
>java.io.IOException: R/W/S=1/0/0 in:0=1/342 [rec/s] out:0=0/342 [rec/s]
>minRecWrittenToEnableSkip_=9223372036854775807 LOGNAME=null
>HOST=null
>USER=hadoop
>HADOOP_USER=null
>last Hadoop input: |null|
>last tool output: |null|
>Date: Thu Jul 05 10:46:28 PDT 2007
>Broken pipe
>        at org.apache.hadoop.streaming.PipeReducer.reduce(PipeReducer.java:91)
>        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:323)
>        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1763)
>
>...
>
>
>The other strange thing is I don't get 100% reduce failures, but maybe
>490/503 fail.
>
I found the problem... yes, "broken pipe" is the error message you get
when your reduce task fails to start (in my case, due to a missing
library).

A bug in my code is clearly my fault.  But I might suggest it would be
nice for hadoop streaming to check the exit code of the map and reduce
tasks, to provide, say, a more informative error message.  (I'll see if
I can fix this.)

The other odd thing is that the ~10 reduces that succeed apparently do
so because they were reducers with no input, which hadoop streaming
apparently turns silently into a success with no output.

-John
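
P.S. To make the suggestion concrete, here is roughly the check I have
in mind.  This is a minimal, hypothetical Java sketch under my own
names (StreamingExitCodeCheck, runReduceCommand), not the actual
PipeReducer internals: after feeding the child process its input,
inspect the exit code before blaming the pipe.

import java.io.IOException;
import java.io.OutputStream;

// Hypothetical sketch only -- not the real
// org.apache.hadoop.streaming.PipeReducer code.
public class StreamingExitCodeCheck {

  // Feed 'input' to the user's reduce command and check its exit code,
  // so a reducer that dies before reading (e.g. a missing library)
  // reports "failed with exit code N" rather than a bare "Broken pipe".
  static void runReduceCommand(String[] argv, byte[] input)
      throws IOException, InterruptedException {
    Process child = new ProcessBuilder(argv)
        .redirectErrorStream(true)        // fold stderr into stdout
        .start();

    IOException pipeError = null;
    try (OutputStream stdin = child.getOutputStream()) {
      stdin.write(input);                 // throws "Broken pipe" if the
    } catch (IOException e) {             // child has already exited
      pipeError = e;                      // remember it, but keep going
    }

    int exit = child.waitFor();
    if (exit != 0) {
      // The child's exit code is the informative error; the broken
      // pipe was just a downstream symptom.
      throw new IOException("reduce command '" + argv[0]
          + "' failed with exit code " + exit, pipeError);
    }
    if (pipeError != null) {
      throw pipeError;                    // pipe broke but child exited 0
    }
  }
}

The point of the ordering is that the pipe error is caught and held,
and only rethrown if the child actually exited cleanly; otherwise the
exit code wins, which is the message the user needs to see.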
