Hi,

> I am running a mapreduce job on my hadoop cluster.
>
> I am running a 10 gigabytes data and one tiny failed task crashes the whole
> operation.
> I am up to 98% complete and throwing away all the finished data seems just
> like an awful waste.
> I'd like to save the finished data and run again only the failed ones(the
> remaining 2%).
>
> Is there any way to figure out the range of the splits that failed?
> I go to "localhost:50030" to see if I can find any useful information but I
> must be looking at wrong places.

Can you check the 'Skip Bad Records' feature described here and see if that helps?
http://hadoop.apache.org/common/docs/r0.20.1/mapred_tutorial.html#Skipping+Bad+Records

Thanks
Hemanth

>
> Could somebody help me with this problem?
>
>
> Below is the log of a failed task. Any information I can use?
>
> *syslog logs*
>
> Records R/W=41707/41639
> 2010-06-30 07:35:30,530 INFO org.apache.hadoop.streaming.PipeMapRed: Records R/W=41776/41726
> 2010-06-30 07:35:40,554 INFO org.apache.hadoop.streaming.PipeMapRed: Records R/W=41865/41804
> 2010-06-30 07:35:50,559 INFO org.apache.hadoop.streaming.PipeMapRed: Records R/W=41970/41932
> 2010-06-30 07:36:00,637 INFO org.apache.hadoop.streaming.PipeMapRed: Records R/W=42073/42065
> 2010-06-30 07:36:10,772 INFO org.apache.hadoop.streaming.PipeMapRed: Records R/W=42258/42196
> 2010-06-30 07:36:20,785 INFO org.apache.hadoop.streaming.PipeMapRed: Records R/W=42318/42274
> 2010-06-30 07:36:30,985 INFO org.apache.hadoop.streaming.PipeMapRed: Records R/W=42378/42351
> 2010-06-30 07:36:41,005 INFO org.apache.hadoop.streaming.PipeMapRed: Records R/W=42442/42419
> 2010-06-30 07:36:51,149 INFO org.apache.hadoop.streaming.PipeMapRed: Records R/W=42499/42484
> 2010-06-30 07:37:01,235 INFO org.apache.hadoop.streaming.PipeMapRed: Records R/W=42559/42547
> 2010-06-30 07:37:11,242 INFO org.apache.hadoop.streaming.PipeMapRed: Records R/W=42626/42611
> 2010-06-30 07:37:21,485 INFO org.apache.hadoop.streaming.PipeMapRed: Records R/W=42769/42704
> 2010-06-30 07:37:31,617 INFO org.apache.hadoop.streaming.PipeMapRed: Records R/W=42845/42782
> 2010-06-30 07:37:41,725 INFO org.apache.hadoop.streaming.PipeMapRed: Records R/W=42915/42875
> 2010-06-30 07:37:51,733 INFO org.apache.hadoop.streaming.PipeMapRed: Records R/W=42986/42949
> 2010-06-30 07:38:01,795 INFO org.apache.hadoop.streaming.PipeMapRed: Records R/W=43070/43051
> 2010-06-30 07:38:11,849 INFO org.apache.hadoop.streaming.PipeMapRed: Records R/W=43138/43136
> 2010-06-30 07:38:22,398 INFO org.apache.hadoop.streaming.PipeMapRed: Records R/W=43258/43200
> 2010-06-30 07:38:31,642 INFO org.apache.hadoop.streaming.PipeMapRed: MRErrorThread done
> 2010-06-30 07:38:31,643 INFO org.apache.hadoop.streaming.PipeMapRed: MROutputThread done
> 2010-06-30 07:38:31,765 INFO org.apache.hadoop.streaming.PipeMapRed: log:null
> R/W/S=43335/43271/0 in:7=43335/5885 [rec/s] out:7=43271/5885 [rec/s]
> minRecWrittenToEnableSkip_=9223372036854775807 LOGNAME=null
> HOST=null
> USER=hadoop
> HADOOP_USER=null
> last Hadoop input: |null|
> last tool output: |[...@d22860|
> Date: Wed Jun 30 07:38:31 KST 2010
> java.io.IOException: Broken pipe
>     at java.io.FileOutputStream.writeBytes(Native Method)
>     at java.io.FileOutputStream.write(FileOutputStream.java:260)
>     at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
>     at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
>     at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
>     at java.io.DataOutputStream.write(DataOutputStream.java:90)
>     at org.apache.hadoop.streaming.PipeMapRed.write(PipeMapRed.java:635)
>     at org.apache.hadoop.streaming.PipeMapper.map(PipeMapper.java:105)
>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>     at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>     at org.apache.hadoop.mapred.Child.main(Child.java:170)
>
>
> 2010-06-30 07:38:31,766 INFO org.apache.hadoop.streaming.PipeMapRed: PipeMapRed failed!
> 2010-06-30 07:38:31,766 INFO org.apache.hadoop.streaming.PipeMapRed: PipeMapRed failed!
> 2010-06-30 07:38:32,028 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
> java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 139
>     at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:311)
>     at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:545)
>     at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:132)
>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
>     at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>     at org.apache.hadoop.mapred.Child.main(Child.java:170)
> 2010-06-30 07:38:32,029 INFO org.apache.hadoop.mapred.TaskRunner: Runnning cleanup for the task
>
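
One note on the log above: "subprocess failed with code 139" most likely means the streaming mapper program itself crashed. By the usual Unix convention, an exit status above 128 is 128 plus a signal number, and 139 - 128 = 11, which is SIGSEGV on Linux. The "Broken pipe" IOException would then be a side effect of Hadoop still writing to the stdin of a process that had already died. A minimal Python sketch of that decoding, assuming the 128 + signal convention applies here (the function name `describe_exit_code` is only illustrative, not part of Hadoop):

```python
import signal


def describe_exit_code(code: int) -> str:
    """Interpret a subprocess exit status using the 128 + signal convention."""
    if code == 0:
        return "exited normally"
    if code > 128:
        signum = code - 128
        try:
            # Look up the symbolic name of the signal on this platform.
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown signal"
        return f"killed by signal {signum} ({name})"
    return f"exited with status {code}"


# The status reported by PipeMapRed.waitOutputThreads() in the log above.
print(describe_exit_code(139))
```

If the crash is triggered by one particular input record, skipping bad records may get the job past it. Otherwise it is probably worth running the mapper program locally on the input of the failed task to reproduce the segfault.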