Hi everyone,
I have a job that keeps failing with a StackOverflowError, and I really
don't see how that is happening.
The job runs for about 20-30 minutes before one task errors, then a few more
error and the job fails.
I am running Hadoop 0.17 and I've tried lowering these settings, to no avail:
io.sort.factor 50
io.seqfile.sorter.recordlimit 500000
java.io.IOException: Spill failed
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:594)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:576)
    at java.io.DataOutputStream.writeInt(DataOutputStream.java:180)
    at Group.write(Group.java:68)
    at GroupPair.write(GroupPair.java:67)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:90)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:77)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:434)
    at MyMapper.map(MyMapper.java:27)
    at MyMapper.map(MyMapper.java:10)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:219)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)
Caused by: java.lang.StackOverflowError
    at java.io.DataInputStream.readInt(DataInputStream.java:370)
    at Group.readFields(Group.java:62)
    at GroupPair.readFields(GroupPair.java:60)
    at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:91)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.compare(MapTask.java:494)
    at org.apache.hadoop.util.QuickSort.fix(QuickSort.java:29)
    at org.apache.hadoop.util.QuickSort.sort(QuickSort.java:58)
    at org.apache.hadoop.util.QuickSort.sort(QuickSort.java:58)
    .... the above line repeated 200x
I defined a WritableComparable called GroupPair, which simply holds two Group
objects, each of which contains two integers. I fail to see how QuickSort
could recurse 200+ times, since that would require an insanely large number
of entries, far more than the 500 million that had been output at that
point.
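For context, a key type of the shape described above might look roughly like
the sketch below. This is a hypothetical reconstruction, not the actual code
from the job: the field names, the comparison order and the roundTrip helper
are all assumptions, and the sketch uses only java.io so that it runs without
Hadoop on the classpath. In the real job the classes would implement
org.apache.hadoop.io.WritableComparable instead of Comparable.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch: the real Group and GroupPair are not shown in this post.
final class GroupPairSketch {

    // A Group holds two integers (field names are assumptions).
    static final class Group implements Comparable<Group> {
        int first;
        int second;

        Group() {}

        Group(int first, int second) {
            this.first = first;
            this.second = second;
        }

        // Same method shape as Writable.write(DataOutput).
        void write(DataOutput out) throws IOException {
            out.writeInt(first);
            out.writeInt(second);
        }

        // Same method shape as Writable.readFields(DataInput).
        void readFields(DataInput in) throws IOException {
            first = in.readInt();
            second = in.readInt();
        }

        @Override
        public int compareTo(Group other) {
            int c = Integer.compare(first, other.first);
            return c != 0 ? c : Integer.compare(second, other.second);
        }
    }

    // A GroupPair simply holds two Group objects.
    static final class GroupPair implements Comparable<GroupPair> {
        Group left = new Group();
        Group right = new Group();

        GroupPair() {}

        GroupPair(Group left, Group right) {
            this.left = left;
            this.right = right;
        }

        void write(DataOutput out) throws IOException {
            left.write(out);
            right.write(out);
        }

        void readFields(DataInput in) throws IOException {
            left.readFields(in);
            right.readFields(in);
        }

        @Override
        public int compareTo(GroupPair other) {
            int c = left.compareTo(other.left);
            return c != 0 ? c : right.compareTo(other.right);
        }
    }

    // Serializes a key to bytes and reads it back into a fresh object.
    static GroupPair roundTrip(GroupPair p) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            p.write(new DataOutputStream(bytes));
            GroupPair copy = new GroupPair();
            copy.readFields(new DataInputStream(
                    new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        GroupPair a = new GroupPair(new Group(1, 2), new Group(3, 4));
        GroupPair b = roundTrip(a);
        System.out.println("equal after round trip: " + (a.compareTo(b) == 0));
    }
}
```

The trace above shows WritableComparator.compare calling GroupPair.readFields,
so each comparison made during the sort deserializes keys in this way.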
How is this even possible? And what can be done to fix this?
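To put numbers on the recursion-depth estimate above, here is a minimal
sketch. It assumes an idealized quicksort in which every partition step splits
the data exactly in half; the class and method names are made up for
illustration and have nothing to do with org.apache.hadoop.util.QuickSort. For
contrast it also computes the depth in the degenerate case where every
partition step peels off only a single entry.

```java
// Illustrative arithmetic only; not Hadoop code.
final class QuickSortDepthSketch {

    // Recursion depth of a perfectly balanced quicksort over n entries:
    // the number of halvings needed to get from n entries down to one.
    static int balancedDepth(long n) {
        int depth = 0;
        while (n > 1) {
            n = (n + 1) / 2; // size of the larger half
            depth++;
        }
        return depth;
    }

    // Recursion depth if every partition step removes only one entry.
    static long degenerateDepth(long n) {
        return Math.max(0L, n - 1);
    }

    public static void main(String[] args) {
        long entries = 500_000_000L;
        System.out.println("balanced depth for " + entries + " entries: "
                + balancedDepth(entries));
        System.out.println("degenerate depth for 201 entries: "
                + degenerateDepth(201));
    }
}
```

Under the balanced assumption, 500 million entries need only 29 levels, which
is why 200+ levels looks impossible. In general, though, quicksort's depth
depends on how evenly each partition step splits the data, so the balanced
figure is a best case rather than an upper bound: in the degenerate case, 201
entries are already enough for 200 levels.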
--
View this message in context:
http://www.nabble.com/Stack-Overflow-When-Running-Job-tp17593594p17593594.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.