Hi everyone,
I have a job that keeps failing with stack overflow errors, and I really
don't see how that is happening.
The job runs for about 20-30 minutes before one task errors out, then a few
more fail and the whole job fails.
I am running hadoop-17 and I've tried lowering these settings, to no avail:
io.sort.factor  50
io.seqfile.sorter.recordlimit   500000

java.io.IOException: Spill failed
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:594)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:576)
        at java.io.DataOutputStream.writeInt(DataOutputStream.java:180)
        at Group.write(Group.java:68)
        at GroupPair.write(GroupPair.java:67)
        at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:90)
        at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:77)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:434)
        at MyMapper.map(MyMapper.java:27)
        at MyMapper.map(MyMapper.java:10)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:219)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)
Caused by: java.lang.StackOverflowError
        at java.io.DataInputStream.readInt(DataInputStream.java:370)
        at Group.readFields(Group.java:62)
        at GroupPair.readFields(GroupPair.java:60)
        at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:91)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.compare(MapTask.java:494)
        at org.apache.hadoop.util.QuickSort.fix(QuickSort.java:29)
        at org.apache.hadoop.util.QuickSort.sort(QuickSort.java:58)
        at org.apache.hadoop.util.QuickSort.sort(QuickSort.java:58)
        ... (the above line repeated 200x)

I defined a WritableComparable called GroupPair which simply holds two Group
objects, each of which contains two integers. I fail to see how QuickSort
could recurse 200+ times, since that would require an insanely large number
of entries, far more than the 500 million that had been output at that
point.
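
To spell out the arithmetic behind that expectation, here is a rough Java
sketch. It assumes every partition splits evenly, which is an assumption and
not something a quicksort guarantees. The class and method names are made up
for illustration, and the sort is a textbook recursive quicksort, not Hadoop's
org.apache.hadoop.util.QuickSort. It computes the depth needed for 500 million
records under even splits, and then measures how deep the textbook version
recurses when all keys compare equal.

```java
public class QuickSortDepth {
    // Deepest recursion level reached by the most recent depthOf() call.
    static int maxDepth;

    // Textbook quicksort (Lomuto partition, last element as pivot).
    // Illustration only; this is not Hadoop's QuickSort implementation.
    static void sort(int[] a, int lo, int hi, int depth) {
        maxDepth = Math.max(maxDepth, depth);
        if (lo >= hi) {
            return;
        }
        int pivot = a[hi];
        int i = lo;
        for (int j = lo; j < hi; j++) {
            if (a[j] < pivot) {
                int t = a[i]; a[i] = a[j]; a[j] = t;
                i++;
            }
        }
        int t = a[i]; a[i] = a[hi]; a[hi] = t;
        sort(a, lo, i - 1, depth + 1);
        sort(a, i + 1, hi, depth + 1);
    }

    // Sorts the array in place and returns the deepest recursion level reached.
    static int depthOf(int[] a) {
        maxDepth = 0;
        sort(a, 0, a.length - 1, 1);
        return maxDepth;
    }

    // Levels needed if every partition split exactly in half:
    // ceil(log2(n)) for n > 1.
    static int balancedDepth(long n) {
        return 64 - Long.numberOfLeadingZeros(n - 1);
    }

    public static void main(String[] args) {
        // Even splits: 500 million records need only a few dozen levels.
        System.out.println("balanced depth for 500M records: "
                + balancedDepth(500_000_000L));

        // All keys equal: each partition peels off a single element,
        // so the depth grows with the number of records instead.
        int[] allEqual = new int[1000];
        System.out.println("depth for 1000 equal keys: " + depthOf(allEqual));
    }
}
```

So 200+ levels is only impossible if the splits stay roughly even. With
lopsided splits (for example many keys that compare equal, in this textbook
variant) the depth can grow with the record count. Whether that is what
happens inside Hadoop's QuickSort here, I cannot tell from the trace alone.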

How is this even possible? And what can be done to fix this?
-- 
View this message in context: 
http://www.nabble.com/Stack-Overflow-When-Running-Job-tp17593594p17593594.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.
