This is a known problem in 0.17.0: https://issues.apache.org/jira/browse/HADOOP-3442
It should be fixed in 0.17.1.

Runping

> -----Original Message-----
> From: Colin Freas [mailto:[EMAIL PROTECTED]
> Sent: Monday, June 09, 2008 12:56 PM
> To: [email protected]
> Subject: Re: Stack Overflow When Running Job
>
> We were getting this exact same problem in a really simple MR job, on input
> produced from a known-working MR job.
>
> It seemed to happen intermittently, and we couldn't figure out what was up.
> In the end we solved the problem by increasing the number of maps (80 to
> 200; this is a 6-node, 12-core cluster). Apparently, QuickSort can have
> problems with big chunks of pre-sorted data. Too much recursion, I
> believe.
>
> This might not be what's going on with you; maybe you're on a cluster of
> some other scale, but this worked for us (and in a setup with Hadoop 0.17).
>
> Good luck!
>
> -Colin
>
> On Mon, Jun 2, 2008 at 3:18 PM, Devaraj Das <[EMAIL PROTECTED]> wrote:
>
> > Hi, do you have a testcase that we can run to reproduce this? Thanks!
> >
> > > -----Original Message-----
> > > From: jkupferman [mailto:[EMAIL PROTECTED]
> > > Sent: Monday, June 02, 2008 9:22 AM
> > > To: [email protected]
> > > Subject: Stack Overflow When Running Job
> > >
> > > Hi everyone,
> > > I have a job running that keeps failing with stack overflows,
> > > and I really don't see how that is happening.
> > > The job runs for about 20-30 minutes before one task errors out,
> > > then a few more error out and the job fails.
> > >
> > > I am running Hadoop 0.17, and I've tried lowering these settings,
> > > to no avail:
> > >     io.sort.factor                  50
> > >     io.seqfile.sorter.recordlimit   500000
> > >
> > > java.io.IOException: Spill failed
> > >     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:594)
> > >     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:576)
> > >     at java.io.DataOutputStream.writeInt(DataOutputStream.java:180)
> > >     at Group.write(Group.java:68)
> > >     at GroupPair.write(GroupPair.java:67)
> > >     at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:90)
> > >     at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:77)
> > >     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:434)
> > >     at MyMapper.map(MyMapper.java:27)
> > >     at MyMapper.map(MyMapper.java:10)
> > >     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
> > >     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:219)
> > >     at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)
> > > Caused by: java.lang.StackOverflowError
> > >     at java.io.DataInputStream.readInt(DataInputStream.java:370)
> > >     at Group.readFields(Group.java:62)
> > >     at GroupPair.readFields(GroupPair.java:60)
> > >     at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:91)
> > >     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.compare(MapTask.java:494)
> > >     at org.apache.hadoop.util.QuickSort.fix(QuickSort.java:29)
> > >     at org.apache.hadoop.util.QuickSort.sort(QuickSort.java:58)
> > >     at org.apache.hadoop.util.QuickSort.sort(QuickSort.java:58)
> > >     ... (the above line repeated 200x)
> > >
> > > I defined a WritableComparable called GroupPair, which simply
> > > holds two Group objects, each of which contains two integers.
> > >
> > > I fail to see how QuickSort could recurse 200+ times, since
> > > that would require an insanely large number of entries, far
> > > more than the 500 million that had been output at that point.
> > >
> > > How is this even possible? And what can be done to fix this?
> > > --
> > > View this message in context:
> > > http://www.nabble.com/Stack-Overflow-When-Running-Job-tp17593594p17593594.html
> > > Sent from the Hadoop core-user mailing list archive at Nabble.com.
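On the question of how QuickSort can nest 200+ calls deep: recursion depth is about log2(n) only when each partition step splits its range roughly in half. If each step peels off only a record or two, depth grows roughly linearly with the number of records being sorted, so a few hundred bad splits in a row are enough; nothing like 2^200 records is needed. The Java sketch below is a hypothetical illustration, not Hadoop's org.apache.hadoop.util.QuickSort. It uses a deliberately simple last-element pivot (which inputs trigger the bad case depends on the pivot rule; with this one, both already-sorted and all-equal keys do), records the deepest recursion level reached, and compares a textbook two-sided recursion with a variant that recurses only into the smaller partition and loops over the larger one.

```java
import java.util.Random;

// Hypothetical demo, NOT Hadoop's QuickSort: shows how recursion depth depends
// on partition balance and on how the recursion is structured.
public class QuickSortDepthDemo {

    // Deepest level reached by the current run (demo only, not thread-safe).
    private static int maxDepth;

    // Lomuto partition around the last element; returns the pivot's final index.
    private static int partition(int[] a, int lo, int hi) {
        int pivot = a[hi];
        int i = lo;
        for (int j = lo; j < hi; j++) {
            if (a[j] < pivot) {
                swap(a, i, j);
                i++;
            }
        }
        swap(a, i, hi);
        return i;
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i];
        a[i] = a[j];
        a[j] = t;
    }

    // Textbook quicksort: recurses into both sides, so a lopsided split on
    // every step makes the stack grow by one frame per record.
    private static void naiveSort(int[] a, int lo, int hi, int depth) {
        if (lo >= hi) return;
        maxDepth = Math.max(maxDepth, depth);
        int p = partition(a, lo, hi);
        naiveSort(a, lo, p - 1, depth + 1);
        naiveSort(a, p + 1, hi, depth + 1);
    }

    // Same partitioning, but recurse only into the smaller side and loop on
    // the larger one: each recursive call at most halves the range, so the
    // depth stays at about log2(n) whatever the input looks like.
    private static void boundedSort(int[] a, int lo, int hi, int depth) {
        while (lo < hi) {
            maxDepth = Math.max(maxDepth, depth);
            int p = partition(a, lo, hi);
            if (p - lo < hi - p) {
                boundedSort(a, lo, p - 1, depth + 1);
                lo = p + 1;
            } else {
                boundedSort(a, p + 1, hi, depth + 1);
                hi = p - 1;
            }
        }
    }

    public static int naiveDepth(int[] input) {
        int[] a = input.clone();
        maxDepth = 0;
        naiveSort(a, 0, a.length - 1, 1);
        return maxDepth;
    }

    public static int boundedDepth(int[] input) {
        int[] a = input.clone();
        maxDepth = 0;
        boundedSort(a, 0, a.length - 1, 1);
        return maxDepth;
    }

    public static void main(String[] args) {
        int n = 2000;
        int[] sorted = new int[n];
        for (int i = 0; i < n; i++) sorted[i] = i;
        int[] random = new Random(42).ints(n).toArray();
        System.out.println("sorted, naive depth:   " + naiveDepth(sorted));
        System.out.println("sorted, bounded depth: " + boundedDepth(sorted));
        System.out.println("random, naive depth:   " + naiveDepth(random));
        System.out.println("random, bounded depth: " + boundedDepth(random));
    }
}
```

The smaller-side trick bounds only the stack: the bounded variant still takes quadratic time on these inputs, which is why introsort-style sorts also switch to heapsort past a depth limit. Under this model, Colin's workaround would also be consistent with the explanation: more maps means fewer records per in-memory sort, and so a lower ceiling on how deep a degenerate recursion can go.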
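For reference, the key type described in the original post (a GroupPair holding two Group objects of two ints each) might look roughly like the hypothetical sketch below. The poster's classes were not posted, so the field names and the sort order (first Group, then second, each compared field by field) are assumptions. In a real job both classes would implement Hadoop's WritableComparable; to keep the sketch runnable without Hadoop on the classpath, it only mirrors the write(DataOutput)/readFields(DataInput) pair from that contract, plus java.lang.Comparable.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical reconstruction of the poster's key types. In a real job both
// classes would implement org.apache.hadoop.io.WritableComparable instead.
public class GroupPair implements Comparable<GroupPair> {

    // Two ints; field names and ordering are assumptions.
    public static class Group implements Comparable<Group> {
        int first;
        int second;

        public Group() {}

        public Group(int first, int second) {
            this.first = first;
            this.second = second;
        }

        // Mirrors Writable.write: two fixed-width, big-endian ints.
        public void write(DataOutput out) throws IOException {
            out.writeInt(first);
            out.writeInt(second);
        }

        // Mirrors Writable.readFields: read the fields back in the same order.
        public void readFields(DataInput in) throws IOException {
            first = in.readInt();
            second = in.readInt();
        }

        @Override
        public int compareTo(Group o) {
            int c = Integer.compare(first, o.first);
            return c != 0 ? c : Integer.compare(second, o.second);
        }
    }

    private final Group left;
    private final Group right;

    public GroupPair() {
        this(new Group(), new Group());
    }

    public GroupPair(Group left, Group right) {
        this.left = left;
        this.right = right;
    }

    public void write(DataOutput out) throws IOException {
        left.write(out);
        right.write(out);
    }

    public void readFields(DataInput in) throws IOException {
        left.readFields(in);
        right.readFields(in);
    }

    // Order by the first Group, then by the second.
    @Override
    public int compareTo(GroupPair o) {
        int c = left.compareTo(o.left);
        return c != 0 ? c : right.compareTo(o.right);
    }

    // Serializes a key into a byte array (standing in for the map-output buffer).
    public static byte[] toBytes(GroupPair p) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        p.write(out);
        out.flush();
        return bytes.toByteArray();
    }

    // Deserializes a key written by toBytes.
    public static GroupPair fromBytes(byte[] b) throws IOException {
        GroupPair p = new GroupPair();
        p.readFields(new DataInputStream(new ByteArrayInputStream(b)));
        return p;
    }
}
```

Each serialized key is 16 bytes (four ints). The trace shows WritableComparator.compare calling GroupPair.readFields, so both keys are deserialized for every comparison. The StackOverflowError surfaces in DataInputStream.readInt only because that was the deepest frame when the stack ran out; the deep part of the stack is the repeated QuickSort.sort frames, not the key class.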
