RE: Stack Overflow When Running Job

Runping Qi Mon, 09 Jun 2008 13:25:03 -0700

This is a known problem for 0.17.0:
https://issues.apache.org/jira/browse/HADOOP-3442


It should be fixed in 0.17.1.

Runping


> -----Original Message-----
> From: Colin Freas [mailto:[EMAIL PROTECTED]
> Sent: Monday, June 09, 2008 12:56 PM
> To: [email protected]
> Subject: Re: Stack Overflow When Running Job
> 
> We were getting this exact same problem in a really simple MR job, on input
> produced from a known-working MR job.
> 
> It seemed to happen intermittently, and we couldn't figure out what was up.
> In the end we solved the problem by increasing the number of maps (from 80 to
> 200; this is a 6-node, 12-core cluster).  Apparently, QuickSort can have
> problems with big chunks of pre-sorted data.  Too much recursion, I believe.
> 
> This might not be what's going on with you, maybe you're on a cluster of
> some other scale, but this worked for us (and in a setup with Hadoop 0.17).
> 
> Good luck!
> 
> -Colin
> 
> On Mon, Jun 2, 2008 at 3:18 PM, Devaraj Das <[EMAIL PROTECTED]> wrote:
> 
> > Hi, do you have a testcase that we can run to reproduce this? Thanks!
> >
> > > -----Original Message-----
> > > From: jkupferman [mailto:[EMAIL PROTECTED]
> > > Sent: Monday, June 02, 2008 9:22 AM
> > > To: [email protected]
> > > Subject: Stack Overflow When Running Job
> > >
> > >
> > > Hi everyone,
> > > I have a job running that keeps failing with stack overflows,
> > > and I really don't see how that is happening.
> > > The job runs for about 20-30 minutes before one task errors,
> > > then a few more error and it fails.
> > > I am running Hadoop 0.17 and I've tried lowering these settings
> > > to no avail:
> > > io.sort.factor        50
> > > io.seqfile.sorter.recordlimit 500000
> > >
> > > java.io.IOException: Spill failed
> > >       at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:594)
> > >       at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:576)
> > >       at java.io.DataOutputStream.writeInt(DataOutputStream.java:180)
> > >       at Group.write(Group.java:68)
> > >       at GroupPair.write(GroupPair.java:67)
> > >       at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:90)
> > >       at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:77)
> > >       at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:434)
> > >       at MyMapper.map(MyMapper.java:27)
> > >       at MyMapper.map(MyMapper.java:10)
> > >       at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
> > >       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:219)
> > >       at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)
> > > Caused by: java.lang.StackOverflowError
> > >       at java.io.DataInputStream.readInt(DataInputStream.java:370)
> > >       at Group.readFields(Group.java:62)
> > >       at GroupPair.readFields(GroupPair.java:60)
> > >       at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:91)
> > >       at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.compare(MapTask.java:494)
> > >       at org.apache.hadoop.util.QuickSort.fix(QuickSort.java:29)
> > >       at org.apache.hadoop.util.QuickSort.sort(QuickSort.java:58)
> > >       at org.apache.hadoop.util.QuickSort.sort(QuickSort.java:58)
> > > ....the above line repeated 200x
> > >
> > > I defined a WritableComparable called GroupPair which simply
> > > holds two Group objects, each of which contains two integers.
> > > I fail to see how QuickSort could recurse 200+ times, since
> > > that would require an insanely large number of entries, far
> > > more than the 500 million that had been output at that point.
> > >
> > > How is this even possible? And what can be done to fix this?
> > > --
> > > View this message in context:
> > > http://www.nabble.com/Stack-Overflow-When-Running-Job-tp17593594p17593594.html
> > > Sent from the Hadoop core-user mailing list archive at Nabble.com.
> > >
> > >
> >
> >
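
Colin's remark about QuickSort and pre-sorted data, and the original
question of how the sort could recurse 200+ times, can be illustrated with
a small experiment. The sketch below is a textbook recursive quicksort
(last element as pivot), written for illustration only. It is not Hadoop's
org.apache.hadoop.util.QuickSort, and the names QuickSortDepth and depthFor
are made up for this example. With balanced splits, 500 million records
would need only about 29 levels of recursion (log2 of 5 x 10^8 is roughly
28.9), but a naive pivot choice on already-sorted input peels off a single
element per call, so the depth grows linearly with the input size.

```java
import java.util.Random;

public class QuickSortDepth {
    // Deepest recursion level reached during the most recent sort.
    static int maxDepth;

    // Textbook quicksort with Lomuto partitioning and the last element as
    // pivot. Illustration only; this is NOT Hadoop's QuickSort.
    static void sort(int[] a, int lo, int hi, int depth) {
        if (depth > maxDepth) {
            maxDepth = depth;
        }
        if (lo >= hi) {
            return;
        }
        int pivot = a[hi];
        int i = lo;
        for (int j = lo; j < hi; j++) {
            if (a[j] < pivot) {
                int t = a[i]; a[i] = a[j]; a[j] = t;
                i++;
            }
        }
        int t = a[i]; a[i] = a[hi]; a[hi] = t;
        sort(a, lo, i - 1, depth + 1);   // elements smaller than the pivot
        sort(a, i + 1, hi, depth + 1);   // elements >= the pivot
    }

    // Sorts the array in place and returns the maximum recursion depth.
    static int depthFor(int[] a) {
        maxDepth = 0;
        sort(a, 0, a.length - 1, 1);
        return maxDepth;
    }

    public static void main(String[] args) {
        int n = 2000;                    // small enough for a default JVM stack
        int[] sorted = new int[n];
        int[] shuffled = new int[n];
        Random rnd = new Random(42);
        for (int i = 0; i < n; i++) {
            sorted[i] = i;               // already in ascending order
            shuffled[i] = rnd.nextInt(); // effectively random order
        }
        System.out.println("pre-sorted depth: " + depthFor(sorted));
        System.out.println("random depth:     " + depthFor(shuffled));
    }
}
```

On the pre-sorted array the reported depth equals the array length (2000
here), while the random array should stay within a few dozen levels. Scale
the sorted run up to the size of a map-side spill and a recursive sort like
this one would exhaust the thread stack. Whether the 0.17.0 failure follows
exactly this pattern is not established in this thread; HADOOP-3442 has the
details.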
