I have run into similar problems many times too, especially when the input data is compressed. I had to raise the heap size to around 700MB to avoid OOM problems in the mappers.
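In case it helps, here is a rough sketch of bumping the task heap from a JobConf using the stock mapred.child.java.opts property (0.15-era API; the default heap it gives child JVMs is quite small, -Xmx200m if I remember the stock hadoop-default.xml right). The 700m value just mirrors the figure above, not a recommendation:

// Rough sketch, not our exact code: raise the heap of the child task JVMs
// through the standard mapred.child.java.opts property.
import org.apache.hadoop.mapred.JobConf;

public class ChildHeapExample {
  public static void raiseChildHeap(JobConf conf) {
    // Every map and reduce task JVM is launched with these options.
    conf.set("mapred.child.java.opts", "-Xmx700m");
  }
}

You can of course put the same property in hadoop-site.xml instead if you want it cluster-wide rather than per job.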
Runping

> -----Original Message-----
> From: Devaraj Das [mailto:[EMAIL PROTECTED]
> Sent: Friday, December 28, 2007 3:28 AM
> To: hadoop-user@lucene.apache.org
> Subject: RE: question on Hadoop configuration for non cpu intensive jobs - 0.15.1
>
> I am also interested in the test demonstrating OOM for large split sizes
> (if this is true then it is indeed a bug). Sort & spill-to-disk should
> happen as soon as io.sort.mb amount of key/value data is collected. I am
> assuming that you didn't change (increase) the value of io.sort.mb when
> you increased the split size..
>
> Thanks,
> Devaraj
>
> > -----Original Message-----
> > From: Ted Dunning [mailto:[EMAIL PROTECTED]
> > Sent: Wednesday, December 26, 2007 4:31 AM
> > To: hadoop-user@lucene.apache.org
> > Subject: Re: question on Hadoop configuration for non cpu intensive jobs - 0.15.1
> >
> > This sounds like a bug.
> >
> > The memory requirements for hadoop itself shouldn't change with the
> > split size. At the very least, it should adapt correctly to whatever
> > the memory limits are.
> >
> > Can you build a version of your program that works from random data so
> > that you can file a bug? If you contact me off-line, I can help build a
> > random data generator that matches your input reasonably well.
> >
> > On 12/25/07 2:52 PM, "Jason Venner" <[EMAIL PROTECTED]> wrote:
> >
> > > My mapper in this case is the identity mapper, and the reducer gets
> > > about 10 values per key and makes a collect decision based on the
> > > data in the values.
> > > The reducer is very close to a no-op, and uses very little memory
> > > beyond the values themselves.
> > >
> > > I believe the problem is in the amount of buffering in the output files.
> > >
> > > The quandary we have is that the jobs run very poorly with the
> > > standard input split size, since the mean time to finish a split is
> > > very small, versus gigantic memory requirements for large split sizes.
> > >
> > > Time to play with parameters again ... since the answer doesn't
> > > appear to be in working memory for the list.
> > >
> > > Ted Dunning wrote:
> > >> What are your mappers doing that they run out of memory? Or is it
> > >> your reducers?
> > >>
> > >> Often, you can write this sort of program so that you don't have
> > >> higher memory requirements for larger splits.
> > >>
> > >> On 12/25/07 1:52 PM, "Jason Venner" <[EMAIL PROTECTED]> wrote:
> > >>
> > >>> We have tried reducing the number of splits by increasing the block
> > >>> sizes to 10x and 5x 64meg, but then we constantly have out of
> > >>> memory errors and timeouts. At this point each jvm is getting 768M
> > >>> and I can't readily allocate more without dipping into swap.
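For anyone hitting this thread later: here is a rough sketch of the knobs being discussed above, using the 0.15-era JobConf property names as I recall them. The values are illustrative guesses only, not the actual settings from this job:

// Illustrative only: the split-size / sort-buffer / heap knobs from this
// thread. Values are assumptions, not what anyone here actually ran.
import org.apache.hadoop.mapred.JobConf;

public class SplitAndSortTuning {
  public static void tune(JobConf conf) {
    // Ask FileInputFormat-based inputs for larger splits (here ~5 x 64MB),
    // so short map tasks amortize their startup cost over more data.
    conf.setLong("mapred.min.split.size", 5L * 64 * 1024 * 1024);

    // Map-side sort buffer: a spill to disk should happen once this much
    // key/value data has been collected, so per-task memory should be
    // bounded roughly by io.sort.mb plus framework overhead, not by the
    // split size (which is Devaraj's point above).
    conf.setInt("io.sort.mb", 100);

    // Task JVM heap: must leave comfortable headroom above io.sort.mb.
    conf.set("mapred.child.java.opts", "-Xmx768m");
  }
}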