I am also interested in seeing the test that demonstrates OOM for large split
sizes (if this is true then it is indeed a bug). Sort & spill-to-disk should
happen as soon as io.sort.mb worth of key/value data has been collected. I am
assuming that you didn't increase the value of io.sort.mb when you increased
the split size.
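For reference, these are the knobs involved; they live in hadoop-site.xml (the
values below are just the defaults/illustrative, not a recommendation):

```xml
<!-- hadoop-site.xml: illustrative values only -->
<property>
  <name>io.sort.mb</name>
  <value>100</value>
  <!-- Buffer used to sort map output before spilling to disk, in MB.
       Map-side memory use should be bounded by this, not by split size. -->
</property>
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx768m</value>
  <!-- Per-task JVM heap; io.sort.mb must fit comfortably inside this. -->
</property>
```

If io.sort.mb is still at its default while the heap is 768M, a larger split
alone should not push the map task out of memory.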

Thanks,
Devaraj

> -----Original Message-----
> From: Ted Dunning [mailto:[EMAIL PROTECTED] 
> Sent: Wednesday, December 26, 2007 4:31 AM
> To: hadoop-user@lucene.apache.org
> Subject: Re: question on Hadoop configuration for non cpu intensive jobs - 0.15.1
> 
> 
> 
> This sounds like a bug.
> 
> The memory requirements for hadoop itself shouldn't change 
> with the split size.  At the very least, it should adapt 
> correctly to whatever the memory limits are.
> 
> Can you build a version of your program that works from 
> random data so that you can file a bug?  If you contact me 
> off-line, I can help build a random data generator that 
> matches your input reasonably well.
> 
> 
> On 12/25/07 2:52 PM, "Jason Venner" <[EMAIL PROTECTED]> wrote:
> 
> > My mapper in this case is the identity mapper, and the reducer gets 
> > about 10 values per key and makes a collect decision based on the data 
> > in the values.
> > The reducer is very close to a no-op, and uses very little additional 
> > memory beyond the values.
> > 
> > I believe the problem is in the amount of buffering in the output files.
> > 
> > The quandary we have is that the jobs run very poorly with the standard 
> > input split size, as the mean time to finish a split is very small, 
> > versus gigantic memory requirements for large split sizes.
> > 
> > Time to play with parameters again ... since the answer doesn't appear 
> > to be in working memory for the list.
> > 
> > 
> > 
> > Ted Dunning wrote:
> >> What are your mappers doing that they run out of memory?  Or is it 
> >> your reducers?
> >> 
> >> Often, you can write this sort of program so that you don't have 
> >> higher memory requirements for larger splits.
> >> 
> >> 
> >> On 12/25/07 1:52 PM, "Jason Venner" <[EMAIL PROTECTED]> wrote:
> >> 
> >>   
> >>> We have tried reducing the number of splits by increasing the block 
> >>> sizes to 10x and 5x 64meg, but then we constantly have out of memory 
> >>> errors and timeouts. At this point each JVM is getting 768M and I 
> >>> can't readily allocate more without dipping into swap.
> >>>     
> >> 
> >>   
> 
> 
