I have run into similar problems many times too, especially when the input data is compressed. I had to raise the heap size to around 700MB to avoid OOM problems in the mappers.
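In case it helps, here is a rough sketch of bumping the task heap from a JobConf using the stock mapred.child.java.opts property (0.15-era API; the default heap it gives child JVMs is quite small, -Xmx200m if I remember the stock hadoop-default.xml right). The 700m value just mirrors the figure above, not a recommendation:

// Rough sketch, not our exact code: raise the heap of the child task JVMs
// through the standard mapred.child.java.opts property.
import org.apache.hadoop.mapred.JobConf;

public class ChildHeapExample {
  public static void raiseChildHeap(JobConf conf) {
    // Every map and reduce task JVM is launched with these options.
    conf.set("mapred.child.java.opts", "-Xmx700m");
  }
}

You can of course put the same property in hadoop-site.xml instead if you want it cluster-wide rather than per job.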
Runping

> -----Original Message-----
> From: Devaraj Das [mailto:[EMAIL PROTECTED]
> Sent: Friday, December 28, 2007 3:28 AM
> To: hadoop-user@lucene.apache.org
> Subject: RE: question on Hadoop configuration for non cpu intensive jobs - 0.15.1
>
> I am also interested in the test demonstrating OOM for large split sizes
> (if this is true then it is indeed a bug). Sort & spill-to-disk should
> happen as soon as io.sort.mb amount of key/value data is collected. I am
> assuming that you didn't change (increase) the value of io.sort.mb when
> you increased the split size..
>
> Thanks,
> Devaraj
>
> > -----Original Message-----
> > From: Ted Dunning [mailto:[EMAIL PROTECTED]
> > Sent: Wednesday, December 26, 2007 4:31 AM
> > To: hadoop-user@lucene.apache.org
> > Subject: Re: question on Hadoop configuration for non cpu intensive jobs - 0.15.1
> >
> > This sounds like a bug.
> >
> > The memory requirements for hadoop itself shouldn't change with the
> > split size. At the very least, it should adapt correctly to whatever
> > the memory limits are.
> >
> > Can you build a version of your program that works from random data so
> > that you can file a bug? If you contact me off-line, I can help build a
> > random data generator that matches your input reasonably well.
> >
> > On 12/25/07 2:52 PM, "Jason Venner" <[EMAIL PROTECTED]> wrote:
> >
> > > My mapper in this case is the identity mapper, and the reducer gets
> > > about 10 values per key and makes a collect decision based on the
> > > data in the values.
> > > The reducer is very close to a no-op, and uses very little memory
> > > beyond the values themselves.
> > >
> > > I believe the problem is in the amount of buffering in the output files.
> > >
> > > The quandary we have is that the jobs run very poorly with the
> > > standard input split size, since the mean time to finish a split is
> > > very small, versus gigantic memory requirements for large split sizes.
> > >
> > > Time to play with parameters again ... since the answer doesn't
> > > appear to be in working memory for the list.
> > >
> > > Ted Dunning wrote:
> > >> What are your mappers doing that they run out of memory? Or is it
> > >> your reducers?
> > >>
> > >> Often, you can write this sort of program so that you don't have
> > >> higher memory requirements for larger splits.
> > >>
> > >> On 12/25/07 1:52 PM, "Jason Venner" <[EMAIL PROTECTED]> wrote:
> > >>
> > >>> We have tried reducing the number of splits by increasing the block
> > >>> sizes to 10x and 5x 64meg, but then we constantly have out of
> > >>> memory errors and timeouts. At this point each jvm is getting 768M
> > >>> and I can't readily allocate more without dipping into swap.
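For anyone hitting this thread later: here is a rough sketch of the knobs being discussed above, using the 0.15-era JobConf property names as I recall them. The values are illustrative guesses only, not the actual settings from this job:

// Illustrative only: the split-size / sort-buffer / heap knobs from this
// thread. Values are assumptions, not what anyone here actually ran.
import org.apache.hadoop.mapred.JobConf;

public class SplitAndSortTuning {
  public static void tune(JobConf conf) {
    // Ask FileInputFormat-based inputs for larger splits (here ~5 x 64MB),
    // so short map tasks amortize their startup cost over more data.
    conf.setLong("mapred.min.split.size", 5L * 64 * 1024 * 1024);

    // Map-side sort buffer: a spill to disk should happen once this much
    // key/value data has been collected, so per-task memory should be
    // bounded roughly by io.sort.mb plus framework overhead, not by the
    // split size (which is Devaraj's point above).
    conf.setInt("io.sort.mb", 100);

    // Task JVM heap: must leave comfortable headroom above io.sort.mb.
    conf.set("mapred.child.java.opts", "-Xmx768m");
  }
}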