Re: question on Hadoop configuration for non cpu intensive jobs - 0.15.1

Jason Venner Tue, 25 Dec 2007 14:57:56 -0800

My mapper in this case is the identity mapper, and the reducer gets about 10 values per key and makes a collect decision based on the data in the values. The reducer is very close to a no-op, and uses very little memory beyond the values themselves.


I believe the problem is in the amount of buffering in the output files.

The quandary we have is that the jobs run very poorly with the standard input split size, as the mean time to finish a split is very small, versus gigantic memory requirements for large split sizes.

Time to play with parameters again ... since the answer doesn't appear to be in working memory for the list.




Ted Dunning wrote:

What are your mappers doing that they run out of memory?  Or is it your
reducers?

Often, you can write this sort of program so that you don't have higher
memory requirements for larger splits.


On 12/25/07 1:52 PM, "Jason Venner" <[EMAIL PROTECTED]> wrote:

We have tried reducing the number of splits by increasing the block
sizes to 10x and 5x 64meg, but then we constantly have out of memory
errors and timeouts. At this point each jvm is getting 768M and I can't
readily allocate more without dipping into swap.
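
To put rough numbers on the trade-off described above, here is a minimal back-of-the-envelope sketch. It assumes a hypothetical 100 GB input (the thread does not state the real input size) and approximates one split per block. The helper name `num_splits` is made up for illustration and is not a Hadoop API. The heap ratio is only a size comparison against the 768M per-JVM figure, not a claim about how much of a split a task actually holds in memory.

```python
MB = 1024 * 1024

# Figures taken from the thread: 64 MB default block, 768 MB heap per JVM.
DEFAULT_BLOCK = 64 * MB
JVM_HEAP = 768 * MB

# Hypothetical total input size; the thread does not say what it really is.
TOTAL_INPUT = 100 * 1024 * MB


def num_splits(total_input_bytes, split_bytes):
    """Number of splits needed to cover the input (integer ceiling division)."""
    return -(-total_input_bytes // split_bytes)


for multiplier in (1, 5, 10):
    split = multiplier * DEFAULT_BLOCK
    print(
        f"{multiplier:>2}x block = {split // MB:>3} MB per split, "
        f"{num_splits(TOTAL_INPUT, split):>4} splits, "
        f"split/heap = {split / JVM_HEAP:.2f}"
    )
```

Under these assumptions, going from 1x to 10x cuts the number of map tasks by a factor of ten, while a single split grows to most of the size of the 768M heap.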
