My mapper in this case is the identity mapper, and the reducer gets about 10 values per key and makes a collect decision based on the data in the values. The reducer is very close to a no-op and uses very little memory beyond the values themselves.
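
To make the shape of that reducer concrete, here is a minimal sketch in Python rather than the Hadoop Java API. The decision rule `should_collect` and the sample data are hypothetical placeholders, not the actual job logic; the point is only that a reducer like this holds one key's values at a time.

```python
from itertools import groupby
from operator import itemgetter


def should_collect(values):
    # Hypothetical decision rule: keep the key if any value is flagged "keep".
    return any(v.startswith("keep") for v in values)


def reduce_stream(sorted_pairs):
    """Yield (key, values) for each key that passes the decision rule.

    The input must already be sorted by key, as it is after the shuffle.
    Only one key's values (about 10 in the case above) are held at a time.
    """
    for key, group in groupby(sorted_pairs, key=itemgetter(0)):
        values = [v for _, v in group]
        if should_collect(values):
            yield key, values


# Made-up sample input, sorted by key.
pairs = [("a", "keep-1"), ("a", "x"), ("b", "y"), ("c", "keep-2")]
result = list(reduce_stream(pairs))
```

Memory use here depends on the number of values per key, not on how much input the task reads in total.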

I believe the problem is in the amount of buffering in the output files.

Our quandary: with the standard input split size the jobs run very poorly, because the mean time to finish a split is very small, while large split sizes bring gigantic memory requirements.
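
A rough sketch of the arithmetic behind that tradeoff, using the 64 MB block size and 768 MB per-JVM heap mentioned in this thread. The 100 GB input size is a made-up figure for illustration, and the ratio column is only a size comparison; it does not claim that a whole split is ever held in memory.

```python
MB = 1024 * 1024
BLOCK = 64 * MB    # standard block size from the thread
HEAP = 768 * MB    # per-JVM heap from the thread


def split_stats(input_bytes, multiplier):
    """Return (split size in bytes, number of splits, split size / heap)."""
    split = BLOCK * multiplier
    n_splits = -(-input_bytes // split)  # ceiling division
    return split, n_splits, split / HEAP


INPUT = 100 * 1024 * MB  # hypothetical 100 GB input, an assumption

for m in (1, 5, 10):
    split, n, ratio = split_stats(INPUT, m)
    print(f"{m:>2}x: split={split // MB} MB, splits={n}, split/heap={ratio:.2f}")
```

Going from 1x to 10x cuts the number of map tasks by a factor of ten, but a single split then approaches the size of the whole heap, which leaves little headroom if anything in the task buffers in proportion to its input.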

Time to play with parameters again ... since the answer doesn't appear to be in working memory for the list.



Ted Dunning wrote:
What are your mappers doing that they run out of memory?  Or is it your
reducers?

Often, you can write this sort of program so that you don't have higher
memory requirements for larger splits.


On 12/25/07 1:52 PM, "Jason Venner" <[EMAIL PROTECTED]> wrote:

We have tried reducing the number of splits by increasing the block
sizes to 10x and 5x 64 MB, but then we constantly get out-of-memory
errors and timeouts. At this point each JVM is getting 768M and I can't
readily allocate more without dipping into swap.
