This sounds like a bug.

The memory requirements for Hadoop itself shouldn't change with the split
size.  At the very least, it should adapt correctly to whatever the memory
limits are.
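
In case it helps while you tune things: in my experience the per-task memory
is mostly governed by the map-side sort buffer and the child JVM heap, not by
the split size itself. A rough sketch of where those knobs live, using the
JobConf API (the property names are from memory, so treat them as placeholders
and check them against the hadoop-default.xml that ships with your release):

import org.apache.hadoop.mapred.JobConf;

// Sketch only: property names and defaults vary between releases, so
// verify these against your hadoop-default.xml before relying on them.
public class MemoryKnobs {
    public static void main(String[] args) {
        JobConf conf = new JobConf();

        // Map-side sort buffer (in MB): the in-memory buffer that map
        // output is collected and sorted in before spilling to disk.
        conf.set("io.sort.mb", "100");

        // Heap handed to each child task JVM (768 MB in your case).
        conf.set("mapred.child.java.opts", "-Xmx768m");

        // Minimum split size in bytes, as an alternative to raising the
        // DFS block size (here 5 x 64 MB).
        conf.set("mapred.min.split.size",
                 Long.toString(5L * 64 * 1024 * 1024));

        System.out.println("io.sort.mb = " + conf.get("io.sort.mb"));
    }
}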

Can you build a version of your program that works from random data so that
you can file a bug?  If you contact me off-line, I can help build a random
data generator that matches your input reasonably well.
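
For example, something along these lines is usually enough to approximate
tab-separated text input with roughly 10 values per key; the key count, value
shape, and output file name below are placeholders you would adjust until the
data looks like your real input:

import java.io.BufferedWriter;
import java.io.FileWriter;
import java.util.Random;

// Minimal sketch of a random data generator: writes tab-separated
// key/value lines, about 10 values per key.  The constants below are
// placeholders; tune them to match the shape of the real input.
public class RandomDataGen {
    public static void main(String[] args) throws Exception {
        int numKeys = 1000000;      // placeholder: number of distinct keys
        int valuesPerKey = 10;      // roughly matches the job described
        Random rand = new Random(42);

        BufferedWriter out =
            new BufferedWriter(new FileWriter("random-input.txt"));
        for (int k = 0; k < numKeys; k++) {
            String key = "key-" + k;
            for (int v = 0; v < valuesPerKey; v++) {
                out.write(key + "\t" + Long.toHexString(rand.nextLong()));
                out.newLine();
            }
        }
        out.close();
    }
}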


On 12/25/07 2:52 PM, "Jason Venner" <[EMAIL PROTECTED]> wrote:

> My mapper in this case is the identity mapper, and the reducer gets
> about 10 values per key and makes a collect decision based on the data
> in the values.
> The reducer is very close to a no-op, and uses very little memory beyond
> the values themselves.
> 
> I believe the problem is in the amount of buffering in the output files.
> 
> The quandary we have is that the jobs run very poorly with the standard
> input split size, since the mean time to finish a split is very small,
> versus gigantic memory requirements with the large split sizes.
> 
> Time to play with parameters again ... since the answer doesn't appear
> to be in working memory for the list.
> 
> 
> 
> Ted Dunning wrote:
>> What are your mappers doing that they run out of memory?  Or is it your
>> reducers?
>> 
>> Often, you can write this sort of program so that you don't have higher
>> memory requirements for larger splits.
>> 
>> 
>> On 12/25/07 1:52 PM, "Jason Venner" <[EMAIL PROTECTED]> wrote:
>> 
>>   
>>> We have tried reducing the number of splits by increasing the block
>>> size to 5x and 10x the 64 MB default, but then we constantly hit
>>> out-of-memory errors and timeouts. At this point each JVM is getting
>>> 768 MB, and I can't readily allocate more without dipping into swap.
>>>     
>> 
>>   
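
For reference, a reducer of the shape you describe (identity map, about ten
values per key, collect-or-not decision) would look roughly like this with
the mapred Reducer interface; the keep() predicate below is purely a
placeholder for whatever decision your real job makes, and the exact
signatures may differ slightly between releases:

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

// Hypothetical reconstruction of the job described above: the identity
// mapper feeds this reducer, which streams through the values and only
// decides whether to collect each one, so it should hold little beyond
// the current value in memory.
public class CollectDecisionReducer extends MapReduceBase
    implements Reducer<Text, Text, Text, Text> {

    public void reduce(Text key, Iterator<Text> values,
                       OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {
        while (values.hasNext()) {
            Text value = values.next();
            if (keep(key, value)) {      // placeholder decision
                output.collect(key, value);
            }
        }
    }

    private boolean keep(Text key, Text value) {
        // Placeholder: the real decision logic goes here.
        return value.getLength() > 0;
    }
}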
