They vary a lot and depend on the application.
Most of them are tiny (< 1 KB), but there are outliers reaching tens of
MB.

One application had a maximum of 30 MB per value and ran into OOM, with a
fan factor of 65 (from the logs:

2008-12-08 00:36:56,575 INFO org.apache.hadoop.mapred.Merger: Down to the
last merge-pass, with 65 segments left of total size: 2478699273 bytes)

indicating that at one time nearly all values on the heap were
outliers.
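
As a rough sanity check of that reading, here is a minimal sketch. It
assumes a plain k-way merge that keeps exactly one record per segment on
the heap, which may not match what Hadoop's Merger actually does, and it
reads "30 MB" as 30 MiB. The function names are made up for illustration.

```python
import heapq

FAN_IN = 65                       # segments in the last merge pass (from the log)
TOTAL_BYTES = 2_478_699_273       # total size of those segments (from the log)
MAX_VALUE_BYTES = 30 * 1024**2    # assumption: "30 MB" read as 30 MiB


def worst_case_heap_bytes(fan_in, max_value_bytes):
    """Upper bound if every segment's current record is a maximum-size value."""
    return fan_in * max_value_bytes


def peak_resident_bytes(segments):
    """Simulate a generic k-way merge (not Hadoop's Merger).

    Each segment is a list of (key, value_size) pairs sorted by key.
    One record per segment sits on the heap; ties between equal keys are
    broken by segment index here, which is an arbitrary choice.
    Returns the peak sum of value sizes resident on the heap.
    """
    heap = []
    for idx, seg in enumerate(segments):
        if seg:
            key, size = seg[0]
            heap.append((key, idx, 0, size))
    heapq.heapify(heap)
    resident = sum(entry[3] for entry in heap)
    peak = resident
    while heap:
        key, idx, pos, size = heapq.heappop(heap)
        resident -= size
        if pos + 1 < len(segments[idx]):
            next_key, next_size = segments[idx][pos + 1]
            heapq.heappush(heap, (next_key, idx, pos + 1, next_size))
            resident += next_size
        peak = max(peak, resident)
    return peak


if __name__ == "__main__":
    print(worst_case_heap_bytes(FAN_IN, MAX_VALUE_BYTES))
    print(TOTAL_BYTES // FAN_IN)  # average segment size in bytes
```

Under these assumptions the worst case is 65 * 30 MiB = 2,044,723,200
bytes, about 1.9 GiB, which is only reached if every one of the 65
segments has an outlier at its head at the same moment.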

That's why I thought there might be some systematic ordering. How are
records with equal keys compared on the merge heap? Are smaller values by
any chance processed first?

Thanks,
-Christian
On 12/8/08 10:55 AM, "Arun C Murthy" <[EMAIL PROTECTED]> wrote:

> Christian,
> 
> On Dec 7, 2008, at 11:29 PM, Christian Kunz wrote:
> 
>> Since running with hadoop-0.18 we have many more problems with running
>> out of memory during the final merge process in the reduce phase,
>> especially when dealing with a lot of records with the same key.
>> 
> 
> Would you have any data on the sizes of keys/values?
> 
> Arun
> 
