They vary a lot, and it also depends on the application. Most of them are tiny (< 1 KB), but there are outliers going into the tens of MB.
One application had a maximum of 30 MB per value and ran into OOM, with a fan factor of 65. From the logs:

2008-12-08 00:36:56,575 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 65 segments left of total size: 2478699273 bytes

This indicates that at one time nearly all values on the heap were from the outliers. That's why I thought there is some systematic ordering. How are records of equal key compared on the merge heap? Are smaller values, by any chance, processed first?

Thanks,
-Christian

On 12/8/08 10:55 AM, "Arun C Murthy" <[EMAIL PROTECTED]> wrote:

> Christian,
>
> On Dec 7, 2008, at 11:29 PM, Christian Kunz wrote:
>
>> Since running with hadoop-0.18 we have many more problems with running out
>> of memory during the final merge process in the reduce phase, especially
>> when dealing with a lot of records with the same key.
>
> Would you have any data on the sizes of keys/values?
>
> Arun
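
For reference, a back-of-the-envelope check of the numbers in the log line, as a rough sketch only. It assumes that a k-way merge keeps one current record per segment in memory and that 1 MB means 2^20 bytes; neither is confirmed against the hadoop-0.18 source. The class and method names are made up for illustration and are not part of Hadoop.

```java
// Hypothetical helper, not Hadoop code: estimates merge-heap memory
// under the assumption of one in-memory record per open segment.
public class MergeHeapEstimate {

    // Worst case: every segment's current record is a maximum-size value.
    static long worstCaseHeapBytes(int segments, long maxValueBytes) {
        return (long) segments * maxValueBytes;
    }

    // Average on-disk segment size reported by the last merge pass.
    static long averageSegmentBytes(long totalBytes, int segments) {
        return totalBytes / segments;
    }

    public static void main(String[] args) {
        int segments = 65;                      // from the log line
        long totalBytes = 2478699273L;          // from the log line
        long maxValueBytes = 30L * 1024 * 1024; // 30 MB, assumed to be 2^20-byte MB

        System.out.println("worst-case bytes held by the merge heap: "
                + worstCaseHeapBytes(segments, maxValueBytes));
        System.out.println("average segment size in bytes: "
                + averageSegmentBytes(totalBytes, segments));
    }
}
```

Under these assumptions the worst case is 65 x 30 MB = 1950 MB, which would only be reached if every segment had an outlier at its head at the same moment.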
