Hi,

I have a mapred job that has about 60 million input records and groups them 
into 1- or 2-element groups (that is, a reducer always gets 1 or 2 records with 
the same key). 

I have 2 GB of RAM set up for each map/reduce task, and some of the reduce tasks 
fail with an OutOfMemoryError. 
I've got a heap dump of one of the reduce tasks when it was close to OOM. It 
turned out that most of the memory is consumed by the 
org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread class, which 
holds org.apache.hadoop.mapred.Merger$Segment objects that are still reachable 
(there are about 170 of them, each with about 8 MB of retained size).
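For reference, as far as I understand from the docs, the settings below are the 
ones that bound how many map outputs the shuffle buffers in memory before 
InMemFSMergeThread spills them to disk. The values shown are the documented 
defaults, not necessarily what my cluster is running, so treat them as 
assumptions:

```xml
<!-- Shuffle settings that (as I understand it) bound the in-memory segments
     held by ReduceCopier before InMemFSMergeThread merges them to disk.
     Values below are the documented defaults, not my actual configuration. -->
<property>
  <name>mapred.job.shuffle.input.buffer.percent</name>
  <value>0.70</value> <!-- fraction of the reduce heap used to buffer map outputs -->
</property>
<property>
  <name>mapred.job.shuffle.merge.percent</name>
  <value>0.66</value> <!-- buffer-usage threshold that triggers an in-memory merge -->
</property>
<property>
  <name>mapred.inmem.merge.threshold</name>
  <value>1000</value> <!-- number of in-memory segments that triggers a merge -->
</property>
```

With a 2 GB heap, 0.70 of the heap would be roughly 1.4 GB for shuffle buffers, 
which seems to match the ~170 segments of ~8 MB each that I see in the dump.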

Unfortunately, I'm not an expert in the Hadoop code, so I can't tell whether 
this is normal behavior or not. However, common sense tells me that the memory 
consumption is a bit too high.

Do you have any ideas/thoughts about the described issue?

Any pointers are highly appreciated.

Vyacheslav
