I have a job that uses an identity mapper and the same code for both the combiner and the reducer. In a small percentage of combiner tasks, after a few seconds I get errors that look like this:
FATAL mapred.TaskTracker: Error running child : java.lang.OutOfMemoryError: Java heap space
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:781)
        at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:524)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:613)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)

Those tasks fail, but they subsequently restart and complete successfully, and eventually the whole job completes. Nevertheless, this happens consistently enough that it is clearly a problem with my code rather than a transient glitch on my cluster.

From the stack it looks like the out-of-memory error happens before any of my combiner code has had a chance to run. If I don't specify a combiner class and run everything through the reducers, there are no out-of-memory errors and everything works fine.

Obviously I have a bug, but I'm wondering if anyone has seen this particular failure mode before and has insight into why it happens. My hypothesis is that some memory usage in my combiner/reducer code doesn't scale to the largest inputs my job receives. It is a problem for combiners but not reducers because more combiners than reducers run on a single task tracker node. That is, the problematic task is not the one failing during initialization but one running at the same time on the same node and chewing up all the memory. Does this hypothesis sound plausible?
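In case it matters: my understanding is that MapOutputBuffer's constructor (the frame at the top of the stack) allocates the map-side sort buffer up front, sized by io.sort.mb, and that allocation has to fit inside the child JVM heap set by mapred.child.java.opts. These are the settings I've been looking at; the values below are just the usual defaults for illustration, not my actual configuration:

```xml
<!-- mapred-site.xml: illustrative values only, not my actual configuration -->
<property>
  <!-- Size in MB of the in-memory buffer that MapOutputBuffer allocates
       up front; the failed allocation in the stack trace is this buffer. -->
  <name>io.sort.mb</name>
  <value>100</value>
</property>
<property>
  <!-- Heap for each child task JVM; io.sort.mb must fit within this. -->
  <name>mapred.child.java.opts</name>
  <value>-Xmx200m</value>
</property>
```

If my reading is right, the buffer allocation happening before any combiner code runs would be consistent with where the OOM appears in the stack.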
