Hello Chris,

> From the stack trace you provided, your OOM is probably due to
> HADOOP-3931, which is fixed in 0.17.2. It occurs when the serialized
> key of an emitted record exactly fills the serialization buffer that
> collects map outputs, causing an allocation as large as the size of
> that buffer. The bug causes an extra spill, an OOM exception if the
> task JVM's max heap is too small to mask it, and it will skip the
> combiner if you've defined one, but it won't drop records.

Ok, thanks for that information. I guess that means I will have to upgrade. :-)
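
In the meantime, if I understand your description correctly, I could probably mask it until the upgrade by giving the task JVMs a bigger max heap. Something like this is what I have in mind (just a sketch; the class name is made up and -Xmx512m is only a guess):

import org.apache.hadoop.mapred.JobConf;

public class HeapWorkaround {
    public static void main(String[] args) {
        // Hypothetical driver class; replace with the real job class.
        JobConf conf = new JobConf(HeapWorkaround.class);

        // Give the child task JVMs more headroom so that the extra
        // buffer-sized allocation described above does not push them
        // over the limit. -Xmx512m is only an example value.
        conf.set("mapred.child.java.opts", "-Xmx512m");
    }
}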

> > However, I was wondering: are these hard architectural limits? Say
> > that I wanted to emit 25,000 maps for a single input record, would
> > that mean that I would require huge amounts of (virtual) memory? In
> > other words, what exactly is the reason that increasing the number
> > of emitted maps per input record causes an OutOfMemoryError?
>
> Do you mean the number of output records per input record in the map?
> The memory allocated for collecting records out of the map is (mostly)
> fixed at the size defined in io.sort.mb. The ratio of input records to
> output records does not affect the collection and sort. The number of
> output records can sometimes influence the memory requirements, but
> not significantly. -C

Ok, so I should not have to worry about this too much! Thanks for the reply and 
information!
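
For completeness, this is roughly what the fan-out looks like on my side, so I can keep an eye on io.sort.mb if it ever becomes a problem. It's only a sketch against the old mapred API; FanOutMapper, the 25,000 fan-out and the 200 MB buffer are example values, not what I actually run:

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Example mapper that emits many output records per input record.
public class FanOutMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {

    private static final int FAN_OUT = 25000; // example value only

    public void map(LongWritable key, Text value,
                    OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {
        // The collection buffer is sized by io.sort.mb, not by this ratio;
        // a large fan-out mostly just means more frequent spills to disk.
        for (int i = 0; i < FAN_OUT; i++) {
            output.collect(new Text(value.toString() + "-" + i), value);
        }
    }

    public static void main(String[] args) {
        JobConf conf = new JobConf(FanOutMapper.class);
        conf.setMapperClass(FanOutMapper.class);
        // Fixed-size buffer (in MB) for collecting map output; 200 is an
        // example, the default here is 100.
        conf.setInt("io.sort.mb", 200);
        // Input/output paths and JobClient.runJob(conf) omitted.
    }
}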

Regards,

Leon Mergen
