From the stack trace you provided, your OOM is probably due to HADOOP-3931, which is fixed in 0.17.2. It occurs when the deserialized key in an emitted record exactly fills the serialization buffer that collects map outputs, causing an allocation as large as that buffer. The bug causes an extra spill, triggers an OOM exception if the task JVM's max heap is too small to mask it, and skips the combiner if you've defined one, but it won't drop records.
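
If upgrading isn't possible right away, giving the task JVMs a larger max heap can mask the bug in the meantime. A minimal sketch of that stopgap in job setup (the class name and the -Xmx value are only illustrative, not recommendations):

import org.apache.hadoop.mapred.JobConf;

// Hypothetical helper; only the property name is significant.
public class HeapWorkaround {
  public static JobConf apply(JobConf conf) {
    // Give each task JVM a heap large enough to absorb the extra
    // buffer-sized allocation until the upgrade to 0.17.2.
    conf.set("mapred.child.java.opts", "-Xmx512m");
    return conf;
  }
}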

However, I was wondering: are these hard architectural limits? Say I wanted to emit 25,000 maps for a single input record; would that mean I would need huge amounts of (virtual) memory? In other words, what exactly is the reason that increasing the number of emitted maps per input record causes an OutOfMemoryError?
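
To make the scenario concrete, a rough sketch of the kind of mapper meant here, using the old mapred API (the class name and the 25,000 fan-out are purely illustrative):

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Hypothetical example: emits many output records for each input record.
public class FanOutMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, LongWritable> {

  private static final int FAN_OUT = 25000; // outputs per input record

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, LongWritable> output,
                  Reporter reporter) throws IOException {
    for (int i = 0; i < FAN_OUT; i++) {
      // Each collect() lands in the fixed-size in-memory buffer and is
      // spilled to disk when that buffer fills up.
      output.collect(new Text(value.toString() + "#" + i), new LongWritable(i));
    }
  }
}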


Do you mean the number of output records per input record in the map? The memory allocated for collecting records out of the map is (mostly) fixed at the size defined in io.sort.mb. The ratio of input records to output records does not affect the collection and sort. The number of output records can sometimes influence the memory requirements, but not significantly. -C
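
For reference, a rough sketch of where that buffer is configured (the values shown are just the usual defaults of that era; the class name is illustrative):

import org.apache.hadoop.mapred.JobConf;

// Hypothetical helper; only the property names are significant.
public class SortBufferConf {
  public static void apply(JobConf conf) {
    // Fixed-size buffer (in MB) used to collect and sort map output;
    // it does not grow with the number of records emitted per input.
    conf.setInt("io.sort.mb", 100);
    // Fraction of that buffer reserved for per-record accounting, which
    // is where the record count has a (small) effect on memory.
    conf.set("io.sort.record.percent", "0.05");
  }
}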
