Hello Chris,

> From the stack trace you provided, your OOM is probably due to
> HADOOP-3931, which is fixed in 0.17.2. It occurs when the deserialized
> key in an outputted record exactly fills the serialization buffer that
> collects map outputs, causing an allocation as large as the size of
> that buffer. It causes an extra spill, an OOM exception if the task
> JVM has a max heap size too small to mask the bug, and will miss the
> combiner if you've defined one, but it won't drop records.
Ok, thanks for that information. I guess that means I will have to
upgrade. :-)

> > However, I was wondering: are these hard architectural limits? Say
> > that I wanted to emit 25,000 maps for a single input record, would
> > that mean that I will require huge amounts of (virtual) memory? In
> > other words, what exactly is the reason that increasing the number
> > of emitted maps per input record causes an OutOfMemoryError?
>
> Do you mean the number of output records per input record in the map?
> The memory allocated for collecting records out of the map is (mostly)
> fixed at the size defined in io.sort.mb. The ratio of input records to
> output records does not affect the collection and sort. The number of
> output records can sometimes influence the memory requirements, but
> not significantly.
>
> -C

Ok, so I should not have to worry about this too much! Thanks for the
reply and information!

Regards,

Leon Mergen
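
A minimal sketch of the fan-out pattern discussed above, using the old
org.apache.hadoop.mapred API of the 0.17/0.18 line. The class name, key
scheme, and RECORDS_PER_INPUT constant are made up for illustration; only
the Mapper/OutputCollector API and the io.sort.mb behaviour come from the
thread. As Chris explains, emitting many records per input does not grow
the collection buffer: when the io.sort.mb buffer fills, the framework
spills its sorted contents to disk and carries on.

    import java.io.IOException;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    // Emits many output records for every input record. The map-side
    // collection buffer stays at io.sort.mb regardless of this fan-out.
    public class FanOutMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {

      private static final int RECORDS_PER_INPUT = 25000;  // the fan-out asked about above

      private final Text outKey = new Text();
      private final IntWritable outValue = new IntWritable();

      public void map(LongWritable offset, Text line,
                      OutputCollector<Text, IntWritable> output,
                      Reporter reporter) throws IOException {
        for (int i = 0; i < RECORDS_PER_INPUT; i++) {
          outKey.set(line.toString() + "#" + i);  // made-up key scheme, illustration only
          outValue.set(i);
          output.collect(outKey, outValue);
        }
      }
    }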
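
Until the upgrade to 0.17.2 is possible, one interim mitigation, which is
only an assumption drawn from the hint above that a larger task heap masks
the bug and not advice given in the thread, is to leave each task JVM more
headroom than io.sort.mb needs, so the extra buffer-sized allocation can
succeed. A sketch of the two relevant settings, with example values only:

    import org.apache.hadoop.mapred.JobConf;

    // Example values only: a task heap comfortably larger than io.sort.mb
    // leaves room for the extra buffer-sized allocation HADOOP-3931 can trigger.
    public class JobSettingsSketch {
      public static void apply(JobConf conf) {
        conf.setInt("io.sort.mb", 100);                  // map-side collection buffer, in MB
        conf.set("mapred.child.java.opts", "-Xmx512m");  // max heap for each task JVM
      }
    }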
