[ https://issues.apache.org/jira/browse/HADOOP-9419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans resolved HADOOP-9419.
-----------------------------------------

    Resolution: Won't Fix

Never mind.  I created a patch, and it is completely useless in fixing this 
problem.  The tasks still OOM because the codec object itself is so small, and 
the MergeManager creates new codecs so quickly, that on a job with lots of 
reduces it uses up all of the address space with direct byte buffers.  Some of 
the processes get killed by the NM for going over the virtual address space 
limit before they OOM.  We could try to have the CodecPool detect that a codec 
is doing the wrong thing and "correct" it for the codec, but that is too 
heavy-handed in my opinion.
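
For context, a minimal, self-contained sketch of the failure mode described 
above (not Hadoop code; the TinyCodec class and the 64 MB buffer size are made 
up for illustration) might look like this:

import java.nio.ByteBuffer;

/**
 * Illustrative only: the codec-like object is just a few bytes on the Java
 * heap but pins a large direct buffer, so churning through many of them
 * consumes direct/virtual memory far faster than heap pressure builds.
 */
public class DirectBufferChurn {

  // Hypothetical stand-in for a small codec whose (de)compressor
  // allocates a direct buffer per instance.
  static class TinyCodec {
    final ByteBuffer directBuf = ByteBuffer.allocateDirect(64 * 1024 * 1024);
  }

  public static void main(String[] args) {
    long total = 0;
    for (int i = 0; i < 10_000; i++) {
      TinyCodec c = new TinyCodec();     // created and immediately dropped,
      total += c.directBuf.capacity();   // much like codecs churned in a merge
      // The on-heap garbage is tiny, so ordinary heap pressure does little to
      // reclaim the native memory behind these buffers; in a container this
      // tends to end in an OutOfMemoryError for direct memory or a kill by
      // the NodeManager for exceeding the virtual memory limit.
    }
    System.out.println("requested ~" + (total >> 20) + " MB of direct buffers");
  }
}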
                
> CodecPool should avoid OOMs with buggy codecs
> ---------------------------------------------
>
>                 Key: HADOOP-9419
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9419
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Robert Joseph Evans
>
> I recently found a bug in the gpl compression libraries that was causing map 
> tasks for a particular job to OOM.
> https://github.com/omalley/hadoop-gpl-compression/issues/3
> Now, granted, it does not make a lot of sense for a job to use the LzopCodec 
> for map output compression over the LzoCodec, but arguably other codecs could 
> be doing similar things and causing the same sort of memory leaks.  I propose 
> that we do a sanity check when creating a new decompressor/compressor.  If 
> the newly created object does not match the value from getType... caching 
> should be turned off for that codec.
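
For reference, a sketch of the sanity check proposed above. This is not the 
actual CodecPool code and not the abandoned patch; the class name, the pool 
structure, and the exact-class comparison are assumptions made to keep the 
example small. Only the compressor side is shown; the decompressor side would 
mirror it using getDecompressorType().

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.Compressor;

/**
 * Sketch of a pool that checks whether a codec hands back a compressor of
 * the type it advertises, and stops caching for that codec if it does not.
 */
public class TypeCheckingCompressorPool {

  private final Map<Class<? extends Compressor>, Deque<Compressor>> pool =
      new HashMap<>();
  private final Set<Class<? extends CompressionCodec>> uncacheable =
      new HashSet<>();

  public synchronized Compressor getCompressor(CompressionCodec codec) {
    Deque<Compressor> idle = pool.get(codec.getCompressorType());
    if (!uncacheable.contains(codec.getClass())
        && idle != null && !idle.isEmpty()) {
      return idle.pop();                       // reuse a pooled instance
    }
    Compressor comp = codec.createCompressor();
    // Sanity check: the created object should be of the advertised type.
    if (comp != null && comp.getClass() != codec.getCompressorType()) {
      uncacheable.add(codec.getClass());       // buggy codec: stop caching
    }
    return comp;
  }

  public synchronized void returnCompressor(CompressionCodec codec,
                                            Compressor comp) {
    if (comp == null) {
      return;
    }
    if (uncacheable.contains(codec.getClass())) {
      comp.end();                              // free native/direct buffers
      return;
    }
    comp.reset();
    pool.computeIfAbsent(codec.getCompressorType(), k -> new ArrayDeque<>())
        .push(comp);
  }
}

This is only meant to make the description concrete; per the comment above, 
the patch that was actually tried did not help with this particular problem.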

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
