[jira] [Commented] (MAPREDUCE-4379) Node Manager throws java.lang.OutOfMemoryError: Java heap space due to org.apache.hadoop.fs.LocalDirAllocator.contexts

Robert Joseph Evans (JIRA) Thu, 05 Jul 2012 09:14:36 -0700

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407237#comment-13407237 ]


Robert Joseph Evans commented on MAPREDUCE-4379:
------------------------------------------------

Devaraj, could you upmerge the patch?  It no longer applies to trunk.

For the most part the patch looks OK to me, and it seems to fix the issue at 
hand.  However, I have been reading through the code and I find a few things a 
bit disturbing.  I don't really have a problem with your fix, but I think 
refactoring the code would make things a lot cleaner.  Because this is a 
blocker, I am inclined to just check it in, on the condition that we mark 
LocalDirAllocator.removeContext as deprecated and limited private for 
MapReduce, and file a separate JIRA to clean things up.  Also, this is only an 
issue when the DefaultContainerExecutor is enabled.  With the 
LinuxContainerExecutor, the problematic code runs in a separate process that 
releases everything when localization is done.

The way the code in LocalDirAllocator is written, there are really only two 
reasons for the static cache.  The first is to avoid extra file system calls 
when creating the LocalDirAllocator.  The second is to keep track of the last 
dir written to, for round-robin access.  In the case of the App Cache this is 
never going to save any FS calls, because exactly one LocalDirAllocator is 
created per application, and the cache key is different for each application.  
For the User cache this would save a bit, except for a bug in the code that 
makes the LocalDirAllocator cache key for the user cache different for every 
application (which makes the leak much worse).

{code}
    this.userDirs =
      new LocalDirAllocator(String.format(USERCACHE_CTXT_FMT, appId));
    ...
    conf.setStrings(String.format(USERCACHE_CTXT_FMT, appId), usersFileCacheDirs);
{code}

should be

{code}
    this.userDirs =
      new LocalDirAllocator(String.format(USERCACHE_CTXT_FMT, user));
    ...
    conf.setStrings(String.format(USERCACHE_CTXT_FMT, user), usersFileCacheDirs);
{code}
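
To illustrate why the key choice matters for the leak, here is a minimal 
sketch.  It is not the real LocalDirAllocator: the class ContextKeyDemo, the 
method entriesAfter, and the format string value are all hypothetical 
stand-ins, and a plain HashMap plays the role of the static context cache.  
It only counts how many entries a never-pruned map accumulates under each 
keying scheme.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a static, never-pruned context cache.
// This is NOT the real LocalDirAllocator; it only models entry growth.
class ContextKeyDemo {
    // Assumed format string, used here for illustration only.
    static final String USERCACHE_CTXT_FMT = "%s.user.cache.dirs";

    // Returns how many distinct cache entries exist after one lookup per key.
    static int entriesAfter(String[] keys) {
        Map<String, Object> contexts = new HashMap<>();
        for (String key : keys) {
            String ctxName = String.format(USERCACHE_CTXT_FMT, key);
            // One context object per distinct context name, never removed.
            contexts.computeIfAbsent(ctxName, k -> new Object());
        }
        return contexts.size();
    }

    public static void main(String[] args) {
        int apps = 1000;
        String[] appIds = new String[apps];
        String[] users = new String[apps];
        for (int i = 0; i < apps; i++) {
            appIds[i] = "application_1340690914008_" + i; // unique per app
            users[i] = "user" + (i % 3);                  // only 3 users
        }
        // Keyed by appId, the map grows with every application.
        System.out.println("keyed by appId: " + entriesAfter(appIds)); // 1000
        // Keyed by user, the map is bounded by the number of users.
        System.out.println("keyed by user:  " + entriesAfter(users));  // 3
    }
}
```

With per-application keys the entry count tracks the number of applications 
ever localized, while per-user keys keep it bounded by the number of users.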

I would prefer to see the LocalDirAllocator cache have a maximum size, and have 
it remove entries in LRU order when it grows too big.  This way we do not have 
to worry about removing entries we no longer need.  The worst that will happen 
if the cache is too small is that we do some extra work, and potentially write 
to disks in a more random order instead of pure round robin.
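
One possible shape for such a bounded cache, as a sketch only: the class name 
LruContextCache and its methods are hypothetical and are not part of Hadoop.  
It relies on java.util.LinkedHashMap in access order with removeEldestEntry 
overridden, and wraps the map with Collections.synchronizedMap on the 
assumption that the cache is shared across threads.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical bounded cache; not an existing Hadoop class.
class LruContextCache<V> {
    private final Map<String, V> cache;

    LruContextCache(final int maxEntries) {
        // accessOrder=true orders entries from least to most recently used.
        LinkedHashMap<String, V> lru =
            new LinkedHashMap<String, V>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
                    // Called after each insertion; evict once over the limit.
                    return size() > maxEntries;
                }
            };
        // Assumption: callers share the cache across threads.
        this.cache = Collections.synchronizedMap(lru);
    }

    // A successful get counts as a use and refreshes the entry.
    V get(String key) { return cache.get(key); }

    void put(String key, V value) { cache.put(key, value); }

    // containsKey does not count as a use in LinkedHashMap.
    boolean containsKey(String key) { return cache.containsKey(key); }

    int size() { return cache.size(); }

    public static void main(String[] args) {
        LruContextCache<String> contexts = new LruContextCache<>(2);
        contexts.put("a", "ctx-a");
        contexts.put("b", "ctx-b");
        contexts.get("a");          // "a" is now more recently used than "b"
        contexts.put("c", "ctx-c"); // over the limit, so "b" is evicted
        System.out.println(contexts.containsKey("a")); // true
        System.out.println(contexts.containsKey("b")); // false
        System.out.println(contexts.containsKey("c")); // true
    }
}
```

An evicted context would simply be rebuilt on the next lookup, which is the 
extra work and loss of round-robin position described above.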
                
> Node Manager throws java.lang.OutOfMemoryError: Java heap space due to 
> org.apache.hadoop.fs.LocalDirAllocator.contexts
> ----------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4379
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4379
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2, nodemanager
>    Affects Versions: 0.23.3, 2.0.0-alpha, 3.0.0
>            Reporter: Devaraj K
>            Assignee: Devaraj K
>            Priority: Blocker
>         Attachments: MAPREDUCE-4379.patch
>
>
> {code:xml}
> Exception in thread "Container Monitor" java.lang.OutOfMemoryError: Java heap space
>       at java.io.BufferedReader.<init>(BufferedReader.java:80)
>       at java.io.BufferedReader.<init>(BufferedReader.java:91)
>       at org.apache.hadoop.yarn.util.ProcfsBasedProcessTree.constructProcessInfo(ProcfsBasedProcessTree.java:410)
>       at org.apache.hadoop.yarn.util.ProcfsBasedProcessTree.getProcessTree(ProcfsBasedProcessTree.java:171)
>       at org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl$MonitoringThread.run(ContainersMonitorImpl.java:389)
> Exception in thread "LocalizerRunner for container_1340690914008_10890_01_000003" java.lang.OutOfMemoryError: Java heap space
>       at java.util.Arrays.copyOfRange(Arrays.java:3209)
>       at java.lang.String.<init>(String.java:215)
>       at com.sun.org.apache.xerces.internal.xni.XMLString.toString(XMLString.java:185)
>       at com.sun.org.apache.xerces.internal.parsers.AbstractDOMParser.characters(AbstractDOMParser.java:1188)
>       at com.sun.org.apache.xerces.internal.xinclude.XIncludeHandler.characters(XIncludeHandler.java:1084)
>       at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:464)
>       at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:808)
>       at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:737)
>       at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:119)
>       at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:235)
>       at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:284)
>       at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:180)
>       at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1738)
>       at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:1689)
>       at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:1635)
>       at org.apache.hadoop.conf.Configuration.set(Configuration.java:722)
>       at org.apache.hadoop.conf.Configuration.setStrings(Configuration.java:1300)
>       at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.initDirs(ContainerLocalizer.java:375)
>       at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:127)
>       at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:103)
>       at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:862)
> {code}

