[
https://issues.apache.org/jira/browse/MAPREDUCE-4379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407237#comment-13407237
]
Robert Joseph Evans commented on MAPREDUCE-4379:
------------------------------------------------
Devaraj, could you upmerge the patch? It no longer applies to trunk.
For the most part the patch looks OK to me. It seems to fix the issue at hand.
However, I have been reading through the code and I find a few things a bit
disturbing. I don't have a problem with your fix, but I think refactoring the
code would make things a lot cleaner. Because this is a blocker, I am inclined
to just check it in, on the condition that we mark
LocalDirAllocator.removeContext as deprecated and limited private for MapReduce,
and file a separate JIRA to clean things up. Also, this is only an issue when
the DefaultContainerExecutor is enabled: when the LinuxContainerExecutor is
used, the problematic code runs in a separate process that releases everything
when localization is done.
As the code in LocalDirAllocator is written, there are really only two reasons
for the static cache. The first is to avoid extra file system calls when
creating a LocalDirAllocator. The second is to keep track of the last dir
written to, for round-robin access. For the App Cache this will never save any
FS calls, because exactly one LocalDirAllocator is created per application and
the cache key is different for each application. For the User Cache it would
save a little, except that a bug in the code gives the user cache's
LocalDirAllocator a different cache key for every application (making the leak
much worse):
{code}
this.userDirs =
    new LocalDirAllocator(String.format(USERCACHE_CTXT_FMT, appId));
...
conf.setStrings(String.format(USERCACHE_CTXT_FMT, appId), usersFileCacheDirs);
{code}
should be
{code}
this.userDirs =
    new LocalDirAllocator(String.format(USERCACHE_CTXT_FMT, user));
...
conf.setStrings(String.format(USERCACHE_CTXT_FMT, user), usersFileCacheDirs);
{code}
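To make the effect of the cache key concrete, here is a minimal Java sketch, assuming a simplified stand-in for LocalDirAllocator's static per-context cache that is never evicted. The class ContextKeyDemo, its Context type, obtainContext, and the format string value are hypothetical illustrations, not the Hadoop code. It shows that keying the user-cache context by appId adds one cache entry per application, while keying it by user keeps a single entry no matter how many applications that user runs.

```java
import java.util.HashMap;
import java.util.Map;

public class ContextKeyDemo {
    // Hypothetical stand-in for the per-context state kept by the allocator;
    // lastDirIndex is where round-robin state would live.
    static final class Context {
        int lastDirIndex = 0;
    }

    // Static cache keyed by context name and never evicted (mirrors the leak).
    static final Map<String, Context> CONTEXTS = new HashMap<>();

    // Illustrative format string; the real USERCACHE_CTXT_FMT value is assumed.
    static final String USERCACHE_CTXT_FMT = "%s.user.cache.dirs";

    // Returns the cached context for this name, creating one on first use.
    static Context obtainContext(String contextName) {
        return CONTEXTS.computeIfAbsent(contextName, k -> new Context());
    }

    public static void main(String[] args) {
        String user = "alice";

        // Buggy keying: a new cache entry for every application.
        for (int app = 0; app < 1000; app++) {
            obtainContext(String.format(USERCACHE_CTXT_FMT, "application_" + app));
        }
        System.out.println("keyed by appId: " + CONTEXTS.size());

        CONTEXTS.clear();

        // Fixed keying: one cache entry per user, however many apps they run.
        for (int app = 0; app < 1000; app++) {
            obtainContext(String.format(USERCACHE_CTXT_FMT, user));
        }
        System.out.println("keyed by user: " + CONTEXTS.size());
    }
}
```

Running main prints "keyed by appId: 1000" followed by "keyed by user: 1".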
I would prefer to see the LocalDirAllocator cache have a maximum size and
remove entries in LRU order when it grows too big. That way we do not have to
worry about removing entries we no longer need; the worst that can happen, if
the cache is too small, is that we do some extra work and the disks may be
written to in a more random order instead of pure round robin.
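One way to get a size-bounded LRU cache in plain Java is a LinkedHashMap in access order with removeEldestEntry overridden. The sketch below is a hypothetical BoundedContextCache, not an existing Hadoop class; the value type and the maximum size are assumptions left to the caller. Methods are synchronized because the real cache is static and shared across threads.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical bounded, LRU-evicting replacement for an unbounded static cache.
public class BoundedContextCache<V> {
    private final Map<String, V> map;

    public BoundedContextCache(final int maxEntries) {
        // accessOrder=true orders entries from least to most recently accessed,
        // so the "eldest" entry is always the least recently used one.
        this.map = new LinkedHashMap<String, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
                // Evict the LRU entry once an insert pushes us past the limit.
                return size() > maxEntries;
            }
        };
    }

    public synchronized V get(String key) { return map.get(key); }

    public synchronized void put(String key, V value) { map.put(key, value); }

    public synchronized boolean containsKey(String key) { return map.containsKey(key); }

    public synchronized int size() { return map.size(); }

    public static void main(String[] args) {
        BoundedContextCache<Integer> cache = new BoundedContextCache<>(2);
        cache.put("ctxA", 0);
        cache.put("ctxB", 1);
        cache.get("ctxA");    // touch ctxA so ctxB becomes least recently used
        cache.put("ctxC", 2); // exceeds capacity, evicts ctxB
        System.out.println(cache.containsKey("ctxA") + " "
                + cache.containsKey("ctxB") + " "
                + cache.containsKey("ctxC"));
    }
}
```

Running main prints "true false true". If the cached value held the round-robin "last dir" index, an evicted context would simply be rebuilt on its next use and restart its rotation, which matches the trade-off described above: some extra work and a less strict round-robin order, but no unbounded growth.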
> Node Manager throws java.lang.OutOfMemoryError: Java heap space due to
> org.apache.hadoop.fs.LocalDirAllocator.contexts
> ----------------------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-4379
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4379
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mrv2, nodemanager
> Affects Versions: 0.23.3, 2.0.0-alpha, 3.0.0
> Reporter: Devaraj K
> Assignee: Devaraj K
> Priority: Blocker
> Attachments: MAPREDUCE-4379.patch
>
>
> {code:xml}
> Exception in thread "Container Monitor" java.lang.OutOfMemoryError: Java heap space
> at java.io.BufferedReader.<init>(BufferedReader.java:80)
> at java.io.BufferedReader.<init>(BufferedReader.java:91)
> at org.apache.hadoop.yarn.util.ProcfsBasedProcessTree.constructProcessInfo(ProcfsBasedProcessTree.java:410)
> at org.apache.hadoop.yarn.util.ProcfsBasedProcessTree.getProcessTree(ProcfsBasedProcessTree.java:171)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl$MonitoringThread.run(ContainersMonitorImpl.java:389)
> Exception in thread "LocalizerRunner for container_1340690914008_10890_01_000003" java.lang.OutOfMemoryError: Java heap space
> at java.util.Arrays.copyOfRange(Arrays.java:3209)
> at java.lang.String.<init>(String.java:215)
> at com.sun.org.apache.xerces.internal.xni.XMLString.toString(XMLString.java:185)
> at com.sun.org.apache.xerces.internal.parsers.AbstractDOMParser.characters(AbstractDOMParser.java:1188)
> at com.sun.org.apache.xerces.internal.xinclude.XIncludeHandler.characters(XIncludeHandler.java:1084)
> at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:464)
> at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:808)
> at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:737)
> at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:119)
> at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:235)
> at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:284)
> at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:180)
> at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1738)
> at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:1689)
> at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:1635)
> at org.apache.hadoop.conf.Configuration.set(Configuration.java:722)
> at org.apache.hadoop.conf.Configuration.setStrings(Configuration.java:1300)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.initDirs(ContainerLocalizer.java:375)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:127)
> at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:103)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:862)
> {code}