[ https://issues.apache.org/jira/browse/TEZ-3195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260890#comment-15260890 ]
Jonathan Eagles commented on TEZ-3195:
--------------------------------------
I can shed a little more light on this. This is actually a race condition in
the memory accounting. However, the GC shouldn't give up so easily on
recovering memory. To address this issue, the job needs to be given more new
gen space so that the GC can more easily collect the unreachable memory.
In this case the user needed to change new ratio from 8 to 4 for a 1GB heap
size.
{noformat}
-XX:NewRatio=4
{noformat}
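To put rough numbers on that change, here is a minimal sketch of the arithmetic. It assumes the usual HotSpot meaning of NewRatio (old gen : young gen = N : 1), and the class and method names are made up for illustration. The real JVM aligns and adjusts these sizes, and the young gen is further split into eden and survivor spaces, so treat the results as approximations.

```java
public class NewGenSizing {
    // -XX:NewRatio=N requests an old:young size ratio of N:1, so the young
    // generation gets roughly heap / (N + 1). Hypothetical helper; it ignores
    // JVM alignment and the eden/survivor split.
    static long youngGenBytes(long heapBytes, int newRatio) {
        return heapBytes / (newRatio + 1);
    }

    public static void main(String[] args) {
        long heap = 1024L * 1024 * 1024; // the 1GB heap from this comment
        long mb = 1024L * 1024;
        for (int ratio : new int[] {8, 4}) {
            long youngMb = youngGenBytes(heap, ratio) / mb;
            System.out.println("NewRatio=" + ratio + " -> ~" + youngMb + "MB young gen");
        }
    }
}
```

By this estimate the young gen grows from about 1/9 of the heap to about 1/5. For Tez tasks the flag would normally go into the task JVM options (for example via tez.task.launch.cmd-opts, assuming that is where the job sets its JVM flags).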
> TezMerger OOM: unreserve called while memory still held
> -------------------------------------------------------
>
> Key: TEZ-3195
> URL: https://issues.apache.org/jira/browse/TEZ-3195
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Jonathan Eagles
> Assignee: Jonathan Eagles
> Attachments: TEZ-3195.1-branch-0.7.patch, TEZ-3195.1.patch,
> TEZ-3195.2-branch-0.7.patch, TEZ-3195.2.patch
>
>
> When the reader is closed in MergeQueue#adjustPriorityQueue, the byte buffer
> is still held in several places in the code while unreserve is called. In the
> case below, the Fetcher was trying to fetch a nearly 100MB map output which
> exposed this race condition.
> {noformat}
> Caused by: java.lang.OutOfMemoryError: Java heap space
>     at org.apache.hadoop.io.BoundedByteArrayOutputStream.<init>(BoundedByteArrayOutputStream.java:56)
>     at org.apache.hadoop.io.BoundedByteArrayOutputStream.<init>(BoundedByteArrayOutputStream.java:46)
>     at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MapOutput.<init>(MapOutput.java:75)
>     at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MapOutput.createMemoryMapOutput(MapOutput.java:124)
>     at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.unconditionalReserve(MergeManager.java:437)
>     at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.reserve(MergeManager.java:427)
>     at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.copyMapOutput(FetcherOrderedGrouped.java:481)
>     at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.copyFromHost(FetcherOrderedGrouped.java:286)
>     at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.fetchNext(FetcherOrderedGrouped.java:176)
>     at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.run(FetcherOrderedGrouped.java:191)
> {noformat}
--
This message was sent by Atlassian JIRA (v6.3.4#6332)