[
https://issues.apache.org/jira/browse/TEZ-3195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15222362#comment-15222362
]
Jonathan Eagles commented on TEZ-3195:
--------------------------------------
This patch is verified to make the byte buffers unreachable on the heap, but
they are still not always garbage collected. I am still tracking down that
second issue. In the meantime, feedback on whether the patch breaks any API
assumptions would be welcome.
> TezMerger OOM: unreserve called while memory still held
> -------------------------------------------------------
>
> Key: TEZ-3195
> URL: https://issues.apache.org/jira/browse/TEZ-3195
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Jonathan Eagles
> Assignee: Jonathan Eagles
> Attachments: TEZ-3195.1-branch-0.7.patch, TEZ-3195.1.patch
>
>
> When the reader is closed in MergeQueue#adjustPriorityQueue, unreserve is
> called while the byte buffer is still referenced from several places in the
> code. In the case below, the Fetcher was trying to fetch a map output of
> nearly 100MB, which exposed this race condition.
> {noformat}
> Caused by: java.lang.OutOfMemoryError: Java heap space
> at org.apache.hadoop.io.BoundedByteArrayOutputStream.<init>(BoundedByteArrayOutputStream.java:56)
> at org.apache.hadoop.io.BoundedByteArrayOutputStream.<init>(BoundedByteArrayOutputStream.java:46)
> at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MapOutput.<init>(MapOutput.java:75)
> at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MapOutput.createMemoryMapOutput(MapOutput.java:124)
> at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.unconditionalReserve(MergeManager.java:437)
> at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.reserve(MergeManager.java:427)
> at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.copyMapOutput(FetcherOrderedGrouped.java:481)
> at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.copyFromHost(FetcherOrderedGrouped.java:286)
> at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.fetchNext(FetcherOrderedGrouped.java:176)
> at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.run(FetcherOrderedGrouped.java:191)
> {noformat}
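
The failure mode in the description above can be sketched roughly as follows.
This is a simplified illustration, not the Tez code: BudgetSketch and all of
its methods are hypothetical names, and the "live" list merely stands in for
the several places that may still reference a buffer. The point it tries to
show is that if the accounting is released while a buffer is still reachable,
a later reserve can succeed even though the heap cannot actually hold both
buffers.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified memory accountant; NOT the Tez MergeManager. */
public class BudgetSketch {
    private final long limit;
    // Bytes the accountant believes are in use.
    private long accounted;
    // Stands in for the places in the code that still reference a buffer.
    private final List<byte[]> live = new ArrayList<>();

    public BudgetSketch(long limit) {
        this.limit = limit;
    }

    /** Returns a new buffer, or null if the accounted budget would be exceeded. */
    public synchronized byte[] reserve(int size) {
        if (accounted + size > limit) {
            return null;
        }
        accounted += size;
        byte[] buf = new byte[size];
        live.add(buf);
        return buf;
    }

    /** Problematic order: accounting is released, the reference is kept. */
    public synchronized void unreserveOnly(byte[] buf) {
        accounted -= buf.length;
    }

    /** Safer order: drop the reference first, then release the accounting. */
    public synchronized void releaseThenUnreserve(byte[] buf) {
        // Arrays use identity equality, so this removes exactly this buffer.
        live.remove(buf);
        accounted -= buf.length;
    }

    public synchronized long accountedBytes() {
        return accounted;
    }

    /** Sum of all buffer sizes that are still strongly referenced. */
    public synchronized long reachableBytes() {
        long total = 0;
        for (byte[] b : live) {
            total += b.length;
        }
        return total;
    }

    public static void main(String[] args) {
        BudgetSketch budget = new BudgetSketch(100);
        byte[] first = budget.reserve(80);
        budget.unreserveOnly(first);          // accounting says 0, buffer still held
        byte[] second = budget.reserve(80);   // succeeds despite the stale reference
        System.out.println("second reserved: " + (second != null));
        System.out.println("accounted=" + budget.accountedBytes()
            + " reachable=" + budget.reachableBytes() + " limit=100");
    }
}
```

With the small sizes used here nothing fails, but reachable bytes end up above
the limit the accountant is supposed to enforce. Scaled up to a map output of
nearly 100MB, the same pattern could plausibly end in the OutOfMemoryError
shown in the trace. Using releaseThenUnreserve in place of unreserveOnly keeps
the two numbers in step in this sketch.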