[ https://issues.apache.org/jira/browse/TEZ-3195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15227042#comment-15227042 ]
Siddharth Seth commented on TEZ-3195:
-------------------------------------
[~jeagles] - could you please provide a little more information on the cases in
which the buffer is held?
From walking through the code and looking at the patch, I wasn't able to
understand the exact problem.
I think I'm missing certain aspects; this is what I think happens:
The MapOutput should be eligible for GC as soon as it is selected by
createInMemorySegments.
The InMemoryReader should become a candidate for GC after reader.close()
(Segment.close) is called in adjustPriorityQueue.
We don't hold on to individual parts of the buffer while writing the output;
that's a copy.
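
To make the question concrete, here is a minimal sketch of the situation the
issue describes: memory accounting is released on close while a reference to
the byte buffer may still be reachable. Budget, Reader and simulate are
hypothetical names made up for illustration; this is not the Tez MergeManager
or InMemoryReader code, and the 1024-byte buffer size is arbitrary.

```java
final class UnreserveSketch {

    /** Hypothetical stand-in for shuffle memory accounting; not the Tez MergeManager. */
    static final class Budget {
        private long used;

        synchronized byte[] reserve(int size) {
            used += size;
            return new byte[size];
        }

        synchronized void unreserve(int size) {
            used -= size;
        }

        synchronized long used() {
            return used;
        }
    }

    /** Hypothetical reader that holds a reference to an in-memory buffer. */
    static final class Reader {
        private byte[] buffer;
        private final int size;
        private final Budget budget;

        Reader(Budget budget, int size) {
            this.budget = budget;
            this.size = size;
            this.buffer = budget.reserve(size);
        }

        /** Releases the accounting; optionally drops the buffer reference first. */
        void close(boolean dropReference) {
            if (dropReference) {
                buffer = null; // the buffer is no longer reachable through this reader
            }
            budget.unreserve(size);
        }

        long reachableBytes() {
            return buffer == null ? 0 : buffer.length;
        }
    }

    /** Returns {bytes the budget still counts, bytes still reachable via the reader}. */
    static long[] simulate(boolean dropReference) {
        Budget budget = new Budget();
        Reader reader = new Reader(budget, 1024);
        reader.close(dropReference);
        return new long[] { budget.used(), reader.reachableBytes() };
    }

    public static void main(String[] args) {
        long[] kept = simulate(false);
        long[] dropped = simulate(true);
        System.out.println("reference kept:    accounted=" + kept[0] + " reachable=" + kept[1]);
        System.out.println("reference dropped: accounted=" + dropped[0] + " reachable=" + dropped[1]);
    }
}
```

In the first case the accounting reports the memory as free while the buffer
is still reachable, so a later reserve could allocate on top of memory the GC
cannot reclaim yet. Whether the real code paths ever end up in that state is
exactly what I'd like to understand.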
> TezMerger OOM: unreserve called while memory still held
> -------------------------------------------------------
>
> Key: TEZ-3195
> URL: https://issues.apache.org/jira/browse/TEZ-3195
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Jonathan Eagles
> Assignee: Jonathan Eagles
> Attachments: TEZ-3195.1-branch-0.7.patch, TEZ-3195.1.patch,
> TEZ-3195.2-branch-0.7.patch, TEZ-3195.2.patch
>
>
> When the reader is closed in MergeQueue#adjustPriorityQueue, the byte buffer
> is still held in several places in the code while unreserve is called. In the
> case below, the Fetcher was trying to fetch a nearly 100MB map output which
> exposed this race condition.
> {noformat}
> Caused by: java.lang.OutOfMemoryError: Java heap space
>     at org.apache.hadoop.io.BoundedByteArrayOutputStream.<init>(BoundedByteArrayOutputStream.java:56)
>     at org.apache.hadoop.io.BoundedByteArrayOutputStream.<init>(BoundedByteArrayOutputStream.java:46)
>     at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MapOutput.<init>(MapOutput.java:75)
>     at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MapOutput.createMemoryMapOutput(MapOutput.java:124)
>     at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.unconditionalReserve(MergeManager.java:437)
>     at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.reserve(MergeManager.java:427)
>     at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.copyMapOutput(FetcherOrderedGrouped.java:481)
>     at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.copyFromHost(FetcherOrderedGrouped.java:286)
>     at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.fetchNext(FetcherOrderedGrouped.java:176)
>     at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.run(FetcherOrderedGrouped.java:191)
> {noformat}