[
https://issues.apache.org/jira/browse/HADOOP-15911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16941313#comment-16941313
]
Jacques Nadeau commented on HADOOP-15911:
-----------------------------------------
I believe this occurs when ORC files are read from S3 via the zero-copy read path.
> Over-eager allocation in ByteBufferUtil.fallbackRead
> ----------------------------------------------------
>
> Key: HADOOP-15911
> URL: https://issues.apache.org/jira/browse/HADOOP-15911
> Project: Hadoop Common
> Issue Type: Bug
> Components: common
> Reporter: Vanco Buca
> Priority: Major
>
> The heap-memory path of ByteBufferUtil.fallbackRead ([see master branch code
> here|https://github.com/apache/hadoop/blob/a0da1ec01051108b77f86799dd5e97563b2a3962/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/ByteBufferUtil.java#L95])
> massively overallocates memory when the underlying input stream returns data
> in chunks smaller than the requested length. This happens regularly when the
> S3 input stream is the source.
> The behavior is O(N^2)-ish. In a recent debug session, we were trying to
> read 6MB, but were getting only 16K at a time. The code would:
> * allocate 16M, use the first 16K
> * allocate 16M - 16K, use the first 16K of that
> * allocate 16M - 32K, use the first 16K of that
> * (etc)
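
To put a number on the pattern listed above, here is a rough tally in Java. It assumes, as in the list, a 16M request served 16K at a time, with every call allocating a buffer for everything still outstanding. The class and method names are made up for illustration and are not part of Hadoop.

```java
class FallbackReadAllocationTally {
    // Sums the buffer sizes allocated when `total` bytes arrive `chunk` bytes
    // per call and each call allocates room for all bytes still outstanding.
    // Assumption: the stream always returns exactly `chunk` bytes.
    static long totalAllocated(long total, long chunk) {
        long allocated = 0;
        for (long remaining = total; remaining > 0; remaining -= chunk) {
            allocated += remaining;
        }
        return allocated;
    }

    public static void main(String[] args) {
        long total = 16L * 1024 * 1024; // 16M, the figure from the list above
        long chunk = 16L * 1024;        // 16K returned per read
        long allocated = totalAllocated(total, chunk);
        System.out.println("calls: " + (total / chunk));
        System.out.println("allocated bytes: " + allocated);
        System.out.println("allocated / useful: " + (double) allocated / total);
    }
}
```

Under those assumptions this reports 1024 calls and 8598323200 allocated bytes, roughly 512 times the 16M that is actually needed.
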
> The patch is simple. Here's the text version of the patch:
> {code}
> @@ -88,10 +88,17 @@ public final class ByteBufferUtil {
>          buffer.flip();
>        } else {
>          buffer.clear();
> -        int nRead = stream.read(buffer.array(),
> -          buffer.arrayOffset(), maxLength);
> -        if (nRead >= 0) {
> -          buffer.limit(nRead);
> +        int totalRead = 0;
> +        while (totalRead < maxLength) {
> +          final int nRead = stream.read(buffer.array(),
> +            buffer.arrayOffset() + totalRead, maxLength - totalRead);
> +          if (nRead <= 0) {
> +            break;
> +          }
> +          totalRead += nRead;
> +        }
> +        if (totalRead >= 0) {
> +          buffer.limit(totalRead);
>            success = true;
>          }
>        }
> {code}
> In essence, the heap path should do the same thing the direct-memory path
> already does: keep reading until the requested length has been read or the
> stream stops returning data.
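
For anyone who wants to try the loop from the patch outside Hadoop, here is a self-contained sketch. Only the loop body follows the patch. The `chunked` helper (a stand-in for a stream that returns short reads, as the S3 input stream is described as doing), the `readUpTo` name, and the plain `ByteBuffer.allocate` call in place of Hadoop's buffer pool are all assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

class ReadUntilFullSketch {
    // Hypothetical stand-in for a stream that returns at most `chunk` bytes per read.
    static InputStream chunked(byte[] data, int chunk) {
        return new ByteArrayInputStream(data) {
            @Override
            public synchronized int read(byte[] b, int off, int len) {
                return super.read(b, off, Math.min(len, chunk));
            }
        };
    }

    // Allocates one heap buffer and keeps reading into it until maxLength bytes
    // have arrived or the stream stops returning data. Telling "zero bytes"
    // apart from end of stream is left to the caller in this sketch.
    static ByteBuffer readUpTo(InputStream stream, int maxLength) throws IOException {
        ByteBuffer buffer = ByteBuffer.allocate(maxLength);
        int totalRead = 0;
        while (totalRead < maxLength) {
            final int nRead = stream.read(buffer.array(),
                buffer.arrayOffset() + totalRead, maxLength - totalRead);
            if (nRead <= 0) {
                break;
            }
            totalRead += nRead;
        }
        buffer.limit(totalRead);
        return buffer;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = chunked(new byte[100_000], 16 * 1024);
        // One allocation of 64K is filled by four short reads.
        System.out.println(readUpTo(in, 64 * 1024).remaining());
        // The second call drains what is left and stops at end of stream.
        System.out.println(readUpTo(in, 64 * 1024).remaining());
    }
}
```

With 100000 bytes of input, the first call fills its 65536-byte buffer and the second returns the remaining 34464 bytes, each from a single allocation.
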