[
https://issues.apache.org/jira/browse/MAPREDUCE-6628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15262892#comment-15262892
]
Mariappan Asokan commented on MAPREDUCE-6628:
---------------------------------------------
Got rid of trailing spaces in the test and uploaded a new patch. The failing
tests seem to be unrelated to this patch.
> Potential memory leak in CryptoOutputStream
> -------------------------------------------
>
> Key: MAPREDUCE-6628
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6628
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: security
> Reporter: Mariappan Asokan
> Assignee: Mariappan Asokan
> Attachments: MAPREDUCE-6628.001.patch, MAPREDUCE-6628.002.patch,
> MAPREDUCE-6628.003.patch, MAPREDUCE-6628.004.patch, MAPREDUCE-6628.005.patch
>
>
> There is a potential memory leak in {{CryptoOutputStream.java.}} It
> allocates two direct byte buffers ({{inBuffer}} and {{outBuffer}}) that get
> freed when {{close()}} method is called. Most of the time, {{close()}}
> method is called. However, when writing to intermediate Map output file or
> the spill files in {{MapTask}}, {{close()}} is never called since calling so
> would close the underlying stream which is not desirable. There is a single
> underlying physical stream that contains multiple logical streams one per
> partition of Map output.
> By default the amount of memory allocated per byte buffer is 128 KB and so
> the total memory allocated is 256 KB, This may not sound much. However, if
> the number of partitions (or number of reducers) is large (in the hundreds)
> and/or there are spill files created in {{MapTask}}, this can grow into a few
> hundred MB.
> I can think of two ways to address this issue:
> h2. Possible Fix - 1
> According to JDK documentation:
> {quote}
> The contents of direct buffers may reside outside of the normal
> garbage-collected heap, and so their impact upon the memory footprint of an
> application might not be obvious. It is therefore recommended that direct
> buffers be allocated primarily for large, long-lived buffers that are subject
> to the underlying system's native I/O operations. In general it is best to
> allocate direct buffers only when they yield a measureable gain in program
> performance.
> {quote}
> It is not clear to me whether there is any benefit of allocating direct byte
> buffers in {{CryptoOutputStream.java}}. In fact, there is a slight CPU
> overhead in moving data from {{outBuffer}} to a temporary byte array as per
> the following code in {{CryptoOutputStream.java}}.
> {code}
> /*
> * If underlying stream supports {@link ByteBuffer} write in future, needs
> * refine here.
> */
> final byte[] tmp = getTmpBuf();
> outBuffer.get(tmp, 0, len);
> out.write(tmp, 0, len);
> {code}
> Even if the underlying stream supports direct byte buffer IO (or direct IO in
> OS parlance), it is not clear whether it will yield any measurable
> performance gain.
> The fix would be to allocate a ByteBuffer on the heap for inBuffer and wrap a
> byte array in a {{ByteBuffer}} for {{outBuffer}}. By the way, the
> {{inBuffer}} and {{outBuffer}} have to be {{ByteBuffer}} as demanded by the
> {{encrypt()}} method in {{Encryptor}}.
> h2. Possible Fix - 2
> Assuming that we want to keep the buffers as direct byte buffers, we can
> create a new constructor to {{CryptoOutputStream}} and pass a boolean flag
> {{ownOutputStream}} to indicate whether the underlying stream will be owned
> by {{CryptoOutputStream}}. If it is true, then calling the {{close()}} method
> will close the underlying stream. Otherwise, when {{close()}} is called only
> the direct byte buffers will be freed and the underlying stream will not be
> closed.
> The scope of changes for this fix will be somewhat wider. We need to modify
> {{MapTask.java}}, {{CryptoUtils.java}}, and {{CryptoFSDataOutputStream.java}}
> as well to pass the ownership flag mentioned above.
> I can post a patch for either of the above. I welcome any other ideas from
> developers to fix this issue.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)