[
https://issues.apache.org/jira/browse/HADOOP-14376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16001020#comment-16001020
]
Jason Lowe commented on HADOOP-14376:
-------------------------------------
Thanks for the patch! The test failure is unrelated.
bq. Regarding BZip2Codec.BZip2CompressionOutputStream.close(), I removed the
overriding method altogether, because the superclass's close() method invokes
finish(). The finish() method handles internalReset() if needed, and also calls
output.finish(), which eliminates the need to call output.flush() or
output.close().
I'm not sure this is a net improvement. Other maintainers will come along and
see that BZip2CompressionOutputStream never calls close() on one of its private
member streams, which is usually a bug. Even if the close() is redundant today,
that doesn't mean it always will be; the root cause of this JIRA is a great
example. Also, I'm not seeing how the superclass's finish() method ever ends up
closing the out stream. I see it write some final bytes and set out to null,
which in turn _prevents_ the close() method from calling out.close(), so I'm
wondering how the output stream normally gets closed.
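If the override stays, something along these lines would keep the intent
obvious to future maintainers (a sketch only, assuming output.close() is safe
to call after finish() has already run):
{code}
@Override
public void close() throws IOException {
  try {
    super.close();   // invokes finish(), which handles internalReset() and output.finish()
  } finally {
    output.close();  // close the private member stream explicitly
  }
}
{code}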
For the BZip2CompressionInputStream change: if input.close() throws, then
super.close() is never called, so we could leak resources and will never return
the codec to the pool.
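One possible shape that avoids the leak, sketched under the assumption that
super.close() is what returns the decompressor to the pool:
{code}
@Override
public void close() throws IOException {
  try {
    input.close();
  } finally {
    super.close();  // always runs, so the decompressor goes back to the pool
  }
}
{code}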
For the CompressorStream patch:
{code}
public void close() throws IOException {
  if (!closed) {
    try {
-     finish();
+     super.close();
    }
    finally {
-     out.close();
      closed = true;
    }
  }
}
{code}
This is subtly different from the previous code because finish() can throw. In
the old code, finish() could throw and out.close() would still be called; now
we skip calling out.close() yet still set closed=true, so the close can never
be retried. This change was done by HADOOP-10526, but it looks like they missed
making the same change in DecompressorStream. Similarly, the
CompressionInputStream/CompressionOutputStream code won't return the codec to
the pool if finish() throws or the underlying stream's close() throws.
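For the pool-return problem, nested try/finally blocks would guarantee the
return even when finish() or the underlying close() throws. A rough sketch for
the output side; the trackedCompressor field name is assumed by analogy with
the trackedDecompressor mentioned in the description:
{code}
@Override
public void close() throws IOException {
  try {
    finish();
  } finally {
    try {
      out.close();
    } finally {
      // return the codec even if finish() or out.close() threw
      if (trackedCompressor != null) {              // field name assumed by analogy
        CodecPool.returnCompressor(trackedCompressor);
        trackedCompressor = null;
      }
    }
  }
}
{code}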
> Memory leak when reading a compressed file using the native library
> -------------------------------------------------------------------
>
> Key: HADOOP-14376
> URL: https://issues.apache.org/jira/browse/HADOOP-14376
> Project: Hadoop Common
> Issue Type: Bug
> Components: common, io
> Affects Versions: 2.7.0
> Reporter: Eli Acherkan
> Assignee: Eli Acherkan
> Attachments: Bzip2MemoryTester.java, HADOOP-14376.001.patch,
> log4j.properties
>
>
> Opening and closing a large number of bzip2-compressed input streams causes
> the process to be killed on OutOfMemory when using the native bzip2 library.
> Our initial analysis suggests that this can be caused by
> {{DecompressorStream}} overriding the {{close()}} method, and therefore
> skipping the line from its parent:
> {{CodecPool.returnDecompressor(trackedDecompressor)}}. When the decompressor
> object is a {{Bzip2Decompressor}}, its native {{end()}} method is never
> called, and the allocated memory isn't freed.
> If this analysis is correct, the simplest way to fix this bug would be to
> replace {{in.close()}} with {{super.close()}} in {{DecompressorStream}}.