[
https://issues.apache.org/jira/browse/HADOOP-14376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16003486#comment-16003486
]
Jason Lowe commented on HADOOP-14376:
-------------------------------------
Thanks for updating the patch!
I still think BZip2CompressionOutputStream.close() should be doing more than
just calling super.close(). BZip2CompressionOutputStream has a private
"output" field that it instantiates itself, yet the class never calls close()
on that field. While it's true that _today_ calling output.close() won't do
anything useful because underlying resources are closed/freed by other
entities, that may not always be the case in the _future_. Someone could come
along later and update CBZip2OutputStream such that it becomes critical to call
its close() method, and failure to do so means we start leaking at that point.
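For illustration, a close() that also covers the "output" field might look
roughly like this (a sketch only, not the current patch; it assumes the field
remains a CBZip2OutputStream named output):
{code}
@Override
public void close() throws IOException {
  try {
    // Let the parent perform its normal shutdown of the stream.
    super.close();
  } finally {
    // Explicitly close the wrapped CBZip2OutputStream so anything it owns
    // now or in the future gets released even if super.close() throws.
    output.close();
  }
}
{code}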
The following, from the current patch:
{code}
@Override
public void close() throws IOException {
  if (!closed) {
    try {
      super.close();
    } finally {
      closed = true;
    }
  }
}
{code}
can be simplified to:
{code}
@Override
public void close() throws IOException {
  if (!closed) {
    closed = true;
    super.close();
  }
}
{code}
although even that has a code smell. Why is the child guarding the parent's
close method against redundant close calls? The parent's close should already
be idempotent itself, which removes the need for an override at all, since
there's nothing left for the override to do other than call the parent's
version. The closed-check logic should be moved into the parent rather than
having the child do it on the parent's behalf.
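To sketch that idea (illustrative only, assuming the parent is
CompressionOutputStream with its existing out field and finish() method, plus
a boolean closed flag added to the parent):
{code}
// Sketch: the parent (CompressionOutputStream) owns the idempotency guard.
@Override
public void close() throws IOException {
  if (!closed) {
    closed = true;
    try {
      finish();
    } finally {
      out.close();
    }
  }
}
{code}
With that in place, a subclass only needs a close() override when it has extra
cleanup of its own, and that override reduces to the extra work plus a call to
super.close().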
> Memory leak when reading a compressed file using the native library
> -------------------------------------------------------------------
>
> Key: HADOOP-14376
> URL: https://issues.apache.org/jira/browse/HADOOP-14376
> Project: Hadoop Common
> Issue Type: Bug
> Components: common, io
> Affects Versions: 2.7.0
> Reporter: Eli Acherkan
> Assignee: Eli Acherkan
> Attachments: Bzip2MemoryTester.java, HADOOP-14376.001.patch,
> HADOOP-14376.002.patch, HADOOP-14376.003.patch, log4j.properties
>
>
> Opening and closing a large number of bzip2-compressed input streams causes
> the process to be killed with an OutOfMemoryError when using the native bzip2 library.
> Our initial analysis suggests that this can be caused by
> {{DecompressorStream}} overriding the {{close()}} method, and therefore
> skipping the line from its parent:
> {{CodecPool.returnDecompressor(trackedDecompressor)}}. When the decompressor
> object is a {{Bzip2Decompressor}}, its native {{end()}} method is never
> called, and the allocated memory isn't freed.
> If this analysis is correct, the simplest way to fix this bug would be to
> replace {{in.close()}} with {{super.close()}} in {{DecompressorStream}}.
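For reference, a minimal sketch of the change the description proposes
(assuming DecompressorStream keeps its existing boolean closed flag and
currently calls in.close() directly):
{code}
@Override
public void close() throws IOException {
  if (!closed) {
    closed = true;
    // Delegate to CompressionInputStream.close() instead of closing the
    // wrapped stream directly, so the tracked decompressor is returned to
    // the CodecPool instead of being leaked.
    super.close();
  }
}
{code}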