[ https://issues.apache.org/jira/browse/HADOOP-14376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16003486#comment-16003486 ]

Jason Lowe commented on HADOOP-14376:
-------------------------------------

Thanks for updating the patch!

I still think BZip2CompressionOutputStream.close() should be doing more than 
just calling super.close().  BZip2CompressionOutputStream has an "output" field 
that is private and instantiated by the class, yet it never calls the close() 
method on it.  While it's true that _today_ calling output.close() won't do 
anything useful because underlying resources are closed/freed by other 
entities, that may not always be the case in the _future_.  Someone could come 
along later and update CBZip2OutputStream such that it becomes critical to call 
its close() method, and failure to do so means we start leaking at that point.
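
To make that concrete, here's a minimal sketch of the ownership pattern (the names are illustrative, not taken from the Hadoop sources): a class that instantiates its own Closeable field should close it in its own close(), even when that inner close() happens to be a no-op today, so a future change to the inner class can't introduce a leak.
{code}
import java.io.Closeable;
import java.io.IOException;

// Illustrative only: a stream that privately owns a Closeable, like
// BZip2CompressionOutputStream owns its "output" field.
class WrappingStream implements Closeable {
  private final Closeable output;   // privately instantiated resource

  WrappingStream(Closeable output) {
    this.output = output;
  }

  @Override
  public void close() throws IOException {
    try {
      // ... finish/flush work, analogous to super.close() ...
    } finally {
      output.close();   // release the owned resource regardless, even if
                        // its close() is currently a no-op
    }
  }
}
{code}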

The following:
{code}
  @Override
  public void close() throws IOException {
    if (!closed) {
      try {
        super.close();
      }
      finally {
        closed = true;
      }
    }
  }
{code}
can be simplified to:
{code}
  @Override
  public void close() throws IOException {
    if (!closed) {
      closed = true;
      super.close();
    }
  }
{code}
although even that has a code smell.  Why is the child guarding the parent's 
close method against redundant close calls?  The parent's method should 
already be idempotent, which precludes the need for an override at all, since 
there's nothing else to do in the close method other than call the parent's 
version.  The closed-check logic should be moved into the parent rather than 
having the child do it on behalf of the parent.
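
The suggestion can be sketched with hypothetical parent/child classes (names are mine, not from the Hadoop sources): once the closed check lives in the parent, the child needs no close() override at all.
{code}
import java.io.IOException;
import java.io.OutputStream;

// Sketch only: the closed guard belongs in the parent stream.
class ParentStream extends OutputStream {
  private boolean closed = false;
  int closeCount = 0;               // visible for the demonstration only

  @Override
  public void write(int b) { /* omitted in this sketch */ }

  @Override
  public void close() throws IOException {
    if (closed) {
      return;                       // idempotent on redundant close
    }
    closed = true;
    closeCount++;                   // stands in for the real cleanup work
  }
}

// No close() override needed: the parent already guards redundant closes.
class ChildStream extends ParentStream {
}
{code}
Closing a ChildStream twice runs the parent's cleanup exactly once, with no closed flag in the child.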


> Memory leak when reading a compressed file using the native library
> -------------------------------------------------------------------
>
>                 Key: HADOOP-14376
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14376
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: common, io
>    Affects Versions: 2.7.0
>            Reporter: Eli Acherkan
>            Assignee: Eli Acherkan
>         Attachments: Bzip2MemoryTester.java, HADOOP-14376.001.patch, 
> HADOOP-14376.002.patch, HADOOP-14376.003.patch, log4j.properties
>
>
> Opening and closing a large number of bzip2-compressed input streams causes 
> the process to be killed on OutOfMemory when using the native bzip2 library.
> Our initial analysis suggests that this can be caused by 
> {{DecompressorStream}} overriding the {{close()}} method, and therefore 
> skipping the line from its parent: 
> {{CodecPool.returnDecompressor(trackedDecompressor)}}. When the decompressor 
> object is a {{Bzip2Decompressor}}, its native {{end()}} method is never 
> called, and the allocated memory isn't freed.
> If this analysis is correct, the simplest way to fix this bug would be to 
> replace {{in.close()}} with {{super.close()}} in {{DecompressorStream}}.


