[
https://issues.apache.org/jira/browse/HBASE-5387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206026#comment-13206026
]
stack commented on HBASE-5387:
------------------------------
Any reason for hardcoding 32K for the buffer size here:
+ ((Configurable)codec).getConf().setInt("io.file.buffer.size", 32 * 1024);
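Could pull it from the conf instead, e.g. (just a sketch, assuming the writer has a Configuration handy; the key name below is made up):

  int bufSize = conf.getInt("hbase.io.compress.buffer.size", 32 * 1024);  // made-up key, 32K default
  ((Configurable) codec).getConf().setInt("io.file.buffer.size", bufSize);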
Give this a reasonable initial size?
+ compressedByteStream = new ByteArrayOutputStream();
So, we'll keep around the largest thing we ever wrote into this
ByteArrayOutputStream? Should we resize it or something from time to time? Or
I suppose we can just wait until it's actually a problem?
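Something along these lines maybe (sketch only; the initial size, the threshold, and the doneWithBlock hook are all made up):

  // Reuse the stream across blocks, but start over if a block ever makes it balloon.
  private ByteArrayOutputStream compressedByteStream =
      new ByteArrayOutputStream(64 * 1024);

  // Called after each block's compressed bytes have been written out.
  private void doneWithBlock() {
    if (compressedByteStream.size() > 4 * 1024 * 1024) {
      // Drop the oversized buffer rather than keep it around forever.
      compressedByteStream = new ByteArrayOutputStream(64 * 1024);
    } else {
      compressedByteStream.reset();
    }
  }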
Is the gzip stuff brittle? The header can be bigger than 10 bytes (the spec
allows extensions, IIRC), but I suppose it's safe because we presume Java's or
the underlying native compression.
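If we ever wanted a cheap sanity check (sketch only; headerBuf is a made-up name for the header bytes the codec wrote): per RFC 1952 the FLG byte is the 4th byte of the gzip header, and any set bit means optional fields follow the fixed 10 bytes.

  if (headerBuf[3] != 0) {
    throw new IOException("gzip header has optional fields; expected the plain 10-byte header");
  }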
Good stuff Mikhail. +1 on patch.
> Reuse compression streams in HFileBlock.Writer
> ----------------------------------------------
>
> Key: HBASE-5387
> URL: https://issues.apache.org/jira/browse/HBASE-5387
> Project: HBase
> Issue Type: Bug
> Reporter: Mikhail Bautin
> Assignee: Mikhail Bautin
> Attachments: Fix-deflater-leak-2012-02-10_18_48_45.patch
>
>
> We need to reuse compression streams in HFileBlock.Writer instead of
> allocating them every time. The motivation is that when using Java's built-in
> implementation of Gzip, we allocate a new GZIPOutputStream object and an
> associated native data structure every time we create a compression stream.
> The native data structure is only deallocated in the finalizer. This is one
> suspected cause of recent TestHFileBlock failures on Hadoop QA:
> https://builds.apache.org/job/HBase-TRUNK/2658/testReport/org.apache.hadoop.hbase.io.hfile/TestHFileBlock/testPreviousOffset_1_/.
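For context, a minimal sketch of the reuse pattern the description is getting at, using Hadoop's CodecPool (not necessarily what the patch itself does): borrow a Compressor once per writer and reuse it for every block, instead of letting each new GZIPOutputStream allocate native state that only its finalizer frees. uncompressedBytes is a made-up name.

  // Held for the lifetime of the writer:
  Compressor compressor = CodecPool.getCompressor(codec);

  // Per block:
  compressor.reset();
  CompressionOutputStream cos =
      codec.createOutputStream(compressedByteStream, compressor);
  cos.write(uncompressedBytes);
  cos.finish();

  // When the writer is closed:
  CodecPool.returnCompressor(compressor);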