[
https://issues.apache.org/jira/browse/HADOOP-8522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mike Percy updated HADOOP-8522:
-------------------------------
Attachment: HADOOP-8522-2.patch
I am attaching a patch to make the behavior of the non-native resetState()
consistent with the native resetState(), which will make them both compliant
with RFC 1952 and "gunzip".
The implementation is lifted directly from HBase:
https://svn.apache.org/viewvc/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/ReusableStreamGzipCodec.java?revision=1342856&view=markup
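For readers who do not want to open the HBase source, the idea can be sketched
roughly as follows. This is a minimal illustration and not the code from the
attached patch: the class name ResetStateSketch and the helper writeTwoMembers()
are hypothetical, and the sketch assumes a JDK in which GZIPOutputStream.finish()
can be called again once the deflater has been reset. It relies on the protected
fields def, crc, and out that the JDK stream classes expose to subclasses.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical illustration; not the class from the attached patch.
class ResetStateSketch extends GZIPOutputStream {
    // Minimal 10-byte gzip header per RFC 1952: magic (0x1f 0x8b), CM = 8 (deflate),
    // FLG = 0, MTIME = 0 (4 bytes), XFL = 0, OS = 0.
    private static final byte[] GZIP_HEADER = {
        (byte) GZIPInputStream.GZIP_MAGIC,
        (byte) (GZIPInputStream.GZIP_MAGIC >> 8),
        Deflater.DEFLATED, 0, 0, 0, 0, 0, 0, 0
    };

    ResetStateSketch(OutputStream out) throws IOException {
        super(out); // the superclass constructor writes the first member's header
    }

    // Intended to be called after finish(): begins a new gzip member.
    void resetState() throws IOException {
        def.reset();            // let the deflater accept input again
        crc.reset();            // the next trailer must cover only the new member
        out.write(GZIP_HEADER); // without this, bare deflate data follows the old trailer
    }

    // Writes two payloads as two gzip members on one stream and returns the bytes.
    static byte[] writeTwoMembers(byte[] first, byte[] second) {
        try {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            ResetStateSketch gz = new ResetStateSketch(sink);
            gz.write(first);
            gz.finish();     // flushes deflate data, writes CRC32 + length trailer
            gz.resetState(); // writes a fresh header for the second member
            gz.write(second);
            gz.finish();
            gz.close();
            return sink.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] out = writeTwoMembers(
            "first member\n".getBytes(StandardCharsets.UTF_8),
            "second member\n".getBytes(StandardCharsets.UTF_8));
        System.out.println("wrote " + out.length + " bytes in two gzip members");
    }
}
```

Writing the returned bytes to a file and running "gunzip" on it should yield the
two payloads back to back, since gunzip decodes concatenated members in sequence.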
I added one unit test which simply tests that the output is readable with
GZIPInputStream, and one in which I had to comment out the assert() because the
JDK's GZIPInputStream cannot handle multi-member gzip files. I'm open to
suggestions for improving the unit test; it looks like HBase actually stores
the expected bytes and requires an exact match in its test.
Testing done: manually inspected the data generated by the second unit test and
confirmed that it contains headers, trailers, CRC32 checksums, and lengths
corresponding to the two members included. Also verified that the output of
unit test 2 is readable with "gunzip" and that the decompressed output matches
the provided input.
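The manual inspection described above could in principle be automated along
these lines. This is only a sketch under stated assumptions: the class
GzipTrailerCheck and its methods are hypothetical names, and the two-member
input is produced by concatenating two independent GZIPOutputStream outputs as a
stand-in for the stream under test. Per RFC 1952, each member ends with an
8-byte trailer holding the CRC32 and the uncompressed length, both little-endian.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper; not part of the attached patch.
class GzipTrailerCheck {
    // Compresses one payload as a complete gzip member (header, data, trailer).
    static byte[] member(byte[] payload) {
        try {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            try (GZIPOutputStream gz = new GZIPOutputStream(sink)) {
                gz.write(payload);
            }
            return sink.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // True if a gzip header (magic bytes 0x1f 0x8b) starts at the given offset.
    static boolean startsWithMagic(byte[] gz, int offset) {
        return (gz[offset] & 0xff) == 0x1f && (gz[offset + 1] & 0xff) == 0x8b;
    }

    // True if the 8 bytes ending at 'end' hold the CRC32 and length of 'payload'.
    static boolean trailerMatches(byte[] gz, int end, byte[] payload) {
        ByteBuffer trailer = ByteBuffer.wrap(gz, end - 8, 8).order(ByteOrder.LITTLE_ENDIAN);
        long crc = trailer.getInt() & 0xffffffffL;
        long isize = trailer.getInt() & 0xffffffffL;
        CRC32 expected = new CRC32();
        expected.update(payload);
        return crc == expected.getValue() && isize == payload.length;
    }

    public static void main(String[] args) {
        byte[] a = "first member\n".getBytes(StandardCharsets.UTF_8);
        byte[] b = "second member\n".getBytes(StandardCharsets.UTF_8);
        byte[] m1 = member(a);
        byte[] m2 = member(b);
        byte[] both = new byte[m1.length + m2.length];
        System.arraycopy(m1, 0, both, 0, m1.length);
        System.arraycopy(m2, 0, both, m1.length, m2.length);

        System.out.println("member 1 header:  " + startsWithMagic(both, 0));
        System.out.println("member 1 trailer: " + trailerMatches(both, m1.length, a));
        System.out.println("member 2 header:  " + startsWithMagic(both, m1.length));
        System.out.println("member 2 trailer: " + trailerMatches(both, both.length, b));
    }
}
```

To check a stream like the one in the unit test, the member boundary would have
to be recorded separately, for example by noting the size of the underlying
buffer right after the first finish() call.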
> ResetableGzipOutputStream creates invalid gzip files when finish() and
> resetState() are used
> --------------------------------------------------------------------------------------------
>
> Key: HADOOP-8522
> URL: https://issues.apache.org/jira/browse/HADOOP-8522
> Project: Hadoop Common
> Issue Type: Bug
> Components: io
> Affects Versions: 1.0.3, 2.0.0-alpha
> Reporter: Mike Percy
> Attachments: HADOOP-8522-2.patch
>
>
> ResetableGzipOutputStream creates invalid gzip files when finish() and
> resetState() are used. The issue is that finish() flushes the compressor
> buffer and writes the gzip CRC32 + data length trailer. After that,
> resetState() does not repeat the gzip header, but simply starts writing more
> deflate-compressed data. The resultant files are not readable by the Linux
> "gunzip" tool. ResetableGzipOutputStream should write valid multi-member gzip
> files.
> The gzip format is specified in [RFC
> 1952|https://tools.ietf.org/html/rfc1952].