[ https://issues.apache.org/jira/browse/HADOOP-15196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16630551#comment-16630551 ]
Vinayakumar B commented on HADOOP-15196:
----------------------------------------

Thanks for the fix [~brahmareddy].

1. The patch fixes the reported issue except in one case: when {{BuiltInGzipDecompressor}} is used and the trailing garbage is shorter than 10 bytes. The change below in {{BuiltInGzipDecompressor#executeHeaderState()}} handles that case as well.
{code:java}
@@ -253,8 +266,11 @@ private void executeHeaderState() throws IOException {
     if (state == GzipStateLabel.HEADER_BASIC) {
       int n = Math.min(userBufLen, 10-localBufOff);  // (or 10-headerBytesRead)
       checkAndCopyBytesToLocal(n);  // modifies userBufLen, etc.
-      if (localBufOff >= 10) {      // should be strictly ==
+      if (localBufOff > 0) {      // should be strictly ==
         processBasicHeader();       // sig, compression method, flagbits
+        if (ignoreTrailingGarbage) {
+          return;
+        }
         localBufOff = 0;            // no further need for basic header
         state = GzipStateLabel.HEADER_EXTRA_FIELD;
       }
{code}

2. Reset the {{newStream}} and {{ignoreTrailingGarbage}} flags if the concatenated stream has valid bytes. This can be done in {{BuiltInGzipDecompressor#decompress()}} as below.
{code:java}
@@ -208,6 +216,11 @@ public synchronized int decompress(byte[] b, int off, int len)
       } catch (DataFormatException dfe) {
         throw new IOException(dfe.getMessage());
       }
+      if (newSteam) {
+        //Reset if new stream have valid bytes
+        newSteam = false;
+        ignoreTrailingGarbage = false;
+      }
       crc.update(b, off, numAvailBytes);  // CRC-32 is on _uncompressed_ data
       if (inflater.finished()) {
         state = GzipStateLabel.TRAILER_CRC;
{code}

3. A test needs to be added to verify this, with both the native and non-native decompressors. Creating a gzip file with trailing garbage is easy: create a gzip-compressed file and append some extra bytes to it.
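For point 3, the test input could be produced along these lines. This is only a sketch using the JDK's {{java.util.zip}} classes; the class name {{TrailingGarbageGzip}}, its helper methods, and the {{'x'}} filler byte are hypothetical and are not part of the attached patch. Wiring the resulting bytes into the Hadoop native and built-in decompressors is deliberately left out.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper for building test input; not part of HADOOP-15196.patch.
final class TrailingGarbageGzip {

  // Compress data into a single, well-formed gzip member.
  static byte[] gzip(byte[] data) {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
      gz.write(data);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bos.toByteArray();
  }

  // Append garbageLen filler bytes after the end of the gzip member.
  // The filler value 'x' is an arbitrary choice for illustration.
  static byte[] withTrailingGarbage(byte[] gzipped, int garbageLen) {
    byte[] out = Arrays.copyOf(gzipped, gzipped.length + garbageLen);
    Arrays.fill(out, gzipped.length, out.length, (byte) 'x');
    return out;
  }

  // Decompress a clean gzip member, to check the input before garbage is added.
  static byte[] gunzip(byte[] gzipped) {
    try (GZIPInputStream in =
             new GZIPInputStream(new ByteArrayInputStream(gzipped))) {
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] buf = new byte[4096];
      int n;
      while ((n = in.read(buf)) != -1) {
        out.write(buf, 0, n);
      }
      return out.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Calling {{withTrailingGarbage(gzip(data), 5)}} and {{withTrailingGarbage(gzip(data), 20)}} would give one input on each side of the 10-byte basic-header length mentioned in point 1, so both paths can be exercised.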
> Zlib decompression fails when file having trailing garbage
> ----------------------------------------------------------
>
>                 Key: HADOOP-15196
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15196
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 2.7.0
>            Reporter: Brahma Reddy Battula
>            Assignee: Brahma Reddy Battula
>            Priority: Major
>         Attachments: HADOOP-15196.patch
>
>
> *When a file has trailing garbage, gzip ignores it.*
> {noformat}
> gzip -d 2018011309-js.rishenglipin.com.gz
> gzip: 2018011309-js.rishenglipin.com.gz: decompression OK, trailing garbage ignored
> {noformat}
> *When we decompress the same file, we get the following.*
> {noformat}
> 2018-01-13 14:23:43,151 | WARN | task-result-getter-3 | Lost task 0.0 in stage 345.0 (TID 5686, node-core-gyVYT, executor 3): java.io.IOException: unknown compression method
>         at org.apache.hadoop.io.compress.zlib.ZlibDecompressor.inflateBytesDirect(Native Method)
>         at org.apache.hadoop.io.compress.zlib.ZlibDecompressor.decompress(ZlibDecompressor.java:225)
>         at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:91)
>         at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85)
> {noformat}