[ 
https://issues.apache.org/jira/browse/HADOOP-15196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16630551#comment-16630551
 ] 

Vinayakumar B commented on HADOOP-15196:
----------------------------------------

Thanks for the fix [~brahmareddy].

1. The patch fixes the reported issue except in one case: when 
{{BuiltInGzipDecompressor}} is used and the trailing garbage is shorter than 
10 bytes (the header state currently acts only once 10 bytes have been buffered).
 The change below in {{BuiltInGzipDecompressor#executeHeaderState()}} handles 
this case as well.
{code:java}
@@ -253,8 +266,11 @@ private void executeHeaderState() throws IOException {
     if (state == GzipStateLabel.HEADER_BASIC) {
       int n = Math.min(userBufLen, 10-localBufOff);  // (or 10-headerBytesRead)
       checkAndCopyBytesToLocal(n);  // modifies userBufLen, etc.
-      if (localBufOff >= 10) {      // should be strictly ==
+      if (localBufOff > 0) {      // should be strictly ==
         processBasicHeader();       // sig, compression method, flagbits
+        if (ignoreTrailingGarbage) {
+          return;
+        }
         localBufOff = 0;            // no further need for basic header
         state = GzipStateLabel.HEADER_EXTRA_FIELD;
       }
{code}
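To illustrate why a short tail matters: the basic gzip header is 10 bytes long and starts with the fixed bytes 0x1f, 0x8b, 0x08 (magic plus the deflate method), so a tail can be recognised as garbage before 10 bytes are available. The following is a minimal standalone sketch of that idea only. The class and method names are hypothetical and are not part of the Hadoop patch, and it does not reproduce what {{processBasicHeader()}} actually does.

```java
final class GzipHeaderPrefixCheck {

  /** Length of the basic gzip header (RFC 1952). */
  static final int BASIC_HEADER_LEN = 10;

  /** Fixed leading bytes of a gzip member: ID1, ID2, CM (8 = deflate). */
  private static final int[] FIXED_PREFIX = {0x1f, 0x8b, 0x08};

  /**
   * Returns false as soon as the bytes seen so far contradict the fixed
   * gzip prefix, even when fewer than BASIC_HEADER_LEN bytes are available.
   * Returns true when nothing seen so far rules out a gzip header.
   */
  static boolean couldBeGzipHeader(byte[] buf, int len) {
    int n = Math.min(len, FIXED_PREFIX.length);
    for (int i = 0; i < n; i++) {
      if ((buf[i] & 0xff) != FIXED_PREFIX[i]) {
        return false; // definitely not the start of a new gzip member
      }
    }
    return true;
  }

  public static void main(String[] args) {
    byte[] garbage = {0x58, 0x58, 0x58, 0x58};   // 4 bytes of trailing garbage
    byte[] header = {0x1f, (byte) 0x8b, 0x08};   // start of a real gzip member
    System.out.println(couldBeGzipHeader(garbage, garbage.length));
    System.out.println(couldBeGzipHeader(header, header.length));
  }
}
```

A check of this kind can reject a 4-byte tail immediately, whereas waiting for 10 buffered bytes never completes when the input ends first.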
2. Reset the {{newStream}} and {{ignoreTrailingGarbage}} flags if the 
concatenated stream has valid bytes.
 This can be done in {{BuiltInGzipDecompressor#decompress()}} as below.
{code:java}
@@ -208,6 +216,11 @@ public synchronized int decompress(byte[] b, int off, int len)
       } catch (DataFormatException dfe) {
         throw new IOException(dfe.getMessage());
       }
+      if (newStream) {
+        // Reset if the new stream has valid bytes
+        newStream = false;
+        ignoreTrailingGarbage = false;
+      }
       crc.update(b, off, numAvailBytes);  // CRC-32 is on _uncompressed_ data
       if (inflater.finished()) {
         state = GzipStateLabel.TRAILER_CRC;
{code}
3. A test needs to be added to verify this, with both the native and 
non-native decompressors.
 Creating a gzip file with trailing garbage is easy: create a gzip-compressed 
file and append some extra bytes to it directly.
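
As a rough sketch of how such test input could be produced in memory, the helper below gzips a payload with {{java.util.zip.GZIPOutputStream}} and appends filler bytes. The class name, method names, and the filler byte 0x58 are illustrative assumptions, not part of the patch. A real test would feed the resulting bytes to the Hadoop decompressors, which is not shown here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class GzipTrailingGarbage {

  /** Compresses the payload into a single well-formed gzip member. */
  static byte[] gzip(byte[] payload) {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
      gz.write(payload);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bos.toByteArray(); // read after close, so the trailer is included
  }

  /** Returns a copy of the gzip bytes followed by garbageLen filler bytes. */
  static byte[] withTrailingGarbage(byte[] gzipped, int garbageLen) {
    byte[] out = Arrays.copyOf(gzipped, gzipped.length + garbageLen);
    Arrays.fill(out, gzipped.length, out.length, (byte) 0x58); // 'X', not gzip magic
    return out;
  }

  /** Decompresses a clean gzip member; used to compute the expected output. */
  static byte[] gunzip(byte[] gzipped) {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(gzipped))) {
      byte[] buf = new byte[4096];
      int n;
      while ((n = in.read(buf)) != -1) {
        bos.write(buf, 0, n);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bos.toByteArray();
  }

  public static void main(String[] args) {
    byte[] payload = "hello trailing garbage".getBytes(StandardCharsets.UTF_8);
    byte[] clean = gzip(payload);
    byte[] shortTail = withTrailingGarbage(clean, 4);  // fewer than 10 bytes
    byte[] longTail = withTrailingGarbage(clean, 32);  // 10 bytes or more
    System.out.println(clean.length + " " + shortTail.length + " " + longTail.length);
  }
}
```

Generating one input with fewer than 10 trailing bytes and one with 10 or more would cover both the short-tail case from point 1 and the case the original patch already handles.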

> Zlib decompression fails when file having trailing garbage
> ----------------------------------------------------------
>
>                 Key: HADOOP-15196
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15196
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 2.7.0
>            Reporter: Brahma Reddy Battula
>            Assignee: Brahma Reddy Battula
>            Priority: Major
>         Attachments: HADOOP-15196.patch
>
>
> *When a file has trailing garbage, gzip ignores it.*
> {noformat}
> gzip -d 2018011309-js.rishenglipin.com.gz
> gzip: 2018011309-js.rishenglipin.com.gz: decompression OK, trailing garbage ignored
> {noformat}
>  *When we decompress the same file with Hadoop, we get the following error.*
> {noformat}
> 2018-01-13 14:23:43,151 | WARN  | task-result-getter-3 | Lost task 0.0 in stage 345.0 (TID 5686, node-core-gyVYT, executor 3): java.io.IOException: unknown compression method
>         at org.apache.hadoop.io.compress.zlib.ZlibDecompressor.inflateBytesDirect(Native Method)
>         at org.apache.hadoop.io.compress.zlib.ZlibDecompressor.decompress(ZlibDecompressor.java:225)
>         at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:91)
>         at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85)
> {noformat}



