[jira] [Commented] (HADOOP-6852) apparent bug in concatenated-bzip2 support (decoding)

Hudson (JIRA) Wed, 21 Feb 2018 12:39:00 -0800

    [ https://issues.apache.org/jira/browse/HADOOP-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16371951#comment-16371951 ]


Hudson commented on HADOOP-6852:
--------------------------------

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13698
(See [https://builds.apache.org/job/Hadoop-trunk-Commit/13698/])
HADOOP-6852. apparent bug in concatenated-bzip2 support (decoding).
(mackrorysd: rev 2bc3351eaf240ea685bcf5042d79f1554bf89e00)
* (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/BZip2Codec.java
* (edit) hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestConcatenatedCompressedInput.java
* (add) hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/resources/testdata/testConcatThenCompress.txt.gz
* (add) hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/resources/testdata/concat.bz2
* (add) hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/resources/testdata/testCompressThenConcat.txt.gz
* (add) hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/resources/testdata/concat.gz
* (add) hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/resources/testdata/testConcatThenCompress.txt.bz2
* (add) hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/resources/testdata/testCompressThenConcat.txt.bz2
* (edit) hadoop-client-modules/hadoop-client-minicluster/pom.xml


> apparent bug in concatenated-bzip2 support (decoding)
> -----------------------------------------------------
>
>                 Key: HADOOP-6852
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6852
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 0.22.0
>         Environment: Linux x86_64 running 32-bit Hadoop, JDK 1.6.0_15
>            Reporter: Greg Roelofs
>            Assignee: Zsolt Venczel
>            Priority: Major
>             Fix For: 3.2.0
>
>         Attachments: HADOOP-6852.01.patch, HADOOP-6852.02.patch, 
> HADOOP-6852.03.patch, HADOOP-6852.04.patch
>
>
> The following simplified code (manually picked out of testMoreBzip2() in 
> https://issues.apache.org/jira/secure/attachment/12448272/HADOOP-6835.v4.trunk-hadoop-mapreduce.patch)
> triggers a "java.io.IOException: bad block header" in 
> org.apache.hadoop.io.compress.bzip2.CBZip2InputStream.initBlock(CBZip2InputStream.java:527):
> {noformat}
>     JobConf jobConf = new JobConf(defaultConf);
>     CompressionCodec bzip2 = new BZip2Codec();
>     ReflectionUtils.setConf(bzip2, jobConf);
>     localFs.delete(workDir, true);
>     // copy multiple-member test file to HDFS
>     String fn2 = "testCompressThenConcat.txt" + bzip2.getDefaultExtension();
>     Path fnLocal2 = new Path(System.getProperty("test.concat.data","/tmp"),fn2);
>     Path fnHDFS2  = new Path(workDir, fn2);
>     localFs.copyFromLocalFile(fnLocal2, fnHDFS2);
>     FileInputFormat.setInputPaths(jobConf, workDir);
>     final FileInputStream in2 = new FileInputStream(fnLocal2.toString());
>     CompressionInputStream cin2 = bzip2.createInputStream(in2);
>     LineReader in = new LineReader(cin2);
>     Text out = new Text();
>     int numBytes, totalBytes=0, lineNum=0;
>     while ((numBytes = in.readLine(out)) > 0) {
>       ++lineNum;
>       totalBytes += numBytes;
>     }
>     in.close();
> {noformat}
> The specified file is also included in the HADOOP-6835 patch linked above, and 
> some additional debug output is included in the commented-out test loop 
> above.  (Only in the linked, "v4" version of the patch, however--I'm about to 
> remove the debug stuff for checkin.)
> It's possible I've done something completely boneheaded here, but the file, 
> at least, checks out in a subsequent set of subtests and with stock bzip2 
> itself.  Only the code above is problematic; it reads through the first 
> concatenated chunk (17 lines of text) just fine but chokes on the header of 
> the second one.  Altogether, the test file contains 84 lines of text and 4 
> concatenated bzip2 files.
> (It's possible this is a mapreduce issue rather than common, but note that 
> the identical gzip test works fine.  Possibly it's related to the 
> stream-vs-decompressor dichotomy, though; intentionally not supported?)
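
For readers unfamiliar with the multi-member layout described above, here is a
minimal, self-contained sketch of the gzip case that the description says works.
It uses only java.util.zip from the JDK (not Hadoop's codecs, and not bzip2, for
which the JDK has no built-in support). The class and method names
(ConcatGzipSketch, gzipMember, concat, countLines) are hypothetical and are not
part of the HADOOP-6835 or HADOOP-6852 patches.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Illustrative sketch only: "compress, then concatenate" with plain JDK gzip.
public class ConcatGzipSketch {

  /** Compresses the text into one complete gzip member (header, data, trailer). */
  static byte[] gzipMember(String text) throws IOException {
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
      gz.write(text.getBytes(StandardCharsets.UTF_8));
    }
    return buf.toByteArray();
  }

  /** Appends the members back to back, like `cat a.gz b.gz > concat.gz`. */
  static byte[] concat(byte[]... members) throws IOException {
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    for (byte[] member : members) {
      buf.write(member);
    }
    return buf.toByteArray();
  }

  /** Decodes the whole input as a single stream and counts the lines read. */
  static int countLines(byte[] compressed) throws IOException {
    int lines = 0;
    try (BufferedReader reader = new BufferedReader(new InputStreamReader(
        new GZIPInputStream(new ByteArrayInputStream(compressed)),
        StandardCharsets.UTF_8))) {
      while (reader.readLine() != null) {
        lines++;
      }
    }
    return lines;
  }

  public static void main(String[] args) throws IOException {
    // Two independently compressed members: 2 lines in the first, 3 in the second.
    byte[] data = concat(gzipMember("a\nb\n"), gzipMember("c\nd\ne\n"));
    System.out.println("lines read across all members: " + countLines(data));
  }
}
```

A decoder with the problem reported here would fail at the second member's
header instead of continuing. The expected behavior, for bzip2 as for gzip, is
that all members are decoded as one continuous stream.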



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
