[
https://issues.apache.org/jira/browse/HADOOP-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16371694#comment-16371694
]
Sean Mackrory commented on HADOOP-6852:
---------------------------------------
Actually, one last thing: I'm not big on the idea of binary files checked into
source control. Where did these come from? Since these are tests that were
commented out at some point, I suspect you simply restored the files from
before, but we should make sure that's documented; ideally we'd record both
their source and how they were generated. I also wonder whether generating them
at run-time would let problems go unnoticed, since we'd only ever be testing
freshly compressed files and never files compressed by an older implementation,
and both should work. So overall I'm okay committing this as-is, but I'd like
to document where the binaries came from.
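For the run-time-generation option, here's a rough sketch of what that could
look like; the class and helper names are invented, but the codec calls are the
standard Hadoop compression API. Each chunk is compressed as its own complete
bzip2 member and the members are appended byte-for-byte, which is the same
layout stock bzip2 produces with "cat a.bz2 b.bz2":
{noformat}
import java.io.ByteArrayOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.BZip2Codec;
import org.apache.hadoop.io.compress.CompressionOutputStream;
import org.apache.hadoop.util.ReflectionUtils;

public class ConcatBzip2Generator {
  // Compresses each chunk as a complete bzip2 member and appends the
  // members byte-for-byte to a single multi-member output file.
  public static void writeConcatenated(String path, String[] chunks,
      Configuration conf) throws IOException {
    BZip2Codec codec = ReflectionUtils.newInstance(BZip2Codec.class, conf);
    OutputStream raw = new FileOutputStream(path);
    try {
      for (String chunk : chunks) {
        ByteArrayOutputStream member = new ByteArrayOutputStream();
        CompressionOutputStream cout = codec.createOutputStream(member);
        cout.write(chunk.getBytes("UTF-8"));
        cout.close();                     // finish this bzip2 member
        raw.write(member.toByteArray());  // append it to the test file
      }
    } finally {
      raw.close();
    }
  }
}
{noformat}
The catch is exactly the concern above: a file generated this way only ever
exercises the current compressor, so it can't catch a regression against data
written by an older implementation. Keeping the checked-in binaries (with their
provenance documented) alongside a generated file would cover both cases.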
> apparent bug in concatenated-bzip2 support (decoding)
> -----------------------------------------------------
>
> Key: HADOOP-6852
> URL: https://issues.apache.org/jira/browse/HADOOP-6852
> Project: Hadoop Common
> Issue Type: Bug
> Components: io
> Affects Versions: 0.22.0
> Environment: Linux x86_64 running 32-bit Hadoop, JDK 1.6.0_15
> Reporter: Greg Roelofs
> Assignee: Zsolt Venczel
> Priority: Major
> Attachments: HADOOP-6852.01.patch, HADOOP-6852.02.patch,
> HADOOP-6852.03.patch, HADOOP-6852.04.patch
>
>
> The following simplified code (manually picked out of testMoreBzip2() in
> https://issues.apache.org/jira/secure/attachment/12448272/HADOOP-6835.v4.trunk-hadoop-mapreduce.patch)
> triggers a "java.io.IOException: bad block header" in
> org.apache.hadoop.io.compress.bzip2.CBZip2InputStream.initBlock(CBZip2InputStream.java:527):
> {noformat}
> JobConf jobConf = new JobConf(defaultConf);
> CompressionCodec bzip2 = new BZip2Codec();
> ReflectionUtils.setConf(bzip2, jobConf);
> localFs.delete(workDir, true);
> // copy multiple-member test file to HDFS
> String fn2 = "testCompressThenConcat.txt" + bzip2.getDefaultExtension();
> Path fnLocal2 = new Path(System.getProperty("test.concat.data", "/tmp"), fn2);
> Path fnHDFS2 = new Path(workDir, fn2);
> localFs.copyFromLocalFile(fnLocal2, fnHDFS2);
> FileInputFormat.setInputPaths(jobConf, workDir);
> final FileInputStream in2 = new FileInputStream(fnLocal2.toString());
> CompressionInputStream cin2 = bzip2.createInputStream(in2);
> LineReader in = new LineReader(cin2);
> Text out = new Text();
> int numBytes, totalBytes=0, lineNum=0;
> while ((numBytes = in.readLine(out)) > 0) {
>   ++lineNum;
>   totalBytes += numBytes;
> }
> in.close();
> {noformat}
> The specified file is also included in the HADOOP-6835 patch linked above, and
> some additional debug output is included in the commented-out test loop
> above. (The debug output is only in the linked "v4" version of the patch,
> though; I'm about to remove it for checkin.)
> It's possible I've done something completely boneheaded here, but the file,
> at least, checks out in a subsequent set of subtests and with stock bzip2
> itself. Only the code above is problematic; it reads through the first
> concatenated chunk (17 lines of text) just fine but chokes on the header of
> the second one. Altogether, the test file contains 84 lines of text and 4
> concatenated bzip2 files.
> (It's possible this belongs in mapreduce rather than common, but note that
> the identical gzip test works fine. Possibly it's related to the
> stream-vs-decompressor dichotomy; is concatenation intentionally unsupported
> on that path?)
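As a footnote on the stream-vs-decompressor question: CompressionCodec exposes
two read paths, createInputStream(InputStream) and createInputStream(InputStream,
Decompressor). Gzip decodes through an explicit Decompressor, which the caller
can reset when one concatenated member ends; the pure-Java bzip2 codec keeps all
of its decode state inside the returned stream (CBZip2InputStream underneath),
so nothing external can reset it between members. A rough sketch of the two call
paths, reusing the variables from the snippet above (the gzip file name is made
up):
{noformat}
// Decompressor path: gzip hands the real work to a Decompressor that
// callers can reset when one concatenated member ends.
CompressionCodec gzip = new GzipCodec();
ReflectionUtils.setConf(gzip, jobConf);
Decompressor d = gzip.createDecompressor();
CompressionInputStream gin =
    gzip.createInputStream(new FileInputStream("concat.txt.gz"), d);

// Stream path: bzip2 decode state lives entirely inside the stream
// returned here; this is the path that throws "bad block header".
CompressionInputStream bin =
    bzip2.createInputStream(new FileInputStream(fnLocal2.toString()));
{noformat}
If that's the issue, the "bad block header" message makes sense: a bzip2 member
ends with the end-of-stream magic (0x177245385090), after which the decoder
presumably expects another block-header magic (0x314159265359) but instead hits
the next member's "BZh" stream header.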