Guo Ruijing created HADOOP-10196:
------------------------------------

             Summary: Bzip2Codec Uncompress cannot work
                 Key: HADOOP-10196
                 URL: https://issues.apache.org/jira/browse/HADOOP-10196
             Project: Hadoop Common
          Issue Type: Bug
          Components: io
    Affects Versions: 2.2.0
            Reporter: Guo Ruijing

Bzip2Codec Uncompress cannot work.

1. Compress sample file:

[hadoop@localhost ~]$ cat StreamCompressor.java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionOutputStream;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.util.ReflectionUtils;

public class StreamCompressor {
    public static void main(String[] args) throws Exception {
        String codecClassname = args[0];
        Class<?> codecClass = Class.forName(codecClassname);
        Configuration conf = new Configuration();
        CompressionCodec codec = (CompressionCodec) ReflectionUtils.newInstance(codecClass, conf);
        CompressionOutputStream out = codec.createOutputStream(System.out);
        IOUtils.copyBytes(System.in, out, 4096, false);
        out.finish();
    }
}

2. Uncompress sample file:

[hadoop@localhost ~]$ cat StreamUncompressor.java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionOutputStream;
import org.apache.hadoop.io.compress.CompressionInputStream;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.util.ReflectionUtils;

public class StreamUncompressor {
    public static void main(String[] args) throws Exception {
        String codecClassname = args[0];
        Class<?> codecClass = Class.forName(codecClassname);
        Configuration conf = new Configuration();
        CompressionCodec codec = (CompressionCodec) ReflectionUtils.newInstance(codecClass, conf);
        CompressionInputStream in = codec.createInputStream(System.in);
        IOUtils.copyBytes(in, System.out, 4096, false);
        in.close();
    }
}

3. How to compile/run

1) javac -classpath /usr/lib/gphd/hadoop/hadoop-common-2.0.5-alpha-gphd-2.1.1.0.jar StreamCompressor.java
2) javac -classpath /usr/lib/gphd/hadoop/hadoop-common-2.0.5-alpha-gphd-2.1.1.0.jar StreamUncompressor.java
3) jar -cvf Stream.jar StreamCompressor.class StreamUncompressor.class
4) rm -rf /tmp/my.txt.bz2 && echo abc > /tmp/my.txt && bzip2 /tmp/my.txt && cat /tmp/my.txt.bz2 | hadoop jar ./Stream.jar StreamUncompressor org.apache.hadoop.io.compress.BZip2Codec
5) echo "text" | hadoop jar ./Stream.jar StreamCompressor org.apache.hadoop.io.compress.BZip2Codec | bzcat

4. Test result

From the test, Hadoop does not support native bzip2 or Java bzip2.

1) Hadoop supports bzip2 uncompress:

rm -rf /tmp/my.txt.bz2 && echo abc > /tmp/my.txt && bzip2 /tmp/my.txt && cat /tmp/my.txt.bz2 | hadoop jar ./Stream.jar StreamUncompressor org.apache.hadoop.io.compress.BZip2Codec
13/12/17 03:58:20 WARN bzip2.Bzip2Factory: Failed to load/initialize native-bzip2 library system-native, will use pure-Java version
abc    <<< expected

2) bzip2 compress does not work, as follows:

a) [hadoop@localhost hadoop]$ echo "text" | hadoop jar ./Stream.jar StreamCompressor org.apache.hadoop.io.compress.BZip2Codec
13/12/17 04:00:59 WARN bzip2.Bzip2Factory: Failed to load/initialize native-bzip2 library system-native, will use pure-Java version
BZ    <<<<< not expected

b) [hadoop@localhost hadoop]$ echo "text" | hadoop jar ./Stream.jar StreamCompressor org.apache.hadoop.io.compress.BZip2Codec | bzcat
13/12/17 04:01:31 WARN bzip2.Bzip2Factory: Failed to load/initialize native-bzip2 library system-native, will use pure-Java version

bzcat: Compressed file ends unexpectedly;
        perhaps it is corrupted? Possible reason follows.
bzcat: Invalid argument
        Input file = (stdin), output file = (stdout)

It is possible that the compressed file(s) have become corrupted.
You can use the -tvv option to test integrity of such files.

You can use the `bzip2recover' program to attempt to recover
data from undamaged sections of corrupted files.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
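
One possible explanation for the truncated "BZ" output in test 2, not confirmed anywhere in this report: StreamCompressor calls out.finish() but never flushes or closes System.out. The sketch below uses only the JDK to show the documented PrintStream behaviour that could produce this symptom. It rests on two assumptions: that System.out is an auto-flushing PrintStream over a small buffer, and that the codec writes most of its payload one byte at a time. The class name FlushDemo and the 128-byte buffer size are hypothetical, chosen for illustration.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

public class FlushDemo {
    /**
     * Returns how many bytes have reached the sink after (0) an array write,
     * (1) two single-byte writes, and (2) an explicit flush.
     */
    public static int[] run() {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // Assumption: stands in for System.out, an auto-flushing PrintStream
        // layered over a small BufferedOutputStream.
        PrintStream out = new PrintStream(new BufferedOutputStream(sink, 128), true);

        // With autoFlush enabled, an array write is flushed immediately.
        out.write(new byte[] {'B', 'Z'}, 0, 2);
        int afterArrayWrite = sink.size();

        // A single-byte write is flushed only when the byte is '\n',
        // so these two bytes stay in the buffer.
        out.write('h');
        out.write('9');
        int afterSingleBytes = sink.size();

        // An explicit flush pushes the buffered bytes through.
        out.flush();
        int afterFlush = sink.size();

        return new int[] {afterArrayWrite, afterSingleBytes, afterFlush};
    }

    public static void main(String[] args) {
        int[] sizes = run();
        System.out.println(sizes[0] + " " + sizes[1] + " " + sizes[2]);
    }
}
```

If this is what happens here, adding out.flush() or out.close() after out.finish() in StreamCompressor would be a cheap thing to try before concluding that the codec itself is broken.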