Guo Ruijing created HADOOP-10196:
------------------------------------
Summary: Bzip2Codec Uncompress cannot work
Key: HADOOP-10196
URL: https://issues.apache.org/jira/browse/HADOOP-10196
Project: Hadoop Common
Issue Type: Bug
Components: io
Affects Versions: 2.2.0
Reporter: Guo Ruijing
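BZip2Codec does not work correctly: in the test below, output compressed with the codec cannot be read by bzcat, although the codec can decompress a file produced by bzip2.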
1. Compress Sample file:
[hadoop@localhost ~]$ cat StreamCompressor.java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionOutputStream;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.util.ReflectionUtils;
public class StreamCompressor {
    public static void main(String[] args) throws Exception {
        String codecClassname = args[0];
        Class<?> codecClass = Class.forName(codecClassname);
        Configuration conf = new Configuration();
        CompressionCodec codec = (CompressionCodec)
            ReflectionUtils.newInstance(codecClass, conf);
        CompressionOutputStream out = codec.createOutputStream(System.out);
        IOUtils.copyBytes(System.in, out, 4096, false);
        out.finish();
    }
}
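Note that the sample above calls out.finish() but never flushes or closes the stream. Whether that matters for BZip2Codec is not established by this report. The sketch below only illustrates the general point, using java.util.zip.GZIPOutputStream from the JDK as a stand-in (not Hadoop's codec), that finish() by itself does not push bytes through an intermediate buffer. The class name FinishVersusFlush and the method sinkSizes are hypothetical.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class FinishVersusFlush {

    /**
     * Compresses input through a buffered stream and returns
     * { bytes in the sink after finish(), bytes in the sink after flush() }.
     */
    static int[] sinkSizes(byte[] input) {
        try {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            // The 8192-byte buffer stands in for any buffering layer
            // sitting between the compressor and the final destination.
            BufferedOutputStream buffered = new BufferedOutputStream(sink, 8192);
            GZIPOutputStream gz = new GZIPOutputStream(buffered);

            gz.write(input);
            gz.finish();                  // completes the compressed stream...
            int afterFinish = sink.size(); // ...but small outputs are still buffered

            buffered.flush();             // now the bytes reach the sink
            int afterFlush = sink.size();

            return new int[] { afterFinish, afterFlush };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] sizes = sinkSizes("text\n".getBytes(StandardCharsets.UTF_8));
        System.out.println("after finish(): " + sizes[0] + " bytes");
        System.out.println("after flush():  " + sizes[1] + " bytes");
    }
}
```

If buffering turned out to be relevant here, adding out.flush() or out.close() after out.finish() in StreamCompressor would be a cheap thing to try. This is a suggestion for narrowing the problem down, not a confirmed diagnosis.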
2. Uncompress Sample file:
[hadoop@localhost ~]$ cat StreamUncompressor.java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionOutputStream;
import org.apache.hadoop.io.compress.CompressionInputStream;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.util.ReflectionUtils;
public class StreamUncompressor {
    public static void main(String[] args) throws Exception {
        String codecClassname = args[0];
        Class<?> codecClass = Class.forName(codecClassname);
        Configuration conf = new Configuration();
        CompressionCodec codec = (CompressionCodec)
            ReflectionUtils.newInstance(codecClass, conf);
        CompressionInputStream in = codec.createInputStream(System.in);
        IOUtils.copyBytes(in, System.out, 4096, false);
        in.close();
    }
}
3. How to compile/run
1) javac -classpath /usr/lib/gphd/hadoop/hadoop-common-2.0.5-alpha-gphd-2.1.1.0.jar StreamCompressor.java
2) javac -classpath /usr/lib/gphd/hadoop/hadoop-common-2.0.5-alpha-gphd-2.1.1.0.jar StreamUncompressor.java
3) jar -cvf Stream.jar StreamCompressor.class StreamUncompressor.class
4) rm -rf /tmp/my.txt.bz2 && echo abc > /tmp/my.txt && bzip2 /tmp/my.txt && cat /tmp/my.txt.bz2 | hadoop jar ./Stream.jar StreamUncompressor org.apache.hadoop.io.compress.BZip2Codec
5) echo "text" | hadoop jar ./Stream.jar StreamCompressor org.apache.hadoop.io.compress.BZip2Codec | bzcat
4. Test Result
From the test, Hadoop's bzip2 support does not fully work: the native bzip2 library fails to load, and the pure-Java fallback cannot compress.
1) Hadoop can decompress bzip2 input:
rm -rf /tmp/my.txt.bz2 && echo abc > /tmp/my.txt && bzip2 /tmp/my.txt && cat /tmp/my.txt.bz2 | hadoop jar ./Stream.jar StreamUncompressor org.apache.hadoop.io.compress.BZip2Codec
13/12/17 03:58:20 WARN bzip2.Bzip2Factory: Failed to load/initialize native-bzip2 library system-native, will use pure-Java version
abc <<< expected
2) bzip2 compression does not work, as follows:
a) [hadoop@localhost hadoop]$ echo "text" | hadoop jar ./Stream.jar StreamCompressor org.apache.hadoop.io.compress.BZip2Codec
13/12/17 04:00:59 WARN bzip2.Bzip2Factory: Failed to load/initialize native-bzip2 library system-native, will use pure-Java version
BZ <<<<< not expected
b) [hadoop@localhost hadoop]$ echo "text" | hadoop jar ./Stream.jar StreamCompressor org.apache.hadoop.io.compress.BZip2Codec | bzcat
13/12/17 04:01:31 WARN bzip2.Bzip2Factory: Failed to load/initialize native-bzip2 library system-native, will use pure-Java version
bzcat: Compressed file ends unexpectedly;
perhaps it is corrupted? Possible reason follows.
bzcat: Invalid argument
Input file = (stdin), output file = (stdout)
It is possible that the compressed file(s) have become corrupted.
You can use the -tvv option to test integrity of such files.
You can use the `bzip2recover' program to attempt to recover
data from undamaged sections of corrupted files.
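A well-formed bzip2 stream begins with the four bytes 'B', 'Z', 'h' and a block-size digit '1' to '9', so the "BZ" seen in 2a) matches only the first two bytes of that header. As a rough way to inspect the compressor output without bzcat, the following sketch checks just the header. The class name Bzip2HeaderCheck and the method hasBzip2Header are hypothetical, and passing this check does not mean the rest of the stream is valid.

```java
import java.util.Arrays;

public class Bzip2HeaderCheck {

    /** True if data starts with 'B', 'Z', 'h' and a block-size digit '1'..'9'. */
    static boolean hasBzip2Header(byte[] data) {
        return data.length >= 4
            && data[0] == 'B'
            && data[1] == 'Z'
            && data[2] == 'h'
            && data[3] >= '1' && data[3] <= '9';
    }

    public static void main(String[] args) throws Exception {
        // Read at most the first four bytes of stdin; fewer means a truncated stream.
        byte[] head = new byte[4];
        int n = 0;
        while (n < head.length) {
            int r = System.in.read(head, n, head.length - n);
            if (r < 0) {
                break;
            }
            n += r;
        }
        byte[] seen = Arrays.copyOf(head, n);
        System.out.println(hasBzip2Header(seen)
            ? "bzip2 header present"
            : "bzip2 header missing or truncated (" + n + " byte(s) read)");
    }
}
```

Assuming the class is compiled in the current directory, it could be used in place of bzcat in step 5:
echo "text" | hadoop jar ./Stream.jar StreamCompressor org.apache.hadoop.io.compress.BZip2Codec | java Bzip2HeaderCheck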