I'm trying to use "hadoop fs -text <sequencefile>" to print the contents of a
sequence file via the command line. This works fine when the sequence file is
uncompressed, but when I turn on compression:
...
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
job.setOutputFormatClass(SequenceFileOutputFormat.class);
SequenceFileOutputFormat.setCompressOutput(job, true);
SequenceFileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);
SequenceFileOutputFormat.setOutputCompressionType(job, CompressionType.BLOCK);
...
I get the following:
12/02/13 11:08:59 WARN util.NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
12/02/13 11:08:59 INFO compress.CodecPool: Got brand-new decompressor
text: null
Some questions about the error message:
(1) I have verified that the native-hadoop library is present at
$HADOOP_HOME/lib/native/Linux-i386-32/libhadoop.so. Why can't Hadoop load it?
The native library isn't necessary for my purposes: I don't need native-level
decompression performance, I'm just trying to manually spot-check my data.
Still, I'd like to understand the warning.
(2) The message "text: null" suggests to me that a NullPointerException is
being thrown. But I'm pretty sure there are no nulls in my data, because I can
turn off compression (comment out the last three statements above), run
"hadoop fs -text <sequencefile>", and see the data I expect. Is there some
other way I can verify that my data is not the cause of the problem?
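On question (2), one check that avoids "hadoop fs -text" entirely is to read
the file header directly from a local copy (fetched with "hadoop fs -get", for
example). The sketch below is only that: it assumes the SequenceFile header
layout as I understand it for header version 5 and later (the bytes "SEQ", a
version byte, the key and value class names, a compressed flag, a
block-compressed flag, then the codec class name if compressed), and it only
handles class names shorter than 128 bytes. SeqHeaderPeek and its Header class
are names I made up; they are not part of Hadoop.

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class SeqHeaderPeek {
    // Hypothetical holder for the header fields; not a Hadoop class.
    static final class Header {
        int version;
        String keyClass;
        String valueClass;
        boolean compressed;
        boolean blockCompressed;
        String codecClass; // stays null when the file is not compressed
    }

    // Reads a length-prefixed UTF-8 string. Only the one-byte length form
    // (0..127) is handled, which is an assumption that fits ordinary class names.
    static String readShortString(DataInputStream in) throws IOException {
        byte len = in.readByte();
        if (len < 0) {
            throw new IOException("length prefix not handled by this sketch");
        }
        byte[] buf = new byte[len];
        in.readFully(buf);
        return new String(buf, StandardCharsets.UTF_8);
    }

    // Parses the leading header fields (assumed layout, version >= 5 only).
    static Header readHeader(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        byte[] magic = new byte[3];
        in.readFully(magic);
        if (magic[0] != 'S' || magic[1] != 'E' || magic[2] != 'Q') {
            throw new IOException("missing SEQ magic; not a SequenceFile?");
        }
        Header h = new Header();
        h.version = in.readByte();
        if (h.version < 5) {
            throw new IOException("header version " + h.version + " not handled");
        }
        h.keyClass = readShortString(in);
        h.valueClass = readShortString(in);
        h.compressed = in.readBoolean();
        h.blockCompressed = in.readBoolean();
        if (h.compressed) {
            h.codecClass = readShortString(in);
        }
        return h;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java SeqHeaderPeek <local-copy-of-sequencefile>");
            return;
        }
        try (InputStream in = new FileInputStream(args[0])) {
            Header h = readHeader(in);
            System.out.println("version:          " + h.version);
            System.out.println("key class:        " + h.keyClass);
            System.out.println("value class:      " + h.valueClass);
            System.out.println("compressed:       " + h.compressed);
            System.out.println("block compressed: " + h.blockCompressed);
            System.out.println("codec class:      " + h.codecClass);
        }
    }
}
```

This does not decode any records, so at best it rules out a damaged or
unexpected header; it says nothing about the compressed blocks themselves.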
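On question (1), my working assumption, which I have not confirmed, is that the
Hadoop launch scripts choose the lib/native/<platform> subdirectory from a
string built out of the JVM's os.name, os.arch, and sun.arch.data.model
properties, with spaces turned into underscores. If that holds, a JVM that
reports a different architecture (a 64-bit JVM, say) would never look in
Linux-i386-32. The small program below prints that string and the JVM's
library path so the two can be compared; NativePlatformCheck is a name I made
up for the illustration.

```java
public class NativePlatformCheck {
    // Builds the platform string from standard JVM system properties.
    // That Hadoop uses this exact recipe is an assumption, not a verified fact.
    static String platformName() {
        String name = System.getProperty("os.name") + "-"
                + System.getProperty("os.arch") + "-"
                + System.getProperty("sun.arch.data.model");
        return name.replace(' ', '_');
    }

    public static void main(String[] args) {
        System.out.println("platform string:   " + platformName());
        System.out.println("java.library.path: " + System.getProperty("java.library.path"));
    }
}
```

It needs to run with the same java binary that the hadoop command uses, since
a different JVM may report different properties.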
Any pointers or suggestions you may be able to provide would be greatly
appreciated.
Thank you,
Scott Farrar