Compressed sequence files and "hadoop fs -text"

Scott Farrar Mon, 13 Feb 2012 12:40:53 -0800

I'm trying to use "hadoop fs -text <sequencefile>" to print the contents of a 
sequence file via the command line.  This works fine when the sequence file is 
uncompressed, but when I turn on compression:
        ...
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        job.setOutputFormatClass(SequenceFileOutputFormat.class);
        SequenceFileOutputFormat.setCompressOutput(job, true);
        SequenceFileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);
        SequenceFileOutputFormat.setOutputCompressionType(job, CompressionType.BLOCK);
        ...


I get the following:

12/02/13 11:08:59 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
12/02/13 11:08:59 INFO compress.CodecPool: Got brand-new decompressor
text: null

Some questions about the error message:

(1) I have verified that the native-hadoop library is present at 
$HADOOP_HOME/lib/native/Linux-i386-32/libhadoop.so.  Why can't Hadoop load it?  
I don't need the native library for my purposes, since I'm only spot-checking 
my data by hand and native decompression performance doesn't matter, but I'd 
like to understand the warning.

(2) The message "text: null" suggests to me that a NullPointerException is 
being thrown.  But I'm pretty sure there are no nulls in my data, because I can 
turn off compression (by commenting out the three SequenceFileOutputFormat 
compression calls above), run "hadoop fs -text <sequencefile>", and see the 
data I expect.  Is there some other way I can verify that my data is not the 
cause of the problem?
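
One check that does not depend on "hadoop fs -text" might be to read only the 
file header and confirm that it records the key/value classes and codec I 
configured.  Below is a minimal sketch using plain Java and no Hadoop classes.  
It assumes the file has first been copied to local disk (for example with 
"hadoop fs -get"), that the header follows the SequenceFile version 5 or 6 
layout, and that every class name is shorter than 128 bytes so its length 
prefix fits in a single byte.  SeqHeaderPeek and its field names are 
hypothetical and not part of Hadoop.

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: prints the header fields of a local SequenceFile copy. */
public class SeqHeaderPeek {

    /** Parsed header fields; the names are illustrative, not Hadoop API. */
    static final class Header {
        int version;
        String keyClass;
        String valueClass;
        boolean compressed;
        boolean blockCompressed;
        String codecClass; // stays null when the file is not compressed
    }

    // Reads a string stored as a length prefix followed by UTF-8 bytes.
    // Assumption: the length is below 128, so the prefix is a single byte.
    private static String readShortString(DataInputStream in) throws IOException {
        int len = in.readByte();
        if (len < 0) {
            throw new IOException("multi-byte length prefix not handled in this sketch");
        }
        byte[] buf = new byte[len];
        in.readFully(buf);
        return new String(buf, StandardCharsets.UTF_8);
    }

    static Header readHeader(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);

        // Every SequenceFile starts with the three magic bytes "SEQ".
        byte[] magic = new byte[3];
        in.readFully(magic);
        if (magic[0] != 'S' || magic[1] != 'E' || magic[2] != 'Q') {
            throw new IOException("not a SequenceFile: missing SEQ magic");
        }

        Header h = new Header();
        h.version = in.readByte();
        if (h.version < 5) {
            // Older versions lay out the header differently.
            throw new IOException("version " + h.version + " not handled in this sketch");
        }
        h.keyClass = readShortString(in);
        h.valueClass = readShortString(in);
        h.compressed = in.readBoolean();
        h.blockCompressed = in.readBoolean();
        if (h.compressed) {
            // The codec class name is only present for compressed files.
            h.codecClass = readShortString(in);
        }
        return h;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java SeqHeaderPeek <local-sequence-file>");
            return;
        }
        try (InputStream in = new FileInputStream(args[0])) {
            Header h = readHeader(in);
            System.out.println("version:          " + h.version);
            System.out.println("key class:        " + h.keyClass);
            System.out.println("value class:      " + h.valueClass);
            System.out.println("compressed:       " + h.compressed);
            System.out.println("block compressed: " + h.blockCompressed);
            System.out.println("codec:            " + h.codecClass);
        }
    }
}
```

If the header shows the expected Text classes and GzipCodec, the file was at 
least written with the settings above.  This only inspects the header and says 
nothing about whether the compressed blocks themselves can be decoded.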

Any pointers or suggestions you may be able to provide would be greatly 
appreciated.

Thank you,
Scott Farrar
