Unfortunately, reading a gzip-compressed SequenceFile back requires the native codecs, and in the current stable releases the native build for Mac doesn't seem to be working (Mac and Windows aren't well-supported platforms yet). The only quick workaround I can think of at present is to run the command from a local Linux VM.
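
To illustrate why a Mac client never picks up the Linux libraries: as far as I know, the launcher script chooses the native library directory from a platform string built out of the JVM's os.name, os.arch and sun.arch.data.model properties, with spaces replaced by underscores. The sketch below only mimics that naming scheme with the plain JDK; the class and method names are hypothetical and this is not Hadoop's own code.

```java
// Hypothetical helper (not part of Hadoop) that mimics how the native
// library directory name is assumed to be derived from JVM properties.
public class NativePlatformName {

    // Joins the three properties with '-' and replaces spaces with '_',
    // e.g. ("Linux", "i386", "32") gives "Linux-i386-32".
    static String platformName(String osName, String osArch, String dataModel) {
        return (osName + "-" + osArch + "-" + dataModel).replace(' ', '_');
    }

    public static void main(String[] args) {
        String name = platformName(
                System.getProperty("os.name"),
                System.getProperty("os.arch"),
                // Not guaranteed on every JVM; may print "null" if unset.
                System.getProperty("sun.arch.data.model"));
        // Directory in which this client would look for libhadoop, under the assumption above.
        System.out.println("lib/native/" + name);
    }
}
```

Run on the Mac client, this prints a directory name that does not match lib/native/Linux-i386-32, which would be consistent with the NativeCodeLoader warning in the log quoted below.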

On Tue, Feb 14, 2012 at 3:37 AM, Scott Farrar <[email protected]> wrote:
> Harsh,
>
> Thanks, that was why the native libs were not being loaded -- my cluster is
> Linux, but I was submitting the command from a Mac.
>
> Is there any way to force Hadoop to use the java Codec classes, to avoid this
> native-library dependency?
>
> Thanks so much for your help!!!
> Scott Farrar
>
>
> On Feb 13, 2012, at 12:43 PM, Harsh J wrote:
>
>> Scott,
>>
>> The linux native libraries are only loaded if your platform is Linux
>> and if the binaries are compatible with the architecture. Could you
>> try the same command under Linux (VM or otherwise)?
>>
>> On Tue, Feb 14, 2012 at 2:09 AM, Scott Farrar <[email protected]> wrote:
>>> I'm trying to use "hadoop fs -text <sequencefile>" to print the contents of
>>> a sequence file via the command line. This works fine when the sequence
>>> file is uncompressed, but when I turn on compression:
>>> ...
>>> job.setOutputKeyClass(Text.class);
>>> job.setOutputValueClass(Text.class);
>>> job.setOutputFormatClass(SequenceFileOutputFormat.class);
>>> SequenceFileOutputFormat.setCompressOutput(job, true);
>>> SequenceFileOutputFormat.setOutputCompressorClass(job,
>>> GzipCodec.class);
>>> SequenceFileOutputFormat.setOutputCompressionType(job,
>>> CompressionType.BLOCK);
>>> ...
>>>
>>> I get the following:
>>>
>>> 12/02/13 11:08:59 WARN util.NativeCodeLoader: Unable to load native-hadoop
>>> library for your platform... using builtin-java classes where applicable
>>> 12/02/13 11:08:59 INFO compress.CodecPool: Got brand-new decompressor
>>> text: null
>>>
>>> Some questions about the error message:
>>>
>>> (1) I have verified that the native-hadoop library is in
>>> $HADOOP_HOME//lib/native/Linux-i386-32/libhadoop.so. I am curious as to
>>> why can't Hadoop load it? The native library isn't necessary for my
>>> purposes -- I don't need native-level decompression performance, I'm just
>>> trying to manually spot-check my data. I'm just curious about this.
>>>
>>> (2) The message "text:null" suggests to me a NullPointerException being
>>> thrown. But I'm pretty sure there are no nulls in my data, because I can
>>> turn off compression (comment out the last three lines above), run "hadoop
>>> fs -text <sequencefile>", and see the data I expect. Is there some other
>>> way I can verify that my data is not the cause of the problem?
>>>
>>> Any pointers or suggestions you may be able to provide would be greatly
>>> appreciated.
>>>
>>> Thank you,
>>> Scott Farrar
>>>
>>
>>
>>
>> --
>> Harsh J
>> Customer Ops. Engineer
>> Cloudera | http://tiny.cloudera.com/about
>
> Scott Farrar
> [email protected]
>
>
>

--
Harsh J
Customer Ops. Engineer
Cloudera | http://tiny.cloudera.com/about
