Re: Compressed sequence files and "hadoop fs -text "

Harsh J Mon, 13 Feb 2012 14:35:55 -0800

Unfortunately, the SequenceFile reader requires the native codecs to
read gzip-compressed data back, and in the current stable releases,
compiling the native libraries on Mac doesn't seem to work (Mac and
Windows aren't well-supported platforms yet). The only quick solution
I can think of at present is a local Linux VM.
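
To illustrate the platform check described in the quoted reply below, here
is a minimal sketch, not Hadoop's actual code. It assumes the native
library directory name (such as the Linux-i386-32 in Scott's path) is built
from the JVM's os.name, os.arch and sun.arch.data.model properties, with
spaces replaced by underscores. The class and method names are
hypothetical.

```java
// Hypothetical helper: shows why a Mac client does not match a
// Linux-i386-32 native library directory. Not part of Hadoop.
class NativePlatformCheck {

    // Assumed naming scheme: <os.name>-<os.arch>-<data model>,
    // with spaces in the OS name replaced by underscores.
    public static String platformName(String osName, String osArch, String dataModel) {
        return osName.replace(' ', '_') + "-" + osArch + "-" + dataModel;
    }

    // A native directory is only usable if its name matches the platform.
    public static boolean matchesNativeDir(String platform, String nativeDirName) {
        return platform.equals(nativeDirName);
    }

    public static void main(String[] args) {
        String platform = platformName(
                System.getProperty("os.name"),
                System.getProperty("os.arch"),
                // Property may be absent on some JVMs, so fall back to a marker.
                System.getProperty("sun.arch.data.model", "unknown"));
        System.out.println("Platform: " + platform);
        System.out.println("Matches Linux-i386-32: "
                + matchesNativeDir(platform, "Linux-i386-32"));
    }
}
```

Run on the machine that submits the command. If the printed platform does
not match a directory under lib/native, the WARN line from
NativeCodeLoader is the expected result.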


On Tue, Feb 14, 2012 at 3:37 AM, Scott Farrar <[email protected]> wrote:
> Harsh,
>
> Thanks, that was why the native libs were not being loaded -- my cluster is 
> Linux, but I was submitting the command from a Mac.
>
> Is there any way to force Hadoop to use the java Codec classes, to avoid this 
> native-library dependency?
>
> Thanks so much for your help!!!
> Scott Farrar
>
>
> On Feb 13, 2012, at 12:43 PM, Harsh J wrote:
>
>> Scott,
>>
>> The linux native libraries are only loaded if your platform is Linux
>> and if the binaries are compatible with the architecture. Could you
>> try the same command under Linux (VM or otherwise)?
>>
>> On Tue, Feb 14, 2012 at 2:09 AM, Scott Farrar <[email protected]> wrote:
>>> I'm trying to use "hadoop fs -text <sequencefile>" to print the contents of 
>>> a sequence file via the command line.  This works fine when the sequence 
>>> file is uncompressed, but when I turn on compression:
>>>        ...
>>>        job.setOutputKeyClass(Text.class);
>>>        job.setOutputValueClass(Text.class);
>>>        job.setOutputFormatClass(SequenceFileOutputFormat.class);
>>>        SequenceFileOutputFormat.setCompressOutput(job, true);
>>>        SequenceFileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);
>>>        SequenceFileOutputFormat.setOutputCompressionType(job, CompressionType.BLOCK);
>>>        ...
>>>
>>> I get the following:
>>>
>>> 12/02/13 11:08:59 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>>> 12/02/13 11:08:59 INFO compress.CodecPool: Got brand-new decompressor
>>> text: null
>>>
>>> Some questions about the error message:
>>>
>>> (1) I have verified that the native-hadoop library is in 
>>> $HADOOP_HOME/lib/native/Linux-i386-32/libhadoop.so.  I am curious as to 
>>> why Hadoop can't load it.  The native library isn't necessary for my 
>>> purposes -- I don't need native-level decompression performance, I'm just 
>>> trying to manually spot-check my data.  I'm just curious about this.
>>>
>>> (2) The message "text: null" suggests to me a NullPointerException being 
>>> thrown.  But I'm pretty sure there are no nulls in my data, because I can 
>>> turn off compression (comment out the last three lines above), run "hadoop 
>>> fs -text <sequencefile>", and see the data I expect.  Is there some other 
>>> way I can verify that my data is not the cause of the problem?
>>>
>>> Any pointers or suggestions you may be able to provide would be greatly 
>>> appreciated.
>>>
>>> Thank you,
>>> Scott Farrar
>>>
>>
>>
>>
>> --
>> Harsh J
>> Customer Ops. Engineer
>> Cloudera | http://tiny.cloudera.com/about
>
>  Scott Farrar
> [email protected]
>
>
>
>



-- 
Harsh J
Customer Ops. Engineer
Cloudera | http://tiny.cloudera.com/about
