On Fri, Apr 16, 2010 at 2:15 PM, Edward Capriolo <[email protected]> wrote:
> at org.apache.hadoop.mapred.SequenceFileAsTextInputFormat.getRecordReader(SequenceFileAsTextInputFormat.java:43)
> at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:296)
> at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:311)
> ... 21 more
>
> The compression being used here - gzip - is not suitable for splitting
> the input files. That could be the reason why you are seeing this
> exception. Can you try using a different compression scheme such as
> bzip2, or perhaps not compressing the files at all?
>
> 1) Can I just set the split size VERY VERY high, thus causing Hive never
> to split these files? My files were produced by a MapReduce program, so
> they are already split very small. I really do not want to have to force
> a change upstream.
>
> 2) From the other post, the key/value of the sequence file should be
> ByteWritable/Text. Currently my key/values are Text/Text, and my data is
> in the key... so
>
> I have already written my own SequenceRecordReader, in which I am
> swapping the key and the value, but it is not working. So I am thinking:
>
> 1. For the key, emit a dummy ByteWritable, maybe 'A'
> 2. Write the key to the value
>
> Will this work? Are there other gotchas here?
>
> Thank you,
> Edward

FYI, the problem here is that Hadoop NEEDS the native libraries to work
with GZIP block-compressed sequence files. For whatever reason the dfs
-text tool can open them but MapReduce can't. Upstream should report the
error for what it is:

  Trying to load native libs... can't do it, falling back to...

should be replaced with:

  Trying to load native libs... FALLING BACK TO JAVA LIBS THAT WON'T WORK ANYWAY!!!
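If it helps, you can check whether the native libraries actually loaded on
a given machine by asking Hadoop directly. NativeCodeLoader and ZlibFactory
are the real classes behind the "Trying to load native-hadoop library" log
line; the little wrapper class below is just a sketch made up for
illustration:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.zlib.ZlibFactory;
import org.apache.hadoop.util.NativeCodeLoader;

// Prints whether the native hadoop library (and native zlib, which the
// gzip codec needs) could be found on this machine's java.library.path.
public class NativeLibCheck {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    System.out.println("native hadoop loaded: "
        + NativeCodeLoader.isNativeCodeLoaded());
    System.out.println("native zlib loaded:   "
        + ZlibFactory.isNativeZlibLoaded(conf));
  }
}

If the zlib line prints false, you are in the falling-back-to-Java
situation described above.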

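To make steps 1 and 2 from the quoted message concrete: a reader that does
the swap could look roughly like the sketch below. This is against the old
org.apache.hadoop.mapred API and untested; the class name
KeyToValueRecordReader is made up, and Hive would still need a custom
InputFormat whose getRecordReader() returns it.

import java.io.IOException;

import org.apache.hadoop.io.ByteWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.SequenceFileRecordReader;

// Wraps the stock SequenceFileRecordReader over a Text/Text sequence
// file, emits a dummy ByteWritable ('A') as the key, and moves the real
// key (where the data actually lives) into the value slot.
public class KeyToValueRecordReader implements RecordReader<ByteWritable, Text> {

  private final SequenceFileRecordReader<Text, Text> inner;
  private final Text innerKey = new Text();
  private final Text innerValue = new Text();

  public KeyToValueRecordReader(JobConf conf, FileSplit split) throws IOException {
    inner = new SequenceFileRecordReader<Text, Text>(conf, split);
  }

  public boolean next(ByteWritable key, Text value) throws IOException {
    if (!inner.next(innerKey, innerValue)) {
      return false;
    }
    key.set((byte) 'A');   // step 1: dummy key
    value.set(innerKey);   // step 2: write the original key to the value
    return true;
  }

  public ByteWritable createKey() { return new ByteWritable(); }
  public Text createValue() { return new Text(); }
  public long getPos() throws IOException { return inner.getPos(); }
  public float getProgress() throws IOException { return inner.getProgress(); }
  public void close() throws IOException { inner.close(); }
}

One gotcha to watch: this only changes what the reader hands back, so the
table's SerDe and column mapping have to agree with the new value layout.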