        at org.apache.hadoop.mapred.SequenceFileAsTextInputFormat.getRecordReader(SequenceFileAsTextInputFormat.java:43)
        at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:296)
        at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:311)
        ... 21 more
The compression being used here - gzip - is not suitable for splitting the
input files. That could be the reason why you are seeing this exception.
Can you try using a different compression scheme such as bzip2, or perhaps
not compressing the files at all?
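For reference, switching the output compression to bzip2 in Hive could look roughly like the following. These property names are the older (mapred-era) ones and may differ in your Hadoop/Hive versions, so treat this as a sketch rather than the exact incantation:

```
-- Assumed settings for bzip2-compressed, block-compressed output;
-- verify the property names against your Hadoop/Hive release.
SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.BZip2Codec;
SET mapred.output.compression.type=BLOCK;
```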
1) Can I just set the split size VERY VERY high, thus causing Hive never to
split these files? My files were produced by a MapReduce program, so they
are already split very small. I really do not want to have to force a change
upstream.
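If that approach is viable, one way to try it (assuming the older mapred property names; newer releases use mapreduce.input.fileinputformat.split.minsize) would be to raise the minimum split size to something larger than any single file, e.g.:

```
-- Sketch: force each file into a single split by setting the minimum
-- split size to Long.MAX_VALUE. Property name assumes the old mapred API.
SET mapred.min.split.size=9223372036854775807;
```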
2) From the other post, the key/value types of the sequence file should be
ByteWritable/Text. Currently my keys/values are Text/Text, and my data is in
the key... so I have already written my own SequenceRecordReader that swaps
the key and the value, but it is not working. So I am thinking:
1. For the key, emit a dummy ByteWritable, maybe 'A'
2. Write the key to the value
Will this work? Are there other gotchas here?
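To make the idea concrete, here is a minimal pure-Java sketch of the swap in steps 1-2. It deliberately uses plain Java types instead of the real Hadoop interfaces (a real implementation would be written against org.apache.hadoop.mapred.RecordReader with ByteWritable/Text), and the class and names are hypothetical, just to show the record-by-record transformation:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.AbstractMap.SimpleEntry;
import java.util.Map.Entry;

// Hypothetical stand-in for a wrapping record reader: for each underlying
// (key, value) record, it emits (dummy byte 'A', original key), discarding
// the original value -- mirroring steps 1 and 2 above.
class SwappingReader {
    static final byte DUMMY_KEY = 'A'; // step 1: the dummy "ByteWritable"

    private final Iterator<Entry<String, String>> inner; // stands in for the real SequenceFile reader

    SwappingReader(List<Entry<String, String>> records) {
        this.inner = records.iterator();
    }

    // Mirrors RecordReader.next(key, value); returns null at end of input.
    Entry<Byte, String> next() {
        if (!inner.hasNext()) {
            return null;
        }
        Entry<String, String> rec = inner.next();
        // step 2: the original key becomes the new value
        return new SimpleEntry<>(DUMMY_KEY, rec.getKey());
    }
}
```

In the real reader, next(key, value) would set the ByteWritable to the dummy byte and copy the incoming Text key into the outgoing Text value; createKey()/createValue() would return a ByteWritable and a Text respectively.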
Thank you,
Edward