    at org.apache.hadoop.mapred.SequenceFileAsTextInputFormat.getRecordReader(SequenceFileAsTextInputFormat.java:43)
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:296)
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:311)
    ... 21 more


The compression being used here - gzip - is not suitable for splitting the
input files. That could be why you are seeing this exception. Can you try a
different compression scheme such as bzip2, or perhaps not compressing the
files at all?
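If the upstream job is itself driven from Hive, a minimal sketch of switching its output to bzip2 might look like the session settings below. These are the old-style `mapred.*` property names; the exact names depend on your Hadoop/Hive version, so treat them as an assumption to verify against your install:

SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.BZip2Codec;
SET mapred.output.compression.type=BLOCK;

For a plain MapReduce job, the equivalent would be setting the same codec properties on the JobConf before submitting.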


1) Can I just set the split size very high, so that Hive never splits these
files? My files were produced by a MapReduce program, so they are already
split quite small. I really do not want to have to force a change upstream.
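For option 1, one sketch would be raising the minimum split size above the size of the largest input file from the Hive session. Again these are the old `mapred.*` names (newer releases use `mapreduce.input.fileinputformat.split.minsize`), and the 10 GB figure is just an illustrative value, not taken from this thread:

-- assumed property name for this Hadoop generation; verify for your version
SET mapred.min.split.size=10737418240;  -- 10 GB, larger than any one input file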

2) From the other post, the key/value types of the sequence file should be
ByteWritable/Text. Currently my key/value types are Text/Text, and my data is
in the key... so

I have already written my own SequenceRecordReader, but it is not working. In
it I am swapping the key and the value. So I am thinking:

1. For the key, emit a dummy ByteWritable, maybe 'A'
2. Write the original key into the value

Will this work? Are there other gotchas here?
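The two steps above could be sketched as a wrapper around the existing Text/Text reader using the old `org.apache.hadoop.mapred` RecordReader interface. The class name and the choice of wrapping (rather than subclassing SequenceFileRecordReader) are illustrative assumptions, not Edward's actual code:

```java
import java.io.IOException;

import org.apache.hadoop.io.ByteWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.RecordReader;

// Hypothetical wrapper: reads Text/Text records from the underlying
// sequence file reader, emits a dummy ByteWritable key, and moves the
// original key into the value.
public class KeyToValueRecordReader implements RecordReader<ByteWritable, Text> {
  private final RecordReader<Text, Text> inner;  // the existing Text/Text reader
  private final Text innerKey = new Text();
  private final Text innerValue = new Text();

  public KeyToValueRecordReader(RecordReader<Text, Text> inner) {
    this.inner = inner;
  }

  public boolean next(ByteWritable key, Text value) throws IOException {
    if (!inner.next(innerKey, innerValue)) {
      return false;  // underlying reader is exhausted
    }
    key.set((byte) 'A');   // step 1: dummy key
    value.set(innerKey);   // step 2: original key becomes the value
    return true;
  }

  public ByteWritable createKey() { return new ByteWritable(); }
  public Text createValue() { return new Text(); }
  public long getPos() throws IOException { return inner.getPos(); }
  public float getProgress() throws IOException { return inner.getProgress(); }
  public void close() throws IOException { inner.close(); }
}
```

One gotcha worth checking: the original Text value is discarded here, so this only works if all the data really lives in the key, as described above.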

Thank you,
Edward
