        at org.apache.hadoop.mapred.SequenceFileAsTextInputFormat.getRecordReader(SequenceFileAsTextInputFormat.java:43)
        at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:296)
        at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:311)
        ... 21 more
The compression being used here - gzip - is not suitable for splitting the
input files. That could be the reason why you are seeing this exception.
Can you try using a different compression scheme such as bzip2, or perhaps
not compressing the files at all?
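For reference, switching the output compression to bzip2 in Hive could look roughly like the following. These property names are the older (mapred-era) ones and may differ in your Hadoop/Hive versions, so treat this as a sketch rather than the exact incantation:

```
-- Assumed settings for bzip2-compressed, block-compressed output;
-- verify the property names against your Hadoop/Hive release.
SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.BZip2Codec;
SET mapred.output.compression.type=BLOCK;
```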
1) Can I just set the split size VERY VERY high, thus causing Hive never to
split these files? My files were produced by a MapReduce program, so they
are already split very small. I really do not want to have to force a change
upstream.
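If that approach is viable, one way to try it (assuming the older mapred property names; newer releases use mapreduce.input.fileinputformat.split.minsize) would be to raise the minimum split size to something larger than any single file, e.g.:

```
-- Sketch: force each file into a single split by setting the minimum
-- split size to Long.MAX_VALUE. Property name assumes the old mapred API.
SET mapred.min.split.size=9223372036854775807;
```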
2) From the other post, the key/value types of the sequence file should be
ByteWritable/Text. Currently my keys/values are Text/Text, and my data is in
the key... so I have already written my own SequenceRecordReader that swaps
the key and the value, but it is not working. So I am thinking:
1. For the key, emit a dummy ByteWritable, maybe 'A'
2. Write the key to the value
Will this work? Are there other gotchas here?
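To make the idea concrete, here is a minimal pure-Java sketch of the swap in steps 1-2. It deliberately uses plain Java types instead of the real Hadoop interfaces (a real implementation would be written against org.apache.hadoop.mapred.RecordReader with ByteWritable/Text), and the class and names are hypothetical, just to show the record-by-record transformation:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.AbstractMap.SimpleEntry;
import java.util.Map.Entry;

// Hypothetical stand-in for a wrapping record reader: for each underlying
// (key, value) record, it emits (dummy byte 'A', original key), discarding
// the original value -- mirroring steps 1 and 2 above.
class SwappingReader {
    static final byte DUMMY_KEY = 'A'; // step 1: the dummy "ByteWritable"

    private final Iterator<Entry<String, String>> inner; // stands in for the real SequenceFile reader

    SwappingReader(List<Entry<String, String>> records) {
        this.inner = records.iterator();
    }

    // Mirrors RecordReader.next(key, value); returns null at end of input.
    Entry<Byte, String> next() {
        if (!inner.hasNext()) {
            return null;
        }
        Entry<String, String> rec = inner.next();
        // step 2: the original key becomes the new value
        return new SimpleEntry<>(DUMMY_KEY, rec.getKey());
    }
}
```

In the real reader, next(key, value) would set the ByteWritable to the dummy byte and copy the incoming Text key into the outgoing Text value; createKey()/createValue() would return a ByteWritable and a Text respectively.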
Thank you,
Edward