On Fri, Apr 16, 2010 at 2:15 PM, Edward Capriolo <[email protected]> wrote:
> at org.apache.hadoop.mapred.SequenceFileAsTextInputFormat.getRecordReader(SequenceFileAsTextInputFormat.java:43)
> at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:296)
> at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:311)
> ... 21 more
>
> The compression being used here - gzip - is not suitable for splitting
> the input files. That could be the reason why you are seeing this
> exception. Can you try using a different compression scheme such as
> bzip2, or perhaps not compressing the files at all?
>
> 1) Can I just set the split size VERY VERY high, thus causing Hive never
> to split these files? My files were produced by a MapReduce program, so
> they are already split very small. I really do not want to have to force
> a change upstream.
>
> 2) From the other post, the key/value of the sequence file should be
> ByteWritable/Text. Currently my key/values are Text/Text, and my data is
> in the key... so
>
> I have already written my own SequenceRecordReader, in which I am
> swapping the key and the value, but it is not working. So I am thinking:
>
> 1. For the key, emit a dummy ByteWritable, maybe 'A'
> 2. Write the key to the value
>
> Will this work? Are there other gotchas here?
>
> Thank you,
> Edward

FYI, the problem here is that Hadoop NEEDS the native libraries to work
with GZIP block-compressed sequence files. For whatever reason the dfs
-text tool can open them but MapReduce can't. Upstream should report the
error for what it is:

  Trying to load native libs... can't do it, falling back to...

should be replaced with:

  Trying to load native libs... FALLING BACK TO JAVA LIBS THAT WON'T WORK ANYWAY!!!
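If it helps, you can check whether the native libraries actually loaded on
a given machine by asking Hadoop directly. NativeCodeLoader and ZlibFactory
are the real classes behind the "Trying to load native-hadoop library" log
line; the little wrapper class below is just a sketch made up for
illustration:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.zlib.ZlibFactory;
import org.apache.hadoop.util.NativeCodeLoader;

// Prints whether the native hadoop library (and native zlib, which the
// gzip codec needs) could be found on this machine's java.library.path.
public class NativeLibCheck {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    System.out.println("native hadoop loaded: "
        + NativeCodeLoader.isNativeCodeLoaded());
    System.out.println("native zlib loaded:   "
        + ZlibFactory.isNativeZlibLoaded(conf));
  }
}

If the zlib line prints false, you are in the falling-back-to-Java
situation described above.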

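To make steps 1 and 2 from the quoted message concrete: a reader that does
the swap could look roughly like the sketch below. This is against the old
org.apache.hadoop.mapred API and untested; the class name
KeyToValueRecordReader is made up, and Hive would still need a custom
InputFormat whose getRecordReader() returns it.

import java.io.IOException;

import org.apache.hadoop.io.ByteWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.SequenceFileRecordReader;

// Wraps the stock SequenceFileRecordReader over a Text/Text sequence
// file, emits a dummy ByteWritable ('A') as the key, and moves the real
// key (where the data actually lives) into the value slot.
public class KeyToValueRecordReader implements RecordReader<ByteWritable, Text> {

  private final SequenceFileRecordReader<Text, Text> inner;
  private final Text innerKey = new Text();
  private final Text innerValue = new Text();

  public KeyToValueRecordReader(JobConf conf, FileSplit split) throws IOException {
    inner = new SequenceFileRecordReader<Text, Text>(conf, split);
  }

  public boolean next(ByteWritable key, Text value) throws IOException {
    if (!inner.next(innerKey, innerValue)) {
      return false;
    }
    key.set((byte) 'A');   // step 1: dummy key
    value.set(innerKey);   // step 2: write the original key to the value
    return true;
  }

  public ByteWritable createKey() { return new ByteWritable(); }
  public Text createValue() { return new Text(); }
  public long getPos() throws IOException { return inner.getPos(); }
  public float getProgress() throws IOException { return inner.getProgress(); }
  public void close() throws IOException { inner.close(); }
}

One gotcha to watch: this only changes what the reader hands back, so the
table's SerDe and column mapping have to agree with the new value layout.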