[ https://issues.apache.org/jira/browse/HIVE-7886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14111429#comment-14111429 ]
Venkata Puneet Ravuri commented on HIVE-7886:
---------------------------------------------

The data files are in the correct RCFile format. When I run 'select *' on this table, the data is returned correctly. A standalone seek check against the same s3n path is sketched after the quoted stack trace below.

> Aggregation queries fail with RCFile based Hive tables with S3 storage
> -----------------------------------------------------------------------
>
>                 Key: HIVE-7886
>                 URL: https://issues.apache.org/jira/browse/HIVE-7886
>             Project: Hive
>          Issue Type: Bug
>          Components: File Formats
>    Affects Versions: 0.13.1
>            Reporter: Venkata Puneet Ravuri
>
> Aggregation queries on Hive tables that use the RCFile format and S3 storage are failing.
> My setup is Hadoop 2.5.0 and Hive 0.13.1.
> I create a table with the following schema:
>
> CREATE EXTERNAL TABLE `testtable`(
>   `col1` string,
>   `col2` tinyint,
>   `col3` int,
>   `col4` float,
>   `col5` boolean,
>   `col6` smallint)
> ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe'
> WITH SERDEPROPERTIES (
>   'serialization.format'='\t',
>   'line.delim'='\n',
>   'field.delim'='\t')
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.RCFileInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.RCFileOutputFormat'
> LOCATION
>   's3n://<testbucket>/testtable';
>
> When I run 'select count(*) from testtable', it gives the following exception stack:
>
> Error: java.io.IOException: java.io.IOException: java.io.EOFException: Attempted to seek or read past the end of the file
>     at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
>     at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
>     at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:256)
>     at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:171)
>     at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:198)
>     at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:184)
>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52)
>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
> Caused by: java.io.IOException: java.io.EOFException: Attempted to seek or read past the end of the file
>     at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
>     at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
>     at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:344)
>     at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:101)
>     at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:41)
>     at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:122)
>     at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:254)
>     ... 11 more
> Caused by: java.io.EOFException: Attempted to seek or read past the end of the file
>     at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.processException(Jets3tNativeFileSystemStore.java:462)
>     at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.handleException(Jets3tNativeFileSystemStore.java:411)
>     at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.retrieve(Jets3tNativeFileSystemStore.java:234)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:601)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>     at org.apache.hadoop.fs.s3native.$Proxy17.retrieve(Unknown Source)
>     at org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsInputStream.seek(NativeS3FileSystem.java:205)
>     at org.apache.hadoop.fs.BufferedFSInputStream.seek(BufferedFSInputStream.java:96)
>     at org.apache.hadoop.fs.BufferedFSInputStream.skip(BufferedFSInputStream.java:67)
>     at java.io.DataInputStream.skipBytes(DataInputStream.java:220)
>     at org.apache.hadoop.hive.ql.io.RCFile$ValueBuffer.readFields(RCFile.java:739)
>     at org.apache.hadoop.hive.ql.io.RCFile$Reader.currentValueBuffer(RCFile.java:1720)
>     at org.apache.hadoop.hive.ql.io.RCFile$Reader.getCurrentRow(RCFile.java:1898)
>     at org.apache.hadoop.hive.ql.io.RCFileRecordReader.next(RCFileRecordReader.java:149)
>     at org.apache.hadoop.hive.ql.io.RCFileRecordReader.next(RCFileRecordReader.java:44)
>     at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:339)
>     ... 15 more
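The innermost frames show RCFile's ValueBuffer.readFields() skipping column bytes via DataInputStream.skipBytes(), which BufferedFSInputStream turns into a NativeS3FsInputStream.seek(); that seek is what raises the EOFException. To narrow this down outside of Hive, here is a minimal sketch (mine, not code from Hive or Hadoop) that seeks near and at the end of one of the table's data files through the same s3n FileSystem. The file name below is a placeholder; point it at an actual RCFile under the table location. If the seek to the exact end of the object already fails here, the problem is in the s3n seek/skip path rather than in the RCFile data.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class S3nSeekCheck {
    public static void main(String[] args) throws Exception {
        // Placeholder path: substitute a real data file under the table location.
        Path file = new Path("s3n://<testbucket>/testtable/000000_0");

        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(file.toUri(), conf);
        long len = fs.getFileStatus(file).getLen();

        FSDataInputStream in = fs.open(file);
        try {
            // Seek just short of the end and read the last byte; this should succeed.
            in.seek(len - 1);
            System.out.println("byte at len-1: " + in.read());

            // Seek to the exact end of the file, which is effectively what a
            // skipBytes() over the trailing value buffer asks for. On s3n this
            // appears to become a ranged GET starting at 'len', which is the
            // suspected source of "Attempted to seek or read past the end of the file".
            in.seek(len);
            System.out.println("read at len (expect -1 / EOF): " + in.read());
        } finally {
            in.close();
        }
    }
}
{code}

If this standalone check reproduces the exception, the fix would likely belong to how NativeS3FsInputStream handles a seek to end-of-file (and how RCFile skips the final value buffer), not to the table data itself.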