[ https://issues.apache.org/jira/browse/HIVE-11031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14602613#comment-14602613 ]
Demeter Sztanko commented on HIVE-11031: ---------------------------------------- Hello [~prasanth_j], my MR jobs are getting this error when concatenating ORC files: {code} java.io.IOException: java.io.IOException: java.lang.IndexOutOfBoundsException: Index: 0 at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121) at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77) at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:226) at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:136) at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:230) at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:210) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366) at org.apache.hadoop.mapred.Child$4.run(Child.java:255) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190) at org.apache.hadoop.mapred.Child.main(Child.java:249) Caused by: java.io.IOException: java.lang.IndexOutOfBoundsException: Index: 0 at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121) at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77) at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:355) at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:105) at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:41) at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116) at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:224) ... 11 more Caused by: java.lang.IndexOutOfBoundsException: Index: 0 at java.util.Collections$EmptyList.get(Collections.java:3212) at org.apache.hadoop.hive.ql.io.orc.OrcFileStripeMergeRecordReader.nextStripe(OrcFileStripeMergeRecordReader.java:82) at org.apache.hadoop.hive.ql.io.orc.OrcFileStripeMergeRecordReader.next(OrcFileStripeMergeRecordReader.java:71) at org.apache.hadoop.hive.ql.io.orc.OrcFileStripeMergeRecordReader.next(OrcFileStripeMergeRecordReader.java:31) at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350) ... 15 more 2015-06-26 08:24:19,248 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task {code} Is this failure a result of the bug described in this ticket or that can be a different problem? > ORC concatenation of old files can fail while merging column statistics > ----------------------------------------------------------------------- > > Key: HIVE-11031 > URL: https://issues.apache.org/jira/browse/HIVE-11031 > Project: Hive > Issue Type: Bug > Affects Versions: 0.13.0, 0.14.0, 1.0.0, 1.2.0, 1.1.0, 2.0.0 > Reporter: Prasanth Jayachandran > Assignee: Prasanth Jayachandran > Priority: Critical > Fix For: 1.2.1 > > Attachments: HIVE-11031-branch-1.0.patch, HIVE-11031.2.patch, > HIVE-11031.3.patch, HIVE-11031.4.patch, HIVE-11031.patch > > > Column statistics in ORC are optional protobuf fields. Old ORC files might > not have statistics for newly added types like decimal, date, timestamp etc. > But column statistics merging assumes column statistics exists for these > types and invokes merge. For example, merging of TimestampColumnStatistics > directly casts the received ColumnStatistics object without doing instanceof > check. If the ORC file contains time stamp column statistics then this will > work else it will throw ClassCastException. > Also, the file merge operator swallows the exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)