[
https://issues.apache.org/jira/browse/HIVE-11031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14602613#comment-14602613
]
Demeter Sztanko commented on HIVE-11031:
----------------------------------------
Hello [~prasanth_j], my MR jobs are getting this error when concatenating ORC
files:
{code}
java.io.IOException: java.io.IOException: java.lang.IndexOutOfBoundsException:
Index: 0
at
org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
at
org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
at
org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:226)
at
org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:136)
at
org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:230)
at
org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:210)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.io.IOException: java.lang.IndexOutOfBoundsException: Index: 0
at
org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
at
org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
at
org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:355)
at
org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:105)
at
org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:41)
at
org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116)
at
org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:224)
... 11 more
Caused by: java.lang.IndexOutOfBoundsException: Index: 0
at java.util.Collections$EmptyList.get(Collections.java:3212)
at
org.apache.hadoop.hive.ql.io.orc.OrcFileStripeMergeRecordReader.nextStripe(OrcFileStripeMergeRecordReader.java:82)
at
org.apache.hadoop.hive.ql.io.orc.OrcFileStripeMergeRecordReader.next(OrcFileStripeMergeRecordReader.java:71)
at
org.apache.hadoop.hive.ql.io.orc.OrcFileStripeMergeRecordReader.next(OrcFileStripeMergeRecordReader.java:31)
at
org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350)
... 15 more
2015-06-26 08:24:19,248 INFO org.apache.hadoop.mapred.Task: Runnning cleanup
for the task
{code}
Is this failure a result of the bug described in this ticket or that can be a
different problem?
> ORC concatenation of old files can fail while merging column statistics
> -----------------------------------------------------------------------
>
> Key: HIVE-11031
> URL: https://issues.apache.org/jira/browse/HIVE-11031
> Project: Hive
> Issue Type: Bug
> Affects Versions: 0.13.0, 0.14.0, 1.0.0, 1.2.0, 1.1.0, 2.0.0
> Reporter: Prasanth Jayachandran
> Assignee: Prasanth Jayachandran
> Priority: Critical
> Fix For: 1.2.1
>
> Attachments: HIVE-11031-branch-1.0.patch, HIVE-11031.2.patch,
> HIVE-11031.3.patch, HIVE-11031.4.patch, HIVE-11031.patch
>
>
> Column statistics in ORC are optional protobuf fields. Old ORC files might
> not have statistics for newly added types like decimal, date, timestamp etc.
> But column statistics merging assumes column statistics exists for these
> types and invokes merge. For example, merging of TimestampColumnStatistics
> directly casts the received ColumnStatistics object without doing instanceof
> check. If the ORC file contains time stamp column statistics then this will
> work else it will throw ClassCastException.
> Also, the file merge operator swallows the exception.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)