[ https://issues.apache.org/jira/browse/HIVE-11043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598893#comment-14598893 ]
Gopal V commented on HIVE-11043:
--------------------------------
[~prasanth_j]: Sure. It looks like the errors occur when reading footers in the
1 file/1 split case.
The actual error is:
{code}
Caused by: java.lang.IndexOutOfBoundsException: Index: 0
	at java.util.Collections$EmptyList.get(Collections.java:3212)
	at org.apache.hadoop.hive.ql.io.orc.OrcProto$Type.getSubtypes(OrcProto.java:12240)
	at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.getColumnIndicesFromNames(ReaderImpl.java:651)
	at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.getRawDataSizeOfColumns(ReaderImpl.java:634)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.populateAndCacheStripeDetails(OrcInputFormat.java:938)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:847)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:713)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:744)
{code}
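Reading the top frames of the trace above, the exception appears to come from asking for subtype 0 of a type whose subtype list is empty. The following is only a minimal stand-alone sketch of that failure pattern and one possible guard. It is not the Hive/ORC code; the class and method names (EmptySubtypesSketch, firstSubtypeUnchecked, firstSubtypeOrDefault) and the -1 fallback are hypothetical.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical stand-alone sketch; none of these names exist in Hive or ORC.
class EmptySubtypesSketch {

    // Mirrors the failing pattern: element 0 is requested without checking
    // whether the list has any elements.
    static int firstSubtypeUnchecked(List<Integer> subtypes) {
        return subtypes.get(0);
    }

    // Defensive variant: returns -1 (an assumed sentinel) when the list is empty.
    static int firstSubtypeOrDefault(List<Integer> subtypes) {
        return subtypes.isEmpty() ? -1 : subtypes.get(0);
    }

    public static void main(String[] args) {
        List<Integer> noSubtypes = Collections.emptyList();
        try {
            firstSubtypeUnchecked(noSubtypes);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("unchecked access failed: " + e);
        }
        System.out.println("guarded access returned: " + firstSubtypeOrDefault(noSubtypes));
    }
}
```

Whether a guard, a different footer-reading path, or skipping the column-size estimate is the right fix for the 1 file/1 split case is a decision for the actual patch; the sketch only shows why an empty list produces this exception.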
> ORC split strategies should adapt based on number of files
> ----------------------------------------------------------
>
> Key: HIVE-11043
> URL: https://issues.apache.org/jira/browse/HIVE-11043
> Project: Hive
> Issue Type: Bug
> Affects Versions: 2.0.0
> Reporter: Prasanth Jayachandran
> Assignee: Gopal V
> Fix For: 2.0.0
>
> Attachments: HIVE-11043.1.patch, HIVE-11043.2.patch
>
>
> ORC split strategies added in HIVE-10114 chose strategies based on average
> file size. It would be beneficial to choose a different strategy based on
> number of files as well.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)