[jira] [Commented] (HIVE-11043) ORC split strategies should adapt based on number of files

Gopal V (JIRA) Tue, 23 Jun 2015 22:40:21 -0700

    [ https://issues.apache.org/jira/browse/HIVE-11043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598893#comment-14598893 ]


Gopal V commented on HIVE-11043:
--------------------------------

[~prasanth_j]: Sure. It looks like the errors occur when reading footers in
the 1 file/1 split case.

The actual error is:

{code}
Caused by: java.lang.IndexOutOfBoundsException: Index: 0
        at java.util.Collections$EmptyList.get(Collections.java:3212)
        at org.apache.hadoop.hive.ql.io.orc.OrcProto$Type.getSubtypes(OrcProto.java:12240)
        at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.getColumnIndicesFromNames(ReaderImpl.java:651)
        at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.getRawDataSizeOfColumns(ReaderImpl.java:634)
        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.populateAndCacheStripeDetails(OrcInputFormat.java:938)
        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:847)
        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:713)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)
{code}
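
The top two frames show an indexed read (index 0) against an empty subtype list. As a minimal sketch of that failure mode only, and not the actual ReaderImpl code, the snippet below uses a plain List<Integer> as a hypothetical stand-in for the subtype list. The class and method names are made up for illustration, and the -1 sentinel in the guarded variant is an assumption, not what the patch does.

```java
import java.util.Collections;
import java.util.List;

// Illustration only: a plain list stands in for a type's subtype list.
// None of these names exist in Hive or ORC.
class EmptySubtypesDemo {

    // Unchecked access: throws IndexOutOfBoundsException when the list is empty.
    static int firstSubtypeUnchecked(List<Integer> subtypes) {
        return subtypes.get(0);
    }

    // Guarded access: returns -1 (an arbitrary sentinel) when there are no subtypes.
    static int firstSubtypeOrMinusOne(List<Integer> subtypes) {
        return subtypes.isEmpty() ? -1 : subtypes.get(0);
    }

    public static void main(String[] args) {
        List<Integer> none = Collections.emptyList();

        // The guarded call handles the empty list without throwing.
        System.out.println("guarded: " + firstSubtypeOrMinusOne(none));

        // The unchecked call fails the same way the trace does.
        try {
            firstSubtypeUnchecked(none);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("unchecked: " + e);
        }
    }
}
```

Whether an empty subtype list is legitimate input here (for example, a footer that was not fully populated) or a sign of a bug upstream is not something the trace alone settles.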

> ORC split strategies should adapt based on number of files
> ----------------------------------------------------------
>
>                 Key: HIVE-11043
>                 URL: https://issues.apache.org/jira/browse/HIVE-11043
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 2.0.0
>            Reporter: Prasanth Jayachandran
>            Assignee: Gopal V
>             Fix For: 2.0.0
>
>         Attachments: HIVE-11043.1.patch, HIVE-11043.2.patch
>
>
> The ORC split strategies added in HIVE-10114 are chosen based on average
> file size. It would be beneficial to also take the number of files into
> account when choosing a strategy.
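
One way to read the description above is as a selection rule over two inputs: average file size and file count. The sketch below is a hypothetical illustration of such a rule, not the contents of the attached patches. The BI and ETL names follow the values of hive.exec.orc.split.strategy; the class name, method signature, both thresholds and the handling of an empty input are assumptions.

```java
// Hypothetical sketch of a split-strategy chooser; not the OrcInputFormat code.
class SplitStrategyChooser {

    // BI: generate splits per file without reading footers.
    // ETL: read file footers and generate splits from stripe information.
    enum Strategy { BI, ETL }

    /**
     * @param totalBytes        combined size of all input files
     * @param numFiles          number of input files
     * @param maxSplitBytes     assumed size threshold above which footers are read
     * @param fewFilesThreshold assumed file count at or below which footers are read
     */
    static Strategy choose(long totalBytes, int numFiles,
                           long maxSplitBytes, int fewFilesThreshold) {
        if (numFiles <= 0) {
            return Strategy.BI; // no files, so no footers to read (arbitrary choice)
        }
        long avgFileSize = totalBytes / numFiles;
        // Large files benefit from stripe-level splits; with only a few files,
        // reading every footer is cheap, so the same strategy is used.
        if (avgFileSize > maxSplitBytes || numFiles <= fewFilesThreshold) {
            return Strategy.ETL;
        }
        return Strategy.BI;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        System.out.println(choose(10 * mb, 1, 256 * mb, 10));        // one small file
        System.out.println(choose(10_000 * mb, 1000, 256 * mb, 10)); // many small files
        System.out.println(choose(20_000 * mb, 20, 256 * mb, 10));   // fewer, larger files
    }
}
```

Under this rule, the 1 file/1 split case takes the footer-reading path, which is consistent with the footer-reading errors reported in the comment above.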



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
