[ 
https://issues.apache.org/jira/browse/HIVE-11102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610808#comment-14610808
 ] 

Prasanth Jayachandran commented on HIVE-11102:
----------------------------------------------

[~sershe] and [~gopalv]: getRawDataSizeOfColumns was never intended to be used 
inside Hive when it was written. It was added purely as a convenience method 
for tools that use ORC outside of Hive, such as Pig. The reason is that all 
other tools write the actual column names, whereas Hive writes internal names, 
which is weird. Hive uses the getRawDataSizeFromColIndices method to get the 
raw data size of projected columns (used by ANALYZE and StatsTask). I am going 
to put up another patch for uncompressed size in ORC splits that will not use 
the getRawDataSizeOfColumns interface. The reason we are currently seeing these 
logs is this line in OrcInputFormat:
{code}
List<String> projCols = ColumnProjectionUtils.getReadColumnNames(context.conf);
{code}

This is actually dead code that does not do anything, so it is safe to ignore 
these warnings for now.
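
To illustrate the name mismatch described above, here is a minimal, 
self-contained sketch of a name-to-index lookup. It is not the actual 
ReaderImpl code: the class ColumnIndexSketch, the method indicesFromNames, and 
the example internal names (_col0, _col1) are hypothetical and used only for 
illustration. It assumes the root struct is represented by two parallel lists, 
field names and subtype ids, and it returns an empty result instead of 
indexing into an empty list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ColumnIndexSketch {

    /**
     * Hypothetical helper: maps requested column names to type ids using the
     * root struct's field names and subtype ids (parallel lists).
     * Names that are not present in the file are skipped.
     */
    static List<Integer> indicesFromNames(List<String> fieldNames,
                                          List<Integer> subtypes,
                                          List<String> wanted) {
        List<Integer> result = new ArrayList<>();
        // Guard: a root type without fields or subtypes has nothing to look up.
        if (fieldNames.isEmpty() || subtypes.isEmpty()) {
            return result;
        }
        for (String name : wanted) {
            int pos = fieldNames.indexOf(name);
            // Skip unknown names and positions with no matching subtype.
            if (pos >= 0 && pos < subtypes.size()) {
                result.add(subtypes.get(pos));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> wanted = Arrays.asList("id", "ts");

        // File written with actual column names (as other tools would do).
        System.out.println(indicesFromNames(
            Arrays.asList("id", "name", "ts"), Arrays.asList(1, 2, 3), wanted));
        // prints [1, 3]

        // File written with internal names (assumed here to look like _col0).
        System.out.println(indicesFromNames(
            Arrays.asList("_col0", "_col1", "_col2"), Arrays.asList(1, 2, 3), wanted));
        // prints []

        // Root type with no subtypes at all: no exception, just an empty result.
        System.out.println(indicesFromNames(
            Collections.<String>emptyList(), Collections.<Integer>emptyList(), wanted));
        // prints []
    }
}
```

With internal names the lookup simply finds nothing, which is why a name-based 
interface is only useful for files written by tools that store the real column 
names.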

> ReaderImpl: getColumnIndicesFromNames does not work for ACID tables
> -------------------------------------------------------------------
>
>                 Key: HIVE-11102
>                 URL: https://issues.apache.org/jira/browse/HIVE-11102
>             Project: Hive
>          Issue Type: Bug
>          Components: File Formats
>    Affects Versions: 1.3.0, 1.2.1, 2.0.0
>            Reporter: Gopal V
>            Assignee: Sergey Shelukhin
>         Attachments: HIVE-11102.patch
>
>
> ORC reader impl does not estimate the size of ACID data files correctly.
> {code}
> Caused by: java.lang.IndexOutOfBoundsException: Index: 0
>       at java.util.Collections$EmptyList.get(Collections.java:3212)
>       at org.apache.hadoop.hive.ql.io.orc.OrcProto$Type.getSubtypes(OrcProto.java:12240)
>       at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.getColumnIndicesFromNames(ReaderImpl.java:651)
>       at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.getRawDataSizeOfColumns(ReaderImpl.java:634)
>       at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.populateAndCacheStripeDetails(OrcInputFormat.java:938)
>       at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:847)
>       at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:713)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>       at java.lang.Thread.run(Thread.java:744)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
