[
https://issues.apache.org/jira/browse/HIVE-11102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610808#comment-14610808
]
Prasanth Jayachandran commented on HIVE-11102:
----------------------------------------------
[~sershe] and [~gopalv]: getRawDataSizeOfColumns was never intended to be used
inside Hive at the time of writing. It was added purely as a convenience method
for tools that use ORC outside of Hive, such as Pig and others. The reason is
that all other tools write the actual column names, whereas Hive writes internal
names, which is weird. Hive uses the getRawDataSizeFromColIndices method to get
the raw data size of projected columns (used by ANALYZE and StatsTask). I am
going to put up another patch for uncompressed size in ORC splits, which will
not use the getRawDataSizeOfColumns interface. We are currently seeing these
logs because of this line in OrcInputFormat:
{code}
List<String> projCols = ColumnProjectionUtils.getReadColumnNames(context.conf);
{code}
This is actually dead code that does not do anything, so it is safe to ignore
these warnings for now.
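To illustrate the naming mismatch described above, here is a minimal sketch of a name-based lookup. It is not the actual ReaderImpl code: the class ColumnIndexLookup, the method indicesFromNames, and the sample column names (including the "_col0"-style internal names) are hypothetical and chosen only for illustration. The sketch assumes the file's field names are available as a flat list and shows that a lookup by real column names finds nothing when the file stores internal names, and that an empty field list can be handled without throwing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; not part of Hive or ORC.
final class ColumnIndexLookup {

    // Maps requested column names to their positions in the file's field-name list.
    // Names that are not present are skipped instead of causing an exception.
    static List<Integer> indicesFromNames(List<String> fieldNames, List<String> requested) {
        List<Integer> result = new ArrayList<>();
        if (fieldNames == null || fieldNames.isEmpty() || requested == null) {
            return result; // nothing to resolve against
        }
        for (String name : requested) {
            int idx = fieldNames.indexOf(name);
            if (idx >= 0) {
                result.add(idx);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> requested = Arrays.asList("name", "ts");

        // File written with actual column names: the lookup succeeds.
        System.out.println(indicesFromNames(Arrays.asList("id", "name", "ts"), requested));

        // File written with internal names (assumed "_colN" style): no match.
        System.out.println(indicesFromNames(Arrays.asList("_col0", "_col1", "_col2"), requested));

        // Empty field list: returns an empty result rather than throwing.
        System.out.println(indicesFromNames(Collections.<String>emptyList(), requested));
    }
}
```

An index-based lookup, as used by getRawDataSizeFromColIndices, avoids this mismatch because it does not depend on what names were written to the file.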
> ReaderImpl: getColumnIndicesFromNames does not work for ACID tables
> -------------------------------------------------------------------
>
> Key: HIVE-11102
> URL: https://issues.apache.org/jira/browse/HIVE-11102
> Project: Hive
> Issue Type: Bug
> Components: File Formats
> Affects Versions: 1.3.0, 1.2.1, 2.0.0
> Reporter: Gopal V
> Assignee: Sergey Shelukhin
> Attachments: HIVE-11102.patch
>
>
> ORC reader impl does not estimate the size of ACID data files correctly.
> {code}
> Caused by: java.lang.IndexOutOfBoundsException: Index: 0
> at java.util.Collections$EmptyList.get(Collections.java:3212)
> at org.apache.hadoop.hive.ql.io.orc.OrcProto$Type.getSubtypes(OrcProto.java:12240)
> at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.getColumnIndicesFromNames(ReaderImpl.java:651)
> at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.getRawDataSizeOfColumns(ReaderImpl.java:634)
> at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.populateAndCacheStripeDetails(OrcInputFormat.java:938)
> at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:847)
> at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:713)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)
> {code}