[
https://issues.apache.org/jira/browse/HIVE-17915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Work on HIVE-17915 started by Teddy Choi.
-----------------------------------------
> Enable VectorizedOrcAcidRowBatchReader to be used with LLAP IO elevator over
> original acid files
> ------------------------------------------------------------------------------------------------
>
> Key: HIVE-17915
> URL: https://issues.apache.org/jira/browse/HIVE-17915
> Project: Hive
> Issue Type: Sub-task
> Components: Transactions
> Affects Versions: 3.0.0
> Reporter: Eugene Koifman
> Assignee: Teddy Choi
> Priority: Critical
>
> Since HIVE-12631, LLAP IO can support Acid tables, but not when reading
> "original" files.
> HIVE-17458 enables VectorizedOrcAcidRowBatchReader to vectorize reads over
> "original" files but not with LLAP IO.
> The current implementation of _OrcSplit.canUseLlapIo()_ is the same as in
> HIVE-12631.
> This can and should be improved. There are 2 parts to this:
> 1. When an "original" file is read and the data doesn't need to be decorated
> with ROW__ID (see _VectorizedOrcAcidRowBatchReader.canUseLlapForAcid()_),
> VectorizedOrcAcidRowBatchReader as of HIVE-17458 should be usable with LLAP
> IO. When I tried it, however, I got _ArrayIndexOutOfBoundsException_ in
> various places of the stack.
> This is the more important one.
> 2. Reading "original" acid files when ROW__IDs are needed requires using
> _org.apache.hadoop.hive.ql.io.orc.RecordReader.getRowNumber()_ in
> _VectorizedOrcAcidRowBatchReader_.
> This API is not available on the reader that _LlapRecordReader_ provides.
> Making getRowNumber() available there would improve performance and
> simplify the logic in the code.
> cc [~sershe], [~teddy.choi]
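
For illustration only, the two parts above can be read as a small decision table. The sketch below is not Hive code: the class, enum, method, and parameter names are all hypothetical, and it encodes the behaviour the issue asks for rather than what _OrcSplit.canUseLlapIo()_ does today. It assumes that non-"original" acid reads already work with LLAP IO (per HIVE-12631) and that "original" reads needing ROW__ID can only go through LLAP IO once the reader exposes a row number.

```java
// Hypothetical sketch of the decision described in HIVE-17915; none of these
// names exist in Hive.
final class LlapAcidReadSketch {

    /** Hypothetical outcome: read through the LLAP IO elevator or not. */
    enum ReadPath { LLAP_IO, NON_LLAP }

    /**
     * @param isOriginal         the split covers an "original" (pre-acid) file
     * @param needsRowId         rows must be decorated with ROW__ID
     * @param readerHasRowNumber assumption: the LLAP-provided reader exposes
     *                           something equivalent to getRowNumber()
     */
    static ReadPath choosePath(boolean isOriginal,
                               boolean needsRowId,
                               boolean readerHasRowNumber) {
        if (!isOriginal) {
            // Acid files that are not "original": described as supported
            // since HIVE-12631.
            return ReadPath.LLAP_IO;
        }
        if (!needsRowId) {
            // Part 1: no ROW__ID decoration needed, so LLAP IO should be usable.
            return ReadPath.LLAP_IO;
        }
        // Part 2: ROW__ID is needed, which depends on a row number being
        // available from the reader.
        return readerHasRowNumber ? ReadPath.LLAP_IO : ReadPath.NON_LLAP;
    }

    public static void main(String[] args) {
        System.out.println(choosePath(true, false, false));
        System.out.println(choosePath(true, true, false));
        System.out.println(choosePath(true, true, true));
    }
}
```

In this reading, part 1 is a bug fix on an existing path, while part 2 only flips to LLAP IO after the row-number API becomes available.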
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)