[
https://issues.apache.org/jira/browse/HIVE-19200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438108#comment-16438108
]
Jason Dere commented on HIVE-19200:
-----------------------------------
+1
> Vectorization: Disable vectorization for LLAP I/O when a
> non-VECTORIZED_INPUT_FILE_FORMAT mode is needed (i.e. rows) and data type
> conversion is needed
> -------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: HIVE-19200
> URL: https://issues.apache.org/jira/browse/HIVE-19200
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Affects Versions: 3.0.0
> Reporter: Matt McCline
> Assignee: Matt McCline
> Priority: Critical
> Fix For: 3.0.0
>
> Attachments: HIVE-19200.01.patch
>
>
> Disable vectorization for the issue in HIVE-18763 until we can write the
> harder VRB conversion code.
> The main changes are:
> 1) In the Vectorizer, detect if data type conversion is needed between the
> partition and the desired table schema. If so, and LLAP I/O is enabled and
> does encoded caching, then do not vectorize. Why? When LLAP I/O is in
> encoded caching mode, it delivers VectorizedRowBatch (VRB) to the
> VectorMapOperator instead of (object) rows. We currently do not have logic
> for converting VRBs. So we get either wrong results or, more likely, a
> ClassCastException on the expected vs. actual ColumnVector columns.
> 2) Cleaned up the error message logic that was suppressing the new message
> from the EXPLAIN VECTORIZATION display.
> 3) NOTE: Some of the SELECT statements in the schema_evol_test*.q are
> commented out because I bumped into another bug. I'll file that one soon
> and add comments to the Q files.
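
As a rough illustration of the check described in item 1, the decision can be
sketched as below. This is not the code from HIVE-19200.01.patch: the class
name, the method names, and the use of plain type-name strings are all
hypothetical, and the comparison is positional only.

```java
import java.util.List;

/** Hypothetical sketch of the guard in item 1; not Hive's Vectorizer code. */
final class VectorizationGuardSketch {

    /**
     * Returns true when any partition column type differs from the table
     * column type at the same position (positional comparison, assumed here).
     */
    static boolean needsTypeConversion(List<String> partitionTypes,
                                       List<String> tableTypes) {
        int n = Math.min(partitionTypes.size(), tableTypes.size());
        for (int i = 0; i < n; i++) {
            if (!partitionTypes.get(i).equalsIgnoreCase(tableTypes.get(i))) {
                return true;
            }
        }
        return false;
    }

    /**
     * Vectorization is refused only when both conditions hold: LLAP I/O
     * delivers VRBs (encoded caching) and a type conversion is required.
     */
    static boolean canVectorize(boolean llapEncodedCaching,
                                List<String> partitionTypes,
                                List<String> tableTypes) {
        return !(llapEncodedCaching
                && needsTypeConversion(partitionTypes, tableTypes));
    }
}
```

With a partition declared as (int, string) and a table schema of
(bigint, string), canVectorize returns false when encoded caching is on and
true when it is off, because rows can still be converted on the non-VRB path.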
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------
> The longer-term solution can be done later in steps:
> 1) Write new code that can take a VectorizedRowBatch (VRB) and convert
> columns to different data types. This is needed when LLAP is doing its
> encoding / caching and feeds VRBs to VectorMapOperator instead of rows.
> Similar to what MapOperator does today, VectorMapOperator would need to be
> enhanced to convert partition VRBs into the table schema VRBs that the vector
> operator tree expects.
> 2) Today, vectorization logic is strictly position-based. It insists that
> the partition columns have the same names as the table schema. The
> MapOperator (and ORC) do more general conversion that uses column names
> instead of column position. We'd need to enhance all 3 classes to handle
> column-name-based conversion. The 3 classes are: the new VRB-to-VRB
> conversion class, VectorDeserializeRow, and VectorAssignRow.
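
The two longer-term steps can be sketched in miniature as follows. This is
only an illustration under stated assumptions: the class and method names are
hypothetical, plain arrays stand in for the ColumnVector data inside a VRB,
and case-insensitive name matching is assumed. None of it is existing Hive
code.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch of the two longer-term steps; not Hive code. */
final class VrbConversionSketch {

    /**
     * Step 2: for each table column, find the partition column with the
     * same name (case-insensitive, assumed). Returns -1 where the
     * partition has no such column.
     */
    static int[] mapByName(List<String> partitionColumns,
                           List<String> tableColumns) {
        Map<String, Integer> byName = new HashMap<>();
        for (int i = 0; i < partitionColumns.size(); i++) {
            byName.put(partitionColumns.get(i).toLowerCase(Locale.ROOT), i);
        }
        int[] mapping = new int[tableColumns.size()];
        for (int j = 0; j < tableColumns.size(); j++) {
            Integer p = byName.get(tableColumns.get(j).toLowerCase(Locale.ROOT));
            mapping[j] = (p == null) ? -1 : p;
        }
        return mapping;
    }

    /**
     * Step 1: convert one column of a batch to another type. Here the
     * first `size` values of a long column are widened to double.
     */
    static double[] widenLongToDouble(long[] source, int size) {
        double[] target = new double[size];
        for (int i = 0; i < size; i++) {
            target[i] = (double) source[i];
        }
        return target;
    }
}
```

A real converter would also have to carry over null flags, the repeating
flag, and the selected-row array of the batch, which this sketch leaves out.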