[ 
https://issues.apache.org/jira/browse/HIVE-19200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-19200:
--------------------------------
    Status: In Progress  (was: Patch Available)

> Vectorization: Disable vectorization for LLAP I/O when a 
> non-VECTORIZED_INPUT_FILE_FORMAT mode is needed (i.e. rows) and data type 
> conversion is needed
> -------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-19200
>                 URL: https://issues.apache.org/jira/browse/HIVE-19200
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 3.0.0
>            Reporter: Matt McCline
>            Assignee: Matt McCline
>            Priority: Critical
>             Fix For: 3.0.0
>
>         Attachments: HIVE-19200.01.patch
>
>
> Disable vectorization for the issue in HIVE-18763 until we can write the 
> harder VRB conversion code.
> The main changes are:
> 1) In the Vectorizer, detect if data type conversion is needed between the 
> partition and the desired table schema.  If so, and LLAP I/O is enabled in 
> encoded caching mode, then do not vectorize.  Why? When LLAP I/O is in 
> encoded caching mode, it delivers VectorizedRowBatch (VRB) to the 
> VectorMapOperator instead of (object) rows.  We currently do not have logic 
> for converting VRBs.  So, we either get Wrong Results or, more likely, a 
> ClassCastException on the expected vs actual ColumnVector columns.
> 2) Cleaned up the error message logic that was suppressing the new message 
> from the EXPLAIN VECTORIZATION display.
> 3) NOTE: Some of the SELECT statements in the schema_evol_test*.q files are 
> commented out because I bumped into another bug.  I'll file that one soon 
> and add comments to the Q files. 
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------
> The longer-term solution can be done later in steps:
> 1) Write new code that can take a VectorizedRowBatch (VRB) and convert 
> columns to different data types.  This is needed when LLAP is doing its 
> encoding / caching and feeds VRBs to VectorMapOperator instead of rows.  
> Similar to what MapOperator does today, VectorMapOperator would need to be 
> enhanced to convert partition VRBs into the table schema VRBs that the vector 
> operator tree expects.
> 2) Today, vectorization logic is strictly position-based.  It insists that 
> the partition columns have the same names as the table schema.  The 
> MapOperator (and ORC) do more general conversion that uses column names 
> instead of column position.  We'd need to enhance all 3 classes to handle 
> column-name-based conversion.  The 3 classes are: the new VRB-to-VRB 
> conversion class, VectorDeserializeRow, and VectorAssignRow.
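
The check described in main change 1 above can be sketched roughly as follows. This is a minimal illustration, not the code in HIVE-19200.01.patch: the class `VectorizationCheck`, its method names, and the use of plain type-name strings are all hypothetical stand-ins for what the Vectorizer actually inspects. It compares types by position only and ignores columns present on just one side, which is an assumption made to keep the sketch short.

```java
import java.util.List;

// Hypothetical stand-in for the Vectorizer's decision; not Hive's actual code.
final class VectorizationCheck {

    // True when a partition column's type differs from the table column's
    // type at the same position. Extra columns on either side are ignored
    // (an assumption of this sketch).
    static boolean needsConversion(List<String> partitionTypes,
                                   List<String> tableTypes) {
        int n = Math.min(partitionTypes.size(), tableTypes.size());
        for (int i = 0; i < n; i++) {
            if (!partitionTypes.get(i).equalsIgnoreCase(tableTypes.get(i))) {
                return true;
            }
        }
        return false;
    }

    // Vectorization is refused only when both conditions hold: LLAP I/O is
    // in encoded caching mode (so VRBs, not rows, reach VectorMapOperator)
    // and some column would need data type conversion.
    static boolean canVectorize(List<String> partitionTypes,
                                List<String> tableTypes,
                                boolean llapEncodedCaching) {
        return !(llapEncodedCaching
                && needsConversion(partitionTypes, tableTypes));
    }
}
```

With a partition declared as (int, string) and a table schema of (bigint, string), `canVectorize` returns false when encoded caching is on, and true when it is off, since in that case rows are delivered and can be converted as before.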



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
