[ 
https://issues.apache.org/jira/browse/HIVE-26408?focusedWorklogId=795141&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-795141
 ]

ASF GitHub Bot logged work on HIVE-26408:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 26/Jul/22 06:40
            Start Date: 26/Jul/22 06:40
    Worklog Time Spent: 10m 
      Work Description: abstractdog merged PR #3452:
URL: https://github.com/apache/hive/pull/3452




Issue Time Tracking
-------------------

    Worklog Id:     (was: 795141)
    Time Spent: 20m  (was: 10m)

> Vectorization: Fix deallocation of scratch columns, don't reuse a child 
> ConstantVectorExpression as an output
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-26408
>                 URL: https://issues.apache.org/jira/browse/HIVE-26408
>             Project: Hive
>          Issue Type: Bug
>            Reporter: László Bodor
>            Assignee: László Bodor
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> This is similar to HIVE-15588. With a customer query, I reproduced a 
> vectorized expression tree like the one below (I'll attach a simple repro 
> query when possible):
> {code}
> selectExpressions: IfExprCondExprColumn(col 67:boolean, col 63:string, col 
> 61:string)(children: StringColumnInList(col 13, values TermDeposit, 
> RecurringDeposit, CertificateOfDeposit) -> 67:boolean, VectorCoalesce(columns 
> [61, 62])(children: VectorUDFAdaptor(from_unixtime(to_unix_timestamp(CAST( 
> _col1 AS DATE)), 'MM-dd-yyyy'))(children: VectorUDFUnixTimeStampDate(col 
> 68)(children: CastStringToDate(col 33:string) -> 68:date) -> 69:bigint) -> 
> 61:string, ConstantVectorExpression(val  ) -> 62:string) -> 63:string, 
> ConstantVectorExpression(val ) -> 61:string) -> 62:string
> {code}
> The relevant part of the query was:
> {code}
>   CASE WHEN DLY_BAL.PDELP_VALUE in (
>     'TermDeposit', 'RecurringDeposit',
>     'CertificateOfDeposit'
>   ) THEN NVL(
>     (
>       from_unixtime(
>         unix_timestamp(
>           cast(DLY_BAL.APATD_MTRTY_DATE as date)
>         ),
>         'MM-dd-yyyy'
>       )
>     ),
>     ' '
>   ) ELSE '' END AS MAT_DTE
> {code}
> The problem, step by step:
> 1. IfExprCondExprColumn has 62:string as its 
> [outputColumn|https://github.com/apache/hive/blob/d3309c0ea9da907af4d27427805084b7331a6c24/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/IfExprCondExprColumn.java#L64],
>  which is a reused scratch column (see step 5).
> 2. At evaluation time, [isRepeating is 
> reset|https://github.com/apache/hive/blob/d3309c0ea9da907af4d27427805084b7331a6c24/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/IfExprCondExprColumn.java#L68].
> 3. To evaluate IfExprCondExprColumn, its children must be evaluated 
> conditionally, so we go to 
> [conditionalEvaluate|https://github.com/apache/hive/blob/d3309c0ea9da907af4d27427805084b7331a6c24/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/IfExprCondExprColumn.java#L95].
> 4. One of the children is ConstantVectorExpression(val  ) -> 62:string, which 
> belongs to the second branch of VectorCoalesce, i.e. the ' ' (single-space) 
> string in NVL's second argument.
> 5. In step 4, the 62:string column is set to isRepeating (and it has been 
> released by 
> [freeNonColumns|https://github.com/apache/hive/blob/d3309c0ea9da907af4d27427805084b7331a6c24/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java#L2459]),
>  so it is marked as a reusable scratch column.
> 6. After the conditional evaluation in step 3, the final output of 
> IfExprCondExprColumn is set 
> [here|https://github.com/apache/hive/blob/d3309c0ea9da907af4d27427805084b7331a6c24/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/IfExprCondExprColumn.java#L99],
>  but at that point we get an exception 
> [here|https://github.com/apache/hive/blob/d3309c0ea9da907af4d27427805084b7331a6c24/storage-api/src/java/org/apache/hadoop/hive/ql/exec/vector/BytesColumnVector.java#L484]:
> {code}
> 2022-07-01T04:26:24,567 ERROR [TezTR-745267_1_35_6_0_0] tez.MapRecordSource: 
> java.lang.AssertionError: Output column number expected to be 0 when 
> isRepeating
>       at 
> org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector.setElement(BytesColumnVector.java:494)
>       at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.IfExprCondExprColumn.evaluate(IfExprCondExprColumn.java:108)
>       at 
> org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:146)
>       at 
> org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:969)
>       at 
> org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinGenerateResultOperator.forwardBigTableBatch(VectorMapJoinGenerateResultOperator.java:694)
>       at 
> org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerBigOnlyStringOperator.processBatch(VectorMapJoinInnerBigOnlyStringOperator.java:371)
>       at 
> org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinCommonOperator.process(VectorMapJoinCommonOperator.java:839)
>       at 
> org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:969)
> {code}
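
The six steps above can be condensed into a small stand-alone sketch. It does not use Hive classes: SketchBytesVector and ScratchReuseSketch are hypothetical stand-ins that only model the isRepeating flag and the row-0 invariant named in the stack trace. The sketch throws IllegalStateException instead of relying on a Java assert, so it behaves the same with or without -ea.

```java
import java.nio.charset.StandardCharsets;

// Simplified stand-in for a bytes column vector; NOT Hive's BytesColumnVector.
final class SketchBytesVector {
    final byte[][] values;
    boolean isRepeating;

    SketchBytesVector(int size) {
        values = new byte[size][];
    }

    // Mimics a constant expression: one value at row 0, flagged as repeating.
    void fillConstant(String s) {
        values[0] = s.getBytes(StandardCharsets.UTF_8);
        isRepeating = true;
    }

    // Models the invariant from the stack trace: a repeating vector only accepts row 0.
    void setElement(int row, String s) {
        if (isRepeating && row != 0) {
            throw new IllegalStateException(
                "Output column number expected to be 0 when isRepeating");
        }
        values[row] = s.getBytes(StandardCharsets.UTF_8);
    }
}

final class ScratchReuseSketch {
    // Returns the failure message, or null if the per-row write succeeded.
    static String evaluateParent(SketchBytesVector output,
                                 SketchBytesVector childOutput,
                                 int row) {
        output.isRepeating = false;        // step 2: parent resets its output vector
        childOutput.fillConstant(" ");     // steps 4-5: child constant fills its own output
        try {
            output.setElement(row, "x");   // step 6: parent writes a per-row result
            return null;
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Shared column: the parent output and the child output are the same vector.
        SketchBytesVector col62 = new SketchBytesVector(4);
        System.out.println(evaluateParent(col62, col62, 2));
        // Distinct columns: the child cannot disturb the parent's output.
        System.out.println(
            evaluateParent(new SketchBytesVector(4), new SketchBytesVector(4), 2));
    }
}
```

When both arguments are the same vector, the child's constant fill re-sets isRepeating after the parent has cleared it, which is the inconsistent state described in step 6.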
> This is clearly an incorrect scratch column reuse: we reused the output 
> column of a child expression and left that vector in an inconsistent state.
> This must not be fixed by resetting vectors in more places in 
> IfExprCondExprColumn, as that would just hide the original issue.
> I realized that the problem can be fixed by simply preventing 
> ConstantVectorExpressions from being released; that is what I'm testing now.
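
The idea of never releasing a constant's scratch column could look roughly like the sketch below. This is an illustration under assumptions, not the change merged in the PR: ScratchPoolSketch, allocate, and release are hypothetical names, and the free-list design is assumed rather than taken from VectorizationContext.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

// Hypothetical scratch-column pool; NOT Hive's VectorizationContext.
final class ScratchPoolSketch {
    private final Deque<Integer> free = new ArrayDeque<>();
    private final Set<Integer> pinned = new HashSet<>();
    private int next;

    ScratchPoolSketch(int firstScratchColumn) {
        next = firstScratchColumn;
    }

    // Hands out a previously released column if one exists, otherwise a fresh one.
    // Columns allocated for constants are pinned for the lifetime of the pool.
    int allocate(boolean forConstant) {
        int col = free.isEmpty() ? next++ : free.pop();
        if (forConstant) {
            pinned.add(col);
        }
        return col;
    }

    // Releasing a pinned column is a no-op, so it can never be handed out again.
    void release(int col) {
        if (!pinned.contains(col)) {
            free.push(col);
        }
    }

    public static void main(String[] args) {
        ScratchPoolSketch pool = new ScratchPoolSketch(62);
        int constantCol = pool.allocate(true);   // column backing a constant expression
        pool.release(constantCol);               // ignored: the column stays reserved
        int parentOutput = pool.allocate(false); // parent gets a different column
        System.out.println(constantCol + " vs " + parentOutput);
    }
}
```

The trade-off in this sketch is a slightly larger scratch footprint, one column per constant, in exchange for the guarantee that a parent expression never receives a column that a child constant still writes to.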



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
