[
https://issues.apache.org/jira/browse/HIVE-26408?focusedWorklogId=795141&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-795141
]
ASF GitHub Bot logged work on HIVE-26408:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 26/Jul/22 06:40
Start Date: 26/Jul/22 06:40
Worklog Time Spent: 10m
Work Description: abstractdog merged PR #3452:
URL: https://github.com/apache/hive/pull/3452
Issue Time Tracking
-------------------
Worklog Id: (was: 795141)
Time Spent: 20m (was: 10m)
> Vectorization: Fix deallocation of scratch columns, don't reuse a child
> ConstantVectorExpression as an output
> -------------------------------------------------------------------------------------------------------------
>
> Key: HIVE-26408
> URL: https://issues.apache.org/jira/browse/HIVE-26408
> Project: Hive
> Issue Type: Bug
> Reporter: László Bodor
> Assignee: László Bodor
> Priority: Major
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> This is similar to HIVE-15588. With a customer query, I reproduced a
> vectorized expression tree like the one below (I'll attach a simple repro
> query when possible):
> {code}
> selectExpressions: IfExprCondExprColumn(col 67:boolean, col 63:string, col
> 61:string)(children: StringColumnInList(col 13, values TermDeposit,
> RecurringDeposit, CertificateOfDeposit) -> 67:boolean, VectorCoalesce(columns
> [61, 62])(children: VectorUDFAdaptor(from_unixtime(to_unix_timestamp(CAST(
> _col1 AS DATE)), 'MM-dd-yyyy'))(children: VectorUDFUnixTimeStampDate(col
> 68)(children: CastStringToDate(col 33:string) -> 68:date) -> 69:bigint) ->
> 61:string, ConstantVectorExpression(val ) -> 62:string) -> 63:string,
> ConstantVectorExpression(val ) -> 61:string) -> 62:string
> {code}
> The relevant part of the query was:
> {code}
> CASE WHEN DLY_BAL.PDELP_VALUE in (
> 'TermDeposit', 'RecurringDeposit',
> 'CertificateOfDeposit'
> ) THEN NVL(
> (
> from_unixtime(
> unix_timestamp(
> cast(DLY_BAL.APATD_MTRTY_DATE as date)
> ),
> 'MM-dd-yyyy'
> )
> ),
> ' '
> ) ELSE '' END AS MAT_DTE
> {code}
> Here is the problem, step by step:
> 1. IfExprCondExprColumn has 62:string as its
> [outputColumn|https://github.com/apache/hive/blob/d3309c0ea9da907af4d27427805084b7331a6c24/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/IfExprCondExprColumn.java#L64],
> which is a reused scratch column (see step 5).
> 2. At evaluation time, [isRepeating is
> reset|https://github.com/apache/hive/blob/d3309c0ea9da907af4d27427805084b7331a6c24/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/IfExprCondExprColumn.java#L68].
> 3. Evaluating IfExprCondExprColumn requires the conditional evaluation of its
> children, so we go to
> [conditionalEvaluate|https://github.com/apache/hive/blob/d3309c0ea9da907af4d27427805084b7331a6c24/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/IfExprCondExprColumn.java#L95].
> 4. One of the children is ConstantVectorExpression(val ) -> 62:string, which
> belongs to the second branch of VectorCoalesce, i.e. to the ' ' constant in
> NVL's second argument.
> 5. In step 4, the 62:string column is set to an isRepeating column (and it is
> released by
> [freeNonColumns|https://github.com/apache/hive/blob/d3309c0ea9da907af4d27427805084b7331a6c24/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java#L2459]),
> so it is marked as a reusable scratch column.
> 6. After the conditional evaluation in step 3, the final output of
> IfExprCondExprColumn is set
> [here|https://github.com/apache/hive/blob/d3309c0ea9da907af4d27427805084b7331a6c24/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/IfExprCondExprColumn.java#L99],
> but at that point we get an exception
> [here|https://github.com/apache/hive/blob/d3309c0ea9da907af4d27427805084b7331a6c24/storage-api/src/java/org/apache/hadoop/hive/ql/exec/vector/BytesColumnVector.java#L484]:
> {code}
> 2022-07-01T04:26:24,567 ERROR [TezTR-745267_1_35_6_0_0] tez.MapRecordSource:
> java.lang.AssertionError: Output column number expected to be 0 when
> isRepeating
> at
> org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector.setElement(BytesColumnVector.java:494)
> at
> org.apache.hadoop.hive.ql.exec.vector.expressions.IfExprCondExprColumn.evaluate(IfExprCondExprColumn.java:108)
> at
> org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:146)
> at
> org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:969)
> at
> org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinGenerateResultOperator.forwardBigTableBatch(VectorMapJoinGenerateResultOperator.java:694)
> at
> org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerBigOnlyStringOperator.processBatch(VectorMapJoinInnerBigOnlyStringOperator.java:371)
> at
> org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinCommonOperator.process(VectorMapJoinCommonOperator.java:839)
> at
> org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:969)
> {code}
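The failure in steps 2 to 6 can be illustrated with a minimal sketch. Everything here is hypothetical: SketchVector is a simplified stand-in, not Hive's BytesColumnVector, and it throws IllegalStateException where the real code uses an assertion, so that the sketch does not depend on assertions being enabled.

```java
// Simplified model of the reported failure; none of these types are Hive classes.
final class RepeatingVectorSketch {

    // Hypothetical stand-in for a string column vector.
    static final class SketchVector {
        final String[] values;
        boolean isRepeating;

        SketchVector(int size) {
            values = new String[size];
        }

        // Models the invariant from the stack trace: a repeating vector
        // may only be written at index 0.
        void setElement(int index, String value) {
            if (isRepeating && index != 0) {
                throw new IllegalStateException(
                    "Output column number expected to be 0 when isRepeating");
            }
            values[index] = value;
        }
    }

    // Returns true if the parent's per-row write fails after a child
    // has turned the shared column into a repeating one.
    static boolean writeFailsAfterChildMarksRepeating() {
        SketchVector shared = new SketchVector(4);

        // Step 2: the parent resets isRepeating on its output column.
        shared.isRepeating = false;

        // Steps 4 and 5: a constant child writes into the same column
        // and marks it as repeating.
        shared.setElement(0, " ");
        shared.isRepeating = true;

        // Step 6: the parent writes its per-row result into that column.
        try {
            shared.setElement(1, "");
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(writeFailsAfterChildMarksRepeating());
    }
}
```

The sketch only shows why sharing one column between a constant child and its parent's output breaks the invariant; it does not model how Hive assigns scratch columns.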
> This is clearly an incorrect scratch column reuse: we reused the output
> of a child expression and got that vector into an inconsistent state.
> This must not be fixed by resetting vectors in more places in
> IfExprCondExprColumn, as that would just hide the original issue.
> I realized that the problem can be fixed simply by preventing the release
> of ConstantVectorExpressions; that's what I'm trying to test now.
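The proposed direction (not releasing the output column of a constant expression) can be sketched with a toy scratch column pool. This is an illustration under stated assumptions, not the actual VectorizationContext code: ScratchPoolSketch, allocate, release, and the keepConstants flag are all hypothetical names, and the starting column number 61 is only borrowed from the plan above.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy scratch column pool; hypothetical, not Hive's VectorizationContext.
final class ScratchPoolSketch {
    private final Deque<Integer> free = new ArrayDeque<>();
    private int next;

    ScratchPoolSketch(int firstScratchColumn) {
        this.next = firstScratchColumn;
    }

    // Hands out a previously released column if one exists, else a new one.
    int allocate() {
        return free.isEmpty() ? next++ : free.pop();
    }

    // Releases a column for reuse. With keepConstants set, the output
    // column of a constant expression is never returned to the pool.
    void release(int column, boolean isConstantOutput, boolean keepConstants) {
        if (isConstantOutput && keepConstants) {
            return;
        }
        free.push(column);
    }

    // Returns true if the parent's output column ends up being the same
    // column that the constant child writes to.
    static boolean parentReusesConstantColumn(boolean keepConstants) {
        ScratchPoolSketch pool = new ScratchPoolSketch(61);
        int constantOut = pool.allocate();               // child constant output
        pool.release(constantOut, true, keepConstants);  // children are freed
        int parentOut = pool.allocate();                 // parent output
        return parentOut == constantOut;
    }

    public static void main(String[] args) {
        System.out.println(parentReusesConstantColumn(false)); // collision
        System.out.println(parentReusesConstantColumn(true));  // no collision
    }
}
```

With keepConstants disabled the parent receives the column the constant child still writes to; with it enabled the parent gets a fresh column, at the cost of one scratch column per constant that is never recycled.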