[
https://issues.apache.org/jira/browse/HIVE-26408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
László Bodor updated HIVE-26408:
--------------------------------
Description:
This is similar to HIVE-15588. With a customer query, I reproduced a vectorized
expression tree like the one below (I'll attach a simple repro query when
possible):
{code}
selectExpressions: IfExprCondExprColumn(col 67:boolean, col 63:string, col
61:string)(children: StringColumnInList(col 13, values TermDeposit,
RecurringDeposit, CertificateOfDeposit) -> 67:boolean, VectorCoalesce(columns
[61, 62])(children: VectorUDFAdaptor(from_unixtime(to_unix_timestamp(CAST(
_col1 AS DATE)), 'MM-dd-yyyy'))(children: VectorUDFUnixTimeStampDate(col
68)(children: CastStringToDate(col 33:string) -> 68:date) -> 69:bigint) ->
61:string, ConstantVectorExpression(val ) -> 62:string) -> 63:string,
ConstantVectorExpression(val ) -> 61:string) -> 62:string
{code}
The relevant part of the query was:
{code}
CASE WHEN DLY_BAL.PDELP_VALUE in (
    'TermDeposit', 'RecurringDeposit',
    'CertificateOfDeposit'
  ) THEN NVL(
    (
      from_unixtime(
        unix_timestamp(
          cast(DLY_BAL.APATD_MTRTY_DATE as date)
        ),
        'MM-dd-yyyy'
      )
    ),
    ' '
  ) ELSE '' END AS MAT_DTE
{code}
Here is the problem, step by step:
1. IfExprCondExprColumn has 62:string as its
[outputColumn|https://github.com/apache/hive/blob/d3309c0ea9da907af4d27427805084b7331a6c24/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/IfExprCondExprColumn.java#L64],
which is a reused scratch column (see step 5).
2. At evaluation time, [isRepeating is
reset|https://github.com/apache/hive/blob/d3309c0ea9da907af4d27427805084b7331a6c24/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/IfExprCondExprColumn.java#L68].
3. To evaluate IfExprCondExprColumn, its children must be evaluated
conditionally, so execution reaches
[conditionalEvaluate|https://github.com/apache/hive/blob/d3309c0ea9da907af4d27427805084b7331a6c24/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/IfExprCondExprColumn.java#L95].
4. One of the children is ConstantVectorExpression(val ) -> 62:string, which
belongs to the second branch of VectorCoalesce, that is, the constant literal
passed as NVL's second argument.
5. In step 4, the 62:string column is set to isRepeating, and it has also been
released by
[freeNonColumns|https://github.com/apache/hive/blob/d3309c0ea9da907af4d27427805084b7331a6c24/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java#L2459],
so it is marked as a reusable scratch column.
6. After the conditional evaluation in step 3, the final output of
IfExprCondExprColumn is set
[here|https://github.com/apache/hive/blob/d3309c0ea9da907af4d27427805084b7331a6c24/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/IfExprCondExprColumn.java#L99],
but that call throws an exception
[here|https://github.com/apache/hive/blob/d3309c0ea9da907af4d27427805084b7331a6c24/storage-api/src/java/org/apache/hadoop/hive/ql/exec/vector/BytesColumnVector.java#L484]:
{code}
2022-07-01T04:26:24,567 ERROR [TezTR-745267_1_35_6_0_0] tez.MapRecordSource: java.lang.AssertionError: Output column number expected to be 0 when isRepeating
    at org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector.setElement(BytesColumnVector.java:494)
    at org.apache.hadoop.hive.ql.exec.vector.expressions.IfExprCondExprColumn.evaluate(IfExprCondExprColumn.java:108)
    at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:146)
    at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:969)
    at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinGenerateResultOperator.forwardBigTableBatch(VectorMapJoinGenerateResultOperator.java:694)
    at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerBigOnlyStringOperator.processBatch(VectorMapJoinInnerBigOnlyStringOperator.java:371)
    at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinCommonOperator.process(VectorMapJoinCommonOperator.java:839)
    at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:969)
{code}
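To make the sequence in the steps above easier to follow, here is a toy model of the failure. It is only a sketch and not Hive code: ToyColumn, evaluateConstant, evaluateParent and fails are hypothetical names, and the check in setElement only imitates the spirit of the assertion in the stack trace (it throws IllegalStateException rather than AssertionError). The point is that a parent that resets isRepeating on its output, and then lets a constant child write to the same column, ends up writing row by row into a repeating vector.

```java
/** Toy model of the failure; hypothetical names, not Hive classes. */
class ScratchReuseDemo {

    /** Simplified stand-in for a string column vector. */
    static final class ToyColumn {
        final String[] values = new String[4];
        boolean isRepeating;

        void setElement(int index, String value) {
            // Imitates the check from the stack trace: a repeating
            // vector may only be written at index 0.
            if (isRepeating && index != 0) {
                throw new IllegalStateException(
                    "Output column number expected to be 0 when isRepeating");
            }
            values[index] = value;
        }
    }

    /** A constant child: fills entry 0 and marks the column as repeating. */
    static void evaluateConstant(ToyColumn out, String value) {
        out.values[0] = value;
        out.isRepeating = true;
    }

    /** Parent: reset the flag, evaluate the child, then write per-row results. */
    static void evaluateParent(ToyColumn output, ToyColumn childOutput) {
        output.isRepeating = false;            // step 2
        evaluateConstant(childOutput, " ");    // steps 3 to 5
        for (int i = 0; i < output.values.length; i++) {
            output.setElement(i, "row" + i);   // step 6
        }
    }

    /** Returns true if evaluating the parent hits the repeating-vector check. */
    static boolean fails(ToyColumn output, ToyColumn childOutput) {
        try {
            evaluateParent(output, childOutput);
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        ToyColumn shared = new ToyColumn();
        // Parent and constant child share one column: the check fires.
        System.out.println("shared column fails: " + fails(shared, shared));
        // Separate columns: the parent writes all rows without problems.
        System.out.println("distinct columns fail: "
            + fails(new ToyColumn(), new ToyColumn()));
    }
}
```

In this toy, the shared-column case fails on the second row, while the distinct-column case completes, which matches the reasoning that the reuse itself is the bug rather than the parent's evaluation logic.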
This is clearly an incorrect scratch column reuse. It must not be fixed by
resetting vectors in IfExprCondExprColumn, as that would just hide the original
issue.
I realized that the problem can be fixed by simply not releasing the output
columns of ConstantVectorExpressions; that's what I'm trying to test now.
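The idea of not releasing constant children can be sketched with a toy scratch column allocator. This is an assumption-laden illustration, not the actual Hive change: ScratchAllocatorSketch, Child, freeChildren and the keepConstants flag are all hypothetical, and the column numbers 61 and 62 are chosen only to echo the expression tree above.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

/** Toy scratch column allocator; hypothetical, not VectorizationContext. */
class ScratchAllocatorSketch {

    /** A child expression with its output column and a "constant" marker. */
    static final class Child {
        final int outputColumn;
        final boolean isConstant;

        Child(int outputColumn, boolean isConstant) {
            this.outputColumn = outputColumn;
            this.isConstant = isConstant;
        }
    }

    private final Deque<Integer> freeColumns = new ArrayDeque<>();
    private int nextColumn;

    ScratchAllocatorSketch(int firstScratchColumn) {
        this.nextColumn = firstScratchColumn;
    }

    /** Hands out a previously released column if any, else a fresh one. */
    int allocate() {
        return freeColumns.isEmpty() ? nextColumn++ : freeColumns.pop();
    }

    /** Releases child output columns, optionally skipping constant children. */
    void freeChildren(List<Child> children, boolean keepConstants) {
        for (Child child : children) {
            if (keepConstants && child.isConstant) {
                continue; // the constant keeps its column for the whole evaluation
            }
            freeColumns.push(child.outputColumn);
        }
    }

    /** Returns the column a parent would get after its children are released. */
    static int parentOutputColumn(boolean keepConstants) {
        ScratchAllocatorSketch allocator = new ScratchAllocatorSketch(61);
        Child computed = new Child(allocator.allocate(), false); // column 61
        Child constant = new Child(allocator.allocate(), true);  // column 62
        allocator.freeChildren(Arrays.asList(computed, constant), keepConstants);
        return allocator.allocate();
    }

    public static void main(String[] args) {
        System.out.println("releasing constants: parent gets column "
            + parentOutputColumn(false));
        System.out.println("keeping constants:   parent gets column "
            + parentOutputColumn(true));
    }
}
```

With constants released, the parent in this toy is handed the constant's column 62 back; with constants kept, column 62 never returns to the free list, so the parent cannot collide with it.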
> Vectorization: Fix deallocation of scratch columns, don't reuse a child
> ConstantVectorExpression as an output
> -------------------------------------------------------------------------------------------------------------
>
> Key: HIVE-26408
> URL: https://issues.apache.org/jira/browse/HIVE-26408
> Project: Hive
> Issue Type: Bug
> Reporter: László Bodor
> Assignee: László Bodor
> Priority: Major
>