[
https://issues.apache.org/jira/browse/HIVE-25734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alessandro Solimando reassigned HIVE-25734:
-------------------------------------------
Assignee: Alessandro Solimando (was: Ayush Saxena)
> Wrongly-typed constant in case expression leads to incorrect empty result
> -------------------------------------------------------------------------
>
> Key: HIVE-25734
> URL: https://issues.apache.org/jira/browse/HIVE-25734
> Project: Hive
> Issue Type: Bug
> Components: CBO
> Affects Versions: 4.0.0
> Reporter: Alessandro Solimando
> Assignee: Alessandro Solimando
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
>
> The type of constants in case expressions should be inferred, if possible, by
> the "surrounding" input reference columns, if any.
> Consider the following table and query:
> {code:java}
> create external table test_case (row_seq smallint, row_desc string) stored as
> parquet;
> insert into test_case values (1, 'a');
> insert into test_case values (2, 'aa');
> insert into test_case values (6, 'aaaaaa');
> with base_t as (select row_seq, row_desc,
> case row_seq
> when 1 then '34'
> when 6 then '35'
> when 2 then '36'
> end as zb from test_case where row_seq in (1,2,6))
> select row_seq, row_desc, zb from base_t where zb <> '34';{code}
> The aforementioned query fails by returning an empty results, while "1 a 34"
> is expected.
>
> To understand the root cause, let's consider the debug input and output of
> some related CBO rules which are triggered during the evaluation of the
> query:
>
> {noformat}
> --$0 is the column 'row_seq'
> 1. HiveReduceExpressionsRule
> Input: AND(IN($0, 1:SMALLINT, 2:SMALLINT, 6:SMALLINT), <>(CASE(=($0,
> 1:INTEGER), '34':VARCHAR, =($0, 6:INTEGER), '35':VARCHAR, =($0, 2:INTEGER),
> '36':VARCHAR, null:VARCHAR), '34':CHAR(2)))
> Output: AND(IN($0, 1:SMALLINT, 2:SMALLINT, 6:SMALLINT), OR(=($0, 6:INTEGER),
> =($0, 2:INTEGER)), IS NOT TRUE(=($0, 1:INTEGER)))
> 2. HivePointLookupOptimizerRule.RexTransformIntoInClause
> Input: AND(IN($0, 1:SMALLINT, 2:SMALLINT, 6:SMALLINT), OR(=($0, 6:INTEGER),
> =($0, 2:INTEGER)), IS NOT TRUE(=($0, 1:INTEGER)))
> Output: AND(IN($0, 1:SMALLINT, 2:SMALLINT, 6:SMALLINT), IN($0, 6:INTEGER,
> 2:INTEGER), IS NOT TRUE(=($0, 1:INTEGER)))
> 3. HivePointLookupOptimizerRule.RexMergeInClause
> Input: AND(IN($0, 1:SMALLINT, 2:SMALLINT, 6:SMALLINT), IN($0, 6:INTEGER,
> 2:INTEGER), IS NOT TRUE(=($0, 1:INTEGER)))
> Output: false{noformat}
> In the first part, we can see that the constants are correctly typed as
> "SMALLINT" in the first part of the "AND" operand, while they are typed as
> "INTEGER" for the "CASE" expression, despite the input reference "$0" being
> available for inferring a more precise type.
> This type difference makes "HivePointLookupOptimizerRule.RexMergeInClause"
> missing the commonality between the two "IN" expressions, whose intersection
> is considered empty, hence the empty result.
> Providing a more refined type inference for "case" expressions should fix the
> issue.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)