Alessandro Solimando created HIVE-25734:
-------------------------------------------
Summary: Wrongly-typed constant in case expression leads to
incorrect empty result
Key: HIVE-25734
URL: https://issues.apache.org/jira/browse/HIVE-25734
Project: Hive
Issue Type: Bug
Components: CBO
Affects Versions: 4.0.0
Reporter: Alessandro Solimando
The type of constants in case expressions should be inferred, if possible, by
the "surrounding" input reference columns, if any.
Consider the following table and query:
{code:java}
create external table test_case (row_seq smallint, row_desc string) stored as
parquet;
insert into test_case values (1, 'a');
insert into test_case values (2, 'aa');
insert into test_case values (6, 'aaaaaa');
with base_t as (select row_seq, row_desc,
case row_seq
when 1 then '34'
when 6 then '35'
when 2 then '36'
end as zb from test_case where row_seq in (1,2,6))
select row_seq, row_desc, zb from base_t where zb <> '34';{code}
The aforementioned query fails by returning an empty results, while "1 a 34" is
expected.
To understand the root cause, let's consider the debug input and output of some
related CBO rules which are triggered during the evaluation of the query:
{noformat}
--$0 is the column 'row_seq'
1. HiveReduceExpressionsRule
Input: AND(IN($0, 1:SMALLINT, 2:SMALLINT, 6:SMALLINT), <>(CASE(=($0,
1:INTEGER), '34':VARCHAR, =($0, 6:INTEGER), '35':VARCHAR, =($0, 2:INTEGER),
'36':VARCHAR, null:VARCHAR), '34':CHAR(2)))
Output: AND(IN($0, 1:SMALLINT, 2:SMALLINT, 6:SMALLINT), OR(=($0, 6:INTEGER),
=($0, 2:INTEGER)), IS NOT TRUE(=($0, 1:INTEGER)))
2. HivePointLookupOptimizerRule.RexTransformIntoInClause
Input: AND(IN($0, 1:SMALLINT, 2:SMALLINT, 6:SMALLINT), OR(=($0, 6:INTEGER),
=($0, 2:INTEGER)), IS NOT TRUE(=($0, 1:INTEGER)))
Output: AND(IN($0, 1:SMALLINT, 2:SMALLINT, 6:SMALLINT), IN($0, 6:INTEGER,
2:INTEGER), IS NOT TRUE(=($0, 1:INTEGER)))
3. HivePointLookupOptimizerRule.RexMergeInClause
Input: AND(IN($0, 1:SMALLINT, 2:SMALLINT, 6:SMALLINT), IN($0, 6:INTEGER,
2:INTEGER), IS NOT TRUE(=($0, 1:INTEGER)))
Output: false{noformat}
In the first part, we can see that the constants are correctly typed as
"SMALLINT" in the first part of the "AND" operand, while they are typed as
"INTEGER" for the "CASE" expression, despite the input reference "$0" being
available for inferring a more precise type.
This type difference makes "HivePointLookupOptimizerRule.RexMergeInClause"
missing the commonality between the two "IN" expressions, whose intersection is
considered empty, hence the empty result.
Providing a more refined type inference for "case" expressions should fix the
issue.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)