[
https://issues.apache.org/jira/browse/HIVE-26733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alessandro Solimando updated HIVE-26733:
----------------------------------------
Description:
HiveRelMdPredicates was forked from Calcite's RelMdPredicates long time ago.
Hive's version lacks this commit
[https://github.com/apache/calcite/commit/8281668f] which introduced the use of
"IS NOT DISTINCT FROM" in place of "EQUAL" when a constant expression can be
NULL.
There is no Calcite ticket for this change, so I am briefly explaining the
issue here.
Consider the following input as argument of
HiveRelMdPredicates::pullUpPredicates(Project) method:
{code:java}
SELECT char_length(NULL) FROM t{code}
The method currently infers the predicate (=($0, CHAR_LENGTH(null:NULL))) which
translates to "=(NULL, NULL)", which turns simplifies to FALSE under the
unknownAsFalse semantics.
The change will make this methods return "IS NOT DISTINCT FROM($0,
CHAR_LENGTH(null:NULL))", which translates to IS NOT DISTINCT FROM(NULL, NULL),
which is TRUE.
For reference, we have the truth table below (from [1]):
||{{A}}||{{B}}||{{A = B}}||{{A IS NOT DISTINCT FROM B}}||
|{{0}}|{{0}}|_true_|_true_|
|{{0}}|{{1}}|_false_|_false_|
|{{0}}|{{null}}|_*unknown*_|_*false*_|
|{{null}}|{{null}}|_*unknown*_|_*true*_|
[1] https://modern-sql.com/feature/is-distinct-from
was:
Given a _CAST(NULL as $type)_ as i-th project expression, the method returns
_(=($i, CAST(null:NULL):$type)_ instead of _IS_NULL($i)_ as in the case of a
_NULL_ literal project expression.
This is because _RexLiteral::isNullLiteral_ is used
[here|https://github.com/apache/hive/blob/a6c0229f910972e84ba558e728532ffc245cc10d/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdPredicates.java#L153],
while in similar cases, it's often convenient to use
{_}RexUtil::isNullLiteral(RexNode, boolean allowCast){_}.
> Not safe to use '=' for predicates on constant expressions that might be NULL
> -----------------------------------------------------------------------------
>
> Key: HIVE-26733
> URL: https://issues.apache.org/jira/browse/HIVE-26733
> Project: Hive
> Issue Type: Bug
> Components: CBO
> Affects Versions: 4.0.0-alpha-1
> Reporter: Alessandro Solimando
> Assignee: Alessandro Solimando
> Priority: Major
> Labels: pull-request-available
> Time Spent: 50m
> Remaining Estimate: 0h
>
> HiveRelMdPredicates was forked from Calcite's RelMdPredicates long time ago.
> Hive's version lacks this commit
> [https://github.com/apache/calcite/commit/8281668f] which introduced the use
> of "IS NOT DISTINCT FROM" in place of "EQUAL" when a constant expression can
> be NULL.
> There is no Calcite ticket for this change, so I am briefly explaining the
> issue here.
> Consider the following input as argument of
> HiveRelMdPredicates::pullUpPredicates(Project) method:
> {code:java}
> SELECT char_length(NULL) FROM t{code}
> The method currently infers the predicate (=($0, CHAR_LENGTH(null:NULL)))
> which translates to "=(NULL, NULL)", which turns simplifies to FALSE under
> the unknownAsFalse semantics.
> The change will make this methods return "IS NOT DISTINCT FROM($0,
> CHAR_LENGTH(null:NULL))", which translates to IS NOT DISTINCT FROM(NULL,
> NULL), which is TRUE.
> For reference, we have the truth table below (from [1]):
> ||{{A}}||{{B}}||{{A = B}}||{{A IS NOT DISTINCT FROM B}}||
> |{{0}}|{{0}}|_true_|_true_|
> |{{0}}|{{1}}|_false_|_false_|
> |{{0}}|{{null}}|_*unknown*_|_*false*_|
> |{{null}}|{{null}}|_*unknown*_|_*true*_|
> [1] https://modern-sql.com/feature/is-distinct-from
--
This message was sent by Atlassian Jira
(v8.20.10#820010)