[ 
https://issues.apache.org/jira/browse/HIVE-26722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Solimando updated HIVE-26722:
----------------------------------------
    Description: 
h1. Reproducer

Consider the following query:
{code:java}
set hive.cbo.rule.exclusion.regex=ReduceExpressionsRule\(Project\);

CREATE EXTERNAL TABLE t (a string, b string);
INSERT INTO t VALUES ('1000', 'b1');
INSERT INTO t VALUES ('2000', 'b2');

SELECT * FROM (
  SELECT
   a,
   b
  FROM t
   UNION ALL
  SELECT
   a,
   CAST(NULL AS string)
   FROM t) AS t2
WHERE a = 1000;EXPLAIN CBO
SELECT * FROM (
  SELECT
   a,
   b
  FROM t
   UNION ALL
  SELECT
   a,
   CAST(NULL AS string)
   FROM t) AS t2
WHERE a = 1000; {code}
 

The expected result is:
{code:java}
1000    b1
1000    NULL{code}
An example of correct plan is as follows:
{noformat}
CBO PLAN:
HiveUnion(all=[true])
  HiveProject(a=[$0], b=[$1])
    HiveFilter(condition=[=(CAST($0):DOUBLE, 1000)])
      HiveTableScan(table=[[default, t]], table:alias=[t])
  HiveProject(a=[$0], _o__c1=[null:VARCHAR(2147483647) CHARACTER SET 
"UTF-16LE"])
    HiveFilter(condition=[=(CAST($0):DOUBLE, 1000)])
      HiveTableScan(table=[[default, t]], table:alias=[t]){noformat}
 

Consider now a scenario where expression reduction in projections is disabled 
by setting the following property{_}:{_}
{noformat}
set hive.cbo.rule.exclusion.regex=ReduceExpressionsRule\(Project\);
{noformat}
In this case, the simplification of _CAST(NULL)_ into _NULL_ does not happen, 
and we get the following (invalid) result:
{code:java}
1000    b1{code}
produced by the following invalid plan:
{code:java}
CBO PLAN:
HiveProject(a=[$0], b=[$1])
  HiveFilter(condition=[=(CAST($0):DOUBLE, 1000)])
    HiveTableScan(table=[[default, t]], table:alias=[t]) {code}
h1. Problem Analysis

At 
[HiveFilterSetOpTransposeRule.java#L112|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterSetOpTransposeRule.java#L112]
 the _RelMetadataQuery::getPulledUpPredicates_ method infers the following 
predicate due to the CAST(NULL) in the projection:
{code:java}
(=($1, CAST(null:NULL):VARCHAR(2147483647) CHARACTER SET "UTF-16LE")){code}
When the CAST is simplified to the NULL literal, the IS_NULL($1) predicate is 
inferred.

In 
[HiveFilterSetOpTransposeRule.java#L114-L122|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterSetOpTransposeRule.java#L114-L122],
 the rule checks if the conjunction of the predicate coming from the filter 
(here =(CAST($0):DOUBLE, 1000)) and the inferred predicates is satisfiable or 
not, under the _UnknownAsFalse_ semantics.

To summarize, the following expression is simplified under the _UnknownAsFalse_ 
semantics:
{code:java}
AND((=($1, CAST(null:NULL):VARCHAR(2147483647) CHARACTER SET "UTF-16LE")), 
=(CAST($0):DOUBLE, 1000))
{code}
Under In such semantics, (=($1, CAST(null:NULL):...) evaluates to {_}FALSE{_}, 
because no value is equal to NULL (even NULL itself), AND(FALSE,  
=(CAST($0):DOUBLE, 1000)) necessarily evaluates to _FALSE_ altogether, and the 
UNION ALL operand is pruned.

Only by chance, when _CAST(NULL)_ is simplified to _NULL,_ we avoid the issue, 
due to the _IS_NULL($1)_ inferred predicate, see 
[HiveRelMdPredicates.java#L153-L156|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdPredicates.java#L153-L156]
 for understanding how the NULL literal is treated differently during predicate 
inference.

The problem lies in the fact that, depending on the input _RelNode_ that we 
infer predicates from, the semantics is not necessarily {_}UnknownAsFalse{_}, 
but it might be {_}UnknownAsUnknown{_}, like for {_}Project{_}, as in this case.
h1. Solution

In order to correctly simplify a predicate and test if it's always false or 
not, we should build RexSimplify with _predicates_ as the list of predicates 
known to hold in the context. In this way, the different semantics are 
correctly taken into account.

The code at 
[HiveFilterSetOpTransposeRule.java#L114-L121|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterSetOpTransposeRule.java#L114-L121]
 should be replaced by the following:
{code:java}
final RexExecutor executor =
Util.first(filterRel.getCluster().getPlanner().getExecutor(), RexUtil.EXECUTOR);
final RexSimplify simplify = new RexSimplify(rexBuilder, predicates, executor);
final RexNode x = simplify.simplifyUnknownAs(newCondition, 
RexUnknownAs.FALSE);{code}
This is in line with other uses of simplification, like in Calcite:

[https://github.com/apache/calcite/blob/a0ce3275119f804959cda54d6e7a016ab893c359/core/src/main/java/org/apache/calcite/rel/rules/ReduceExpressionsRule.java#L704-L705]

You can see the above method used for different kind of rels, always in the way 
identical to what this ticket advocates:filters:
[https://github.com/apache/calcite/blob/a0ce3275119f804959cda54d6e7a016ab893c359/core/src/main/java/org/apache/calcite/rel/rules/ReduceExpressionsRule.java#L152-L155]

projects:
[https://github.com/apache/calcite/blob/a0ce3275119f804959cda54d6e7a016ab893c359/core/src/main/java/org/apache/calcite/rel/rules/ReduceExpressionsRule.java#L306-L310]

joins:
[https://github.com/apache/calcite/blob/a0ce3275119f804959cda54d6e7a016ab893c359/core/src/main/java/org/apache/calcite/rel/rules/ReduceExpressionsRule.java#L380-L383]

windows:
[https://github.com/apache/calcite/blob/a0ce3275119f804959cda54d6e7a016ab893c359/core/src/main/java/org/apache/calcite/rel/rules/ReduceExpressionsRule.java#L567-L576]

  was:
h1. Reproducer

Consider the following query:
{code:java}
set hive.cbo.rule.exclusion.regex=ReduceExpressionsRule\(Project\);

CREATE EXTERNAL TABLE t (a string, b string);
INSERT INTO t VALUES ('1000', 'b1');
INSERT INTO t VALUES ('2000', 'b2');

SELECT * FROM (
  SELECT
   a,
   b
  FROM t
   UNION ALL
  SELECT
   a,
   CAST(NULL AS string)
   FROM t) AS t2
WHERE a = 1000;EXPLAIN CBO
SELECT * FROM (
  SELECT
   a,
   b
  FROM t
   UNION ALL
  SELECT
   a,
   CAST(NULL AS string)
   FROM t) AS t2
WHERE a = 1000; {code}
 

The expected result is:
{code:java}
1000    b1
1000    NULL{code}
An example of correct plan is as follows:
{noformat}
CBO PLAN:
HiveUnion(all=[true])
  HiveProject(a=[$0], b=[$1])
    HiveFilter(condition=[=(CAST($0):DOUBLE, 1000)])
      HiveTableScan(table=[[default, t]], table:alias=[t])
  HiveProject(a=[$0], _o__c1=[null:VARCHAR(2147483647) CHARACTER SET 
"UTF-16LE"])
    HiveFilter(condition=[=(CAST($0):DOUBLE, 1000)])
      HiveTableScan(table=[[default, t]], table:alias=[t]){noformat}
 

Consider now a scenario where expression reduction in projections is disabled 
by setting the following property{_}:{_}
{noformat}
set hive.cbo.rule.exclusion.regex=ReduceExpressionsRule\(Project\);
{noformat}
In this case, the simplification of _CAST(NULL)_ into _NULL_ does not happen, 
and we get the following (invalid) result:
{code:java}
1000    b1{code}
produced by the following invalid plan:
{code:java}
CBO PLAN:
HiveProject(a=[$0], b=[$1])
  HiveFilter(condition=[=(CAST($0):DOUBLE, 1000)])
    HiveTableScan(table=[[default, t]], table:alias=[t]) {code}
h1. Problem Analysis

At 
[HiveFilterSetOpTransposeRule.java#L112|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterSetOpTransposeRule.java#L112]
 the _RelMetadataQuery::getPulledUpPredicates_ method infers the following 
predicate due to the CAST(NULL) in the projection:
{code:java}
(=($1, CAST(null:NULL):VARCHAR(2147483647) CHARACTER SET "UTF-16LE")){code}
When the CAST is simplified to the NULL literal, the IS_NULL($1) predicate is 
inferred.

In 
[HiveFilterSetOpTransposeRule.java#L114-L122|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterSetOpTransposeRule.java#L114-L122],
 the rule checks if the conjunction of the predicate coming from the filter 
(here =(CAST($0):DOUBLE, 1000)) and the inferred predicates is satisfiable or 
not, under the _UnknownAsFalse_ semantics.

To summarize, the following expression is simplified under the _UnknownAsFalse_ 
semantics:
{code:java}
AND((=($1, CAST(null:NULL):VARCHAR(2147483647) CHARACTER SET "UTF-16LE")), 
=(CAST($0):DOUBLE, 1000))
{code}
Under In such semantics, (=($1, CAST(null:NULL):...) evaluates to {_}FALSE{_}, 
because no value is equal to NULL (even NULL itself), AND(FALSE,  
=(CAST($0):DOUBLE, 1000)) necessarily evaluates to _FALSE_ altogether, and the 
UNION ALL operand is pruned.

Only by chance, when _CAST(NULL)_ is simplified to _NULL,_ we avoid the issue, 
due to the _IS_NULL($1)_ inferred predicate, see 
[HiveRelMdPredicates.java#L153-L156|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdPredicates.java#L153-L156]
 for understanding how the NULL literal is treated differently during predicate 
inference.

The problem lies in the fact that, depending on the input _RelNode_ that we 
infer predicates from, the semantics is not necessarily {_}UnknownAsFalse{_}, 
but it might be {_}UnknownAsUnknown{_}, like for {_}Project{_}, as in this case.
h1. Solution

In order to correctly simplify a predicate and test if it's always false or 
not, we should build RexSimplify with _predicates_ as the list of predicates 
known to hold in the context. In this way, the different semantics are 
correctly taken into account.

The code at 
[HiveFilterSetOpTransposeRule.java#L114-L121|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterSetOpTransposeRule.java#L114-L121]
 should be replaced by the following:
{code:java}
final RexExecutor executor =
Util.first(filterRel.getCluster().getPlanner().getExecutor(), RexUtil.EXECUTOR);
final RexSimplify simplify = new RexSimplify(rexBuilder, predicates, executor);
final RexNode x = simplify.simplifyUnknownAs(newCondition, 
RexUnknownAs.FALSE);{code}


> HiveFilterSetOpTransposeRule incorrectly prunes UNION ALL operands
> ------------------------------------------------------------------
>
>                 Key: HIVE-26722
>                 URL: https://issues.apache.org/jira/browse/HIVE-26722
>             Project: Hive
>          Issue Type: Bug
>          Components: CBO
>    Affects Versions: 4.0.0-alpha-1
>            Reporter: Alessandro Solimando
>            Assignee: Alessandro Solimando
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> h1. Reproducer
> Consider the following query:
> {code:java}
> set hive.cbo.rule.exclusion.regex=ReduceExpressionsRule\(Project\);
> CREATE EXTERNAL TABLE t (a string, b string);
> INSERT INTO t VALUES ('1000', 'b1');
> INSERT INTO t VALUES ('2000', 'b2');
> SELECT * FROM (
>   SELECT
>    a,
>    b
>   FROM t
>    UNION ALL
>   SELECT
>    a,
>    CAST(NULL AS string)
>    FROM t) AS t2
> WHERE a = 1000;EXPLAIN CBO
> SELECT * FROM (
>   SELECT
>    a,
>    b
>   FROM t
>    UNION ALL
>   SELECT
>    a,
>    CAST(NULL AS string)
>    FROM t) AS t2
> WHERE a = 1000; {code}
>  
> The expected result is:
> {code:java}
> 1000    b1
> 1000    NULL{code}
> An example of correct plan is as follows:
> {noformat}
> CBO PLAN:
> HiveUnion(all=[true])
>   HiveProject(a=[$0], b=[$1])
>     HiveFilter(condition=[=(CAST($0):DOUBLE, 1000)])
>       HiveTableScan(table=[[default, t]], table:alias=[t])
>   HiveProject(a=[$0], _o__c1=[null:VARCHAR(2147483647) CHARACTER SET 
> "UTF-16LE"])
>     HiveFilter(condition=[=(CAST($0):DOUBLE, 1000)])
>       HiveTableScan(table=[[default, t]], table:alias=[t]){noformat}
>  
> Consider now a scenario where expression reduction in projections is disabled 
> by setting the following property{_}:{_}
> {noformat}
> set hive.cbo.rule.exclusion.regex=ReduceExpressionsRule\(Project\);
> {noformat}
> In this case, the simplification of _CAST(NULL)_ into _NULL_ does not happen, 
> and we get the following (invalid) result:
> {code:java}
> 1000    b1{code}
> produced by the following invalid plan:
> {code:java}
> CBO PLAN:
> HiveProject(a=[$0], b=[$1])
>   HiveFilter(condition=[=(CAST($0):DOUBLE, 1000)])
>     HiveTableScan(table=[[default, t]], table:alias=[t]) {code}
> h1. Problem Analysis
> At 
> [HiveFilterSetOpTransposeRule.java#L112|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterSetOpTransposeRule.java#L112]
>  the _RelMetadataQuery::getPulledUpPredicates_ method infers the following 
> predicate due to the CAST(NULL) in the projection:
> {code:java}
> (=($1, CAST(null:NULL):VARCHAR(2147483647) CHARACTER SET "UTF-16LE")){code}
> When the CAST is simplified to the NULL literal, the IS_NULL($1) predicate is 
> inferred.
> In 
> [HiveFilterSetOpTransposeRule.java#L114-L122|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterSetOpTransposeRule.java#L114-L122],
>  the rule checks if the conjunction of the predicate coming from the filter 
> (here =(CAST($0):DOUBLE, 1000)) and the inferred predicates is satisfiable or 
> not, under the _UnknownAsFalse_ semantics.
> To summarize, the following expression is simplified under the 
> _UnknownAsFalse_ semantics:
> {code:java}
> AND((=($1, CAST(null:NULL):VARCHAR(2147483647) CHARACTER SET "UTF-16LE")), 
> =(CAST($0):DOUBLE, 1000))
> {code}
> Under In such semantics, (=($1, CAST(null:NULL):...) evaluates to 
> {_}FALSE{_}, because no value is equal to NULL (even NULL itself), AND(FALSE, 
>  =(CAST($0):DOUBLE, 1000)) necessarily evaluates to _FALSE_ altogether, and 
> the UNION ALL operand is pruned.
> Only by chance, when _CAST(NULL)_ is simplified to _NULL,_ we avoid the 
> issue, due to the _IS_NULL($1)_ inferred predicate, see 
> [HiveRelMdPredicates.java#L153-L156|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdPredicates.java#L153-L156]
>  for understanding how the NULL literal is treated differently during 
> predicate inference.
> The problem lies in the fact that, depending on the input _RelNode_ that we 
> infer predicates from, the semantics is not necessarily {_}UnknownAsFalse{_}, 
> but it might be {_}UnknownAsUnknown{_}, like for {_}Project{_}, as in this 
> case.
> h1. Solution
> In order to correctly simplify a predicate and test if it's always false or 
> not, we should build RexSimplify with _predicates_ as the list of predicates 
> known to hold in the context. In this way, the different semantics are 
> correctly taken into account.
> The code at 
> [HiveFilterSetOpTransposeRule.java#L114-L121|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterSetOpTransposeRule.java#L114-L121]
>  should be replaced by the following:
> {code:java}
> final RexExecutor executor =
> Util.first(filterRel.getCluster().getPlanner().getExecutor(), 
> RexUtil.EXECUTOR);
> final RexSimplify simplify = new RexSimplify(rexBuilder, predicates, 
> executor);
> final RexNode x = simplify.simplifyUnknownAs(newCondition, 
> RexUnknownAs.FALSE);{code}
> This is in line with other uses of simplification, like in Calcite:
> [https://github.com/apache/calcite/blob/a0ce3275119f804959cda54d6e7a016ab893c359/core/src/main/java/org/apache/calcite/rel/rules/ReduceExpressionsRule.java#L704-L705]
> You can see the above method used for different kind of rels, always in the 
> way identical to what this ticket advocates:filters:
> [https://github.com/apache/calcite/blob/a0ce3275119f804959cda54d6e7a016ab893c359/core/src/main/java/org/apache/calcite/rel/rules/ReduceExpressionsRule.java#L152-L155]
> projects:
> [https://github.com/apache/calcite/blob/a0ce3275119f804959cda54d6e7a016ab893c359/core/src/main/java/org/apache/calcite/rel/rules/ReduceExpressionsRule.java#L306-L310]
> joins:
> [https://github.com/apache/calcite/blob/a0ce3275119f804959cda54d6e7a016ab893c359/core/src/main/java/org/apache/calcite/rel/rules/ReduceExpressionsRule.java#L380-L383]
> windows:
> [https://github.com/apache/calcite/blob/a0ce3275119f804959cda54d6e7a016ab893c359/core/src/main/java/org/apache/calcite/rel/rules/ReduceExpressionsRule.java#L567-L576]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to