Asif created SPARK-55072:
----------------------------
Summary: Inferring new constraints misses IsNotNull when an outer join gets converted into an inner join
Key: SPARK-55072
URL: https://issues.apache.org/jira/browse/SPARK-55072
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 4.1.1, 4.2.0
Reporter: Asif
Currently, the optimizer batches run in the following order:
{quote}{color:#4c9aff}*{{step1}}*{color}
{{Batch("Operator Optimization before Inferring Filters", fixedPoint,}}
{{operatorOptimizationRuleSet: _*),}}
{color:#4c9aff}*{{step2}}*{color}
{{Batch("Infer Filters", Once,}}
{{InferFiltersFromGenerate,}}
{{InferFiltersFromConstraints),}}
{color:#4c9aff}*{{step3}}*{color}
{{Batch("Operator Optimization after Inferring Filters", fixedPoint,}}
{{operatorOptimizationRuleSet: _*)}}{quote}
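To make the two strategies concrete, here is a toy scheduling model in plain Scala, with no Spark dependency. A `Once` batch makes a single pass over its rules. A fixed-point batch repeats passes until the plan stops changing or an iteration cap is reached. Batches run strictly in sequence. `ToyExecutor`, `Batch`, `Once` and `FixedPoint` below are hypothetical stand-ins, not Spark's actual `RuleExecutor` classes.

```scala
// Toy scheduling model; these names are hypothetical, not Spark's RuleExecutor API.
sealed trait Strategy
case object Once extends Strategy
case class FixedPoint(maxIterations: Int) extends Strategy

case class Batch[T](name: String, strategy: Strategy, rules: Seq[T => T])

object ToyExecutor {
  // One batch: apply all rules in order; for FixedPoint, repeat the pass until
  // the plan stops changing or maxIterations passes have run.
  def runBatch[T](plan: T, batch: Batch[T]): T = {
    val maxPasses = batch.strategy match {
      case Once          => 1
      case FixedPoint(n) => n
    }
    var current = plan
    var passes = 0
    var changed = true
    while (changed && passes < maxPasses) {
      val next = batch.rules.foldLeft(current)((p, rule) => rule(p))
      changed = next != current
      current = next
      passes += 1
    }
    current
  }

  // Batches run strictly in sequence; a later batch never sends the plan back
  // to an earlier one.
  def execute[T](plan: T, batches: Seq[Batch[T]]): T =
    batches.foldLeft(plan)((p, batch) => runBatch(p, batch))
}
```

Take a rule that halves even numbers as an example. A `FixedPoint(100)` batch drives 40 down to 5. A `Once` batch stops after one pass, at 20.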
In the rule set {{operatorOptimizationRuleSet}}, outer joins such as Left Outer can be converted to Inner joins.
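The following toy sketch shows the outer-to-inner conversion. It roughly follows the idea behind Spark's `EliminateOuterJoin` rule. If a filter above a left outer join rejects rows whose right-side columns are null, the null-padded rows can never survive, so the join can be treated as an inner join. The plan classes and `ToyEliminateOuterJoin` are hypothetical and much simpler than Catalyst's.

```scala
// Hypothetical toy expression and plan trees (not Catalyst classes).
sealed trait Expr
case class Attr(name: String) extends Expr
case class Lit(value: Int) extends Expr
case class EqualTo(left: Expr, right: Expr) extends Expr
case class GreaterThan(left: Expr, right: Expr) extends Expr

sealed trait Plan { def output: Set[String] }
case class Relation(name: String, output: Set[String]) extends Plan
case class Filter(cond: Expr, child: Plan) extends Plan {
  def output: Set[String] = child.output
}
case class Join(left: Plan, right: Plan, joinType: String, cond: Expr) extends Plan {
  def output: Set[String] = left.output ++ right.output
}

object ToyEliminateOuterJoin {
  private def references(e: Expr): Set[String] = e match {
    case Attr(n)           => Set(n)
    case Lit(_)            => Set.empty
    case EqualTo(l, r)     => references(l) ++ references(r)
    case GreaterThan(l, r) => references(l) ++ references(r)
  }

  // Toy simplification: every predicate here evaluates to null (which a Filter
  // treats as false) when an attribute it references is null, so a predicate
  // on a right-side column drops the null-padded rows of a left outer join.
  private def rejectsNullsFrom(cond: Expr, side: Plan): Boolean =
    references(cond).exists(side.output.contains)

  // Only looks at a Filter directly above a LeftOuter join.
  def apply(plan: Plan): Plan = plan match {
    case Filter(cond, j @ Join(_, right, "LeftOuter", _)) if rejectsNullsFrom(cond, right) =>
      Filter(cond, j.copy(joinType = "Inner"))
    case other => other
  }
}
```

A filter such as `r_amt > 0` on the right side turns the join into an inner join. A filter on a left-side column leaves it as a left outer join.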
After that, {{InferFiltersFromConstraints}} runs. It can infer new constraints such as IsNotNull and push them down to either side of the Inner join.
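Here is a toy sketch of the kind of IsNotNull inference done for an inner join. An equality join condition is never true when either key is null, so each side can be pre-filtered with `IsNotNull` on its own join key. The classes and `ToyInferNotNull` are hypothetical. This toy handles only a single `EqualTo` condition on an Inner join. Unlike a real rule, it does not check for constraints that already exist.

```scala
// Hypothetical toy expression and plan trees (not Catalyst classes).
sealed trait Expr
case class Attr(name: String) extends Expr
case class EqualTo(left: Expr, right: Expr) extends Expr
case class IsNotNull(child: Expr) extends Expr

sealed trait Plan { def output: Set[String] }
case class Relation(name: String, output: Set[String]) extends Plan
case class Filter(cond: Expr, child: Plan) extends Plan {
  def output: Set[String] = child.output
}
case class Join(left: Plan, right: Plan, joinType: String, cond: Expr) extends Plan {
  def output: Set[String] = left.output ++ right.output
}

object ToyInferNotNull {
  private def attrsOf(e: Expr): Set[String] = e match {
    case Attr(n)       => Set(n)
    case EqualTo(l, r) => attrsOf(l) ++ attrsOf(r)
    case IsNotNull(c)  => attrsOf(c)
  }

  // Adds one IsNotNull filter for each join-key attribute that this side produces.
  private def addNotNull(side: Plan, keys: Set[String]): Plan = {
    val ownKeys = keys.filter(side.output.contains).toSeq.sorted
    ownKeys.foldLeft(side)((p, a) => Filter(IsNotNull(Attr(a)), p))
  }

  // Inner joins with an EqualTo condition only; everything else is unchanged.
  def apply(plan: Plan): Plan = plan match {
    case Join(left, right, "Inner", cond @ EqualTo(_, _)) =>
      val keys = attrsOf(cond)
      Join(addNotNull(left, keys), addNotNull(right, keys), "Inner", cond)
    case other => other
  }
}
```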
Note that {{operatorOptimizationRuleSet}} runs twice, once before and once after inferring filters.
In TPC-DS Q5, at least, the conversion of LeftOuter to Inner for one of the joins happens in {color:#4c9aff}*step3*{color}.
Since {{InferFiltersFromConstraints}} is never called again after that conversion, the IsNotNull constraints for that join are never generated.
IMHO, the rules in the "Infer Filters" batch should be made part of {{operatorOptimizationRuleSet}}, and there should be {color:#4c9aff}*NO step3*{color}.
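The following toy model in plain Scala contrasts the current batch order with the proposal. All names are hypothetical. The plan is reduced to three fields, and the optimization rule set contains only a stand-in for outer-join elimination. The report does not say why the Q5 conversion is delayed until step3. Purely for illustration, the toy assumes that the Infer Filters step is what exposes the null-rejecting predicate that makes the conversion possible.

```scala
// Hypothetical toy state: just enough to show rule ordering, not a real plan.
case class ToyPlan(joinType: String, nullRejectingFilter: Boolean, notNullKeys: Set[String])

object OrderingDemo {
  type Rule = ToyPlan => ToyPlan

  // Stand-in for outer-join elimination.
  val eliminateOuterJoin: Rule = p =>
    if (p.joinType == "LeftOuter" && p.nullRejectingFilter) p.copy(joinType = "Inner") else p

  // Stand-in for InferFiltersFromConstraints: infers IsNotNull keys for an Inner join.
  // ASSUMPTION (hypothetical): it also exposes the null-rejecting predicate that
  // later makes the outer join convertible.
  val inferFilters: Rule = p => {
    val exposed = p.copy(nullRejectingFilter = true)
    if (p.joinType == "Inner") exposed.copy(notNullKeys = p.notNullKeys ++ Set("left_key", "right_key"))
    else exposed
  }

  // Repeats full passes over the rules until nothing changes (or maxIter passes).
  def fixedPoint(p: ToyPlan, rules: Seq[Rule], maxIter: Int = 100): ToyPlan = {
    var cur = p
    var i = 0
    var changed = true
    while (changed && i < maxIter) {
      val next = rules.foldLeft(cur)((acc, r) => r(acc))
      changed = next != cur
      cur = next
      i += 1
    }
    cur
  }

  def once(p: ToyPlan, rules: Seq[Rule]): ToyPlan = rules.foldLeft(p)((acc, r) => r(acc))

  val start = ToyPlan("LeftOuter", nullRejectingFilter = false, notNullKeys = Set.empty)

  // Current order: step1 (fixed point), step2 (Once), step3 (fixed point).
  def currentOrder(p: ToyPlan): ToyPlan = {
    val opt = Seq(eliminateOuterJoin)
    fixedPoint(once(fixedPoint(p, opt), Seq(inferFilters)), opt)
  }

  // Proposed order: inference folded into the fixed-point rule set, no step3.
  def proposedOrder(p: ToyPlan): ToyPlan =
    fixedPoint(p, Seq(eliminateOuterJoin, inferFilters))
}
```

Under the current order, the toy join ends up Inner with no inferred keys, because the conversion happens in step3, after the single inference pass. Under the proposed order, the second fixed-point iteration converts the join, and the inference rule in that same iteration adds both keys. A third iteration confirms that nothing changes.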
I will open a PR, possibly with a targeted test.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)