Asif created SPARK-55072:
----------------------------

             Summary: Inferring new Constraint misses IsNotNull, when an Outer 
Join gets converted into inner Join
                 Key: SPARK-55072
                 URL: https://issues.apache.org/jira/browse/SPARK-55072
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 4.1.1, 4.2.0
            Reporter: Asif


Presently, the optimization rules execute in this order:

 
{quote}{color:#4c9aff}*{{step1}}*{color}
{{Batch("Operator Optimization before Inferring Filters", fixedPoint,}}
{{operatorOptimizationRuleSet: _*),}}
 
{color:#4c9aff}*{{step2}}*{color}
{{Batch("Infer Filters", Once,}}
{{InferFiltersFromGenerate,}}
{{InferFiltersFromConstraints),}}
 
{color:#4c9aff}*{{step3}}*{color}
{{Batch("Operator Optimization after Inferring Filters", fixedPoint,}}
{{operatorOptimizationRuleSet: _*)}}{quote}
 
The conversion of joins such as Left Outer to Inner happens in the rules of 
{{operatorOptimizationRuleSet}}.
After that, {{InferFiltersFromConstraints}} runs and can create new constraints 
such as IsNotNull, which are pushed to either side of the Inner join.
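
Why an IsNotNull constraint on the join keys is safe for an Inner join can be shown with a small plain-Scala sketch. It does not use Spark; the names {{sqlEq}} and {{innerJoin}} are hypothetical helpers that only model SQL equality, where a comparison involving NULL is never true.

```scala
object NullKeyJoinSketch {
  // SQL-style equality on nullable keys: any comparison with NULL (None)
  // does not evaluate to true, so such a row can never match.
  def sqlEq(a: Option[Int], b: Option[Int]): Boolean =
    (for (x <- a; y <- b) yield x == y).getOrElse(false)

  // Toy inner equi-join over single-column "tables" of nullable keys.
  def innerJoin(
      left: Seq[Option[Int]],
      right: Seq[Option[Int]]): Seq[(Option[Int], Option[Int])] =
    for (l <- left; r <- right if sqlEq(l, r)) yield (l, r)

  val left: Seq[Option[Int]] = Seq(Some(1), None, Some(2))
  val right: Seq[Option[Int]] = Seq(Some(2), None, Some(3))

  // Join without any pre-filtering.
  val plain = innerJoin(left, right)

  // Join after applying the equivalent of IsNotNull on both sides.
  val prefiltered = innerJoin(left.filter(_.isDefined), right.filter(_.isDefined))
}
```

Both results contain the same rows, which is why the filter can be pushed below an Inner join. The same filter is not generally safe on the preserved side of an outer join, since rows with NULL keys there must still appear in the output.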
 
Notice that {{operatorOptimizationRuleSet}} runs twice, before and after 
inferring filters.
 
In TPCDS Q5, at least, the conversion of LeftOuter to Inner for one of the 
joins happens in {color:#4c9aff}*step3*{color}.
 
But since {{InferFiltersFromConstraints}} is not called again after step3, the 
IsNotNull constraints for that join are never generated.
 
IMHO the rules of the "Infer Filters" batch should be made part of 
{{operatorOptimizationRuleSet}}, and there should be {color:#4c9aff}*NO 
step3*{color}.
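
The ordering problem and the proposed change can be illustrated with a toy model. This is a minimal sketch in plain Scala, not Catalyst code: {{Plan}}, {{fixedPoint}}, {{currentPipeline}} and {{proposedPipeline}} are hypothetical names, and the sketch assumes (the ticket does not state the cause) that the outer join only becomes convertible after a first inference pass has added a null-rejecting filter.

```scala
object InferFiltersOrderingSketch {
  // Toy plan state: the join type, whether a null-rejecting filter exists
  // above the join, and whether IsNotNull was inferred for the join keys.
  final case class Plan(
      joinType: String,
      nullRejectingFilter: Boolean,
      joinKeysNotNull: Boolean)

  type Rule = Plan => Plan

  // Models the outer-to-inner conversion: only fires once a
  // null-rejecting filter is present (assumption of this sketch).
  val eliminateOuterJoin: Rule = p =>
    if (p.joinType == "LeftOuter" && p.nullRejectingFilter) p.copy(joinType = "Inner")
    else p

  // Models filter inference: adds the null-rejecting filter, and infers
  // IsNotNull on the join keys only if the join is already Inner.
  val inferFilters: Rule = p =>
    p.copy(
      nullRejectingFilter = true,
      joinKeysNotNull = p.joinKeysNotNull || p.joinType == "Inner")

  // Apply all rules repeatedly until the plan stops changing.
  @annotation.tailrec
  def fixedPoint(rules: Seq[Rule], plan: Plan, maxIterations: Int = 100): Plan = {
    val next = rules.foldLeft(plan)((p, rule) => rule(p))
    if (next == plan || maxIterations <= 1) next
    else fixedPoint(rules, next, maxIterations - 1)
  }

  val operatorOptimizationRuleSet: Seq[Rule] = Seq(eliminateOuterJoin)

  // step1, step2 (inference runs once), step3.
  def currentPipeline(plan: Plan): Plan = {
    val step1 = fixedPoint(operatorOptimizationRuleSet, plan)
    val step2 = inferFilters(step1)
    fixedPoint(operatorOptimizationRuleSet, step2)
  }

  // Inference is part of the fixed-point rule set; there is no step3.
  def proposedPipeline(plan: Plan): Plan =
    fixedPoint(operatorOptimizationRuleSet :+ inferFilters, plan)

  val start = Plan("LeftOuter", nullRejectingFilter = false, joinKeysNotNull = false)
  val current = currentPipeline(start)
  val proposed = proposedPipeline(start)
}
```

In this model both pipelines end with an Inner join, but only {{proposed}} has {{joinKeysNotNull}} set, because inference gets another chance to run after the conversion.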
 
I will open a PR with a targeted test.
 
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
