[
https://issues.apache.org/jira/browse/SPARK-55072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Asif updated SPARK-55072:
-------------------------
Description:
Presently, the optimization rules execute in this order:
{quote}{color:#4c9aff}*{{step1}}*{color}
{{Batch("Operator Optimization before Inferring Filters", fixedPoint,}}
{{operatorOptimizationRuleSet: _*),}}
{color:#4c9aff}*{{step2}}*{color}
{{Batch("Infer Filters", Once,}}
{{InferFiltersFromGenerate,}}
{{InferFiltersFromConstraints),}}
{color:#4c9aff}*{{step3}}*{color}
{{Batch("Operator Optimization after Inferring Filters", fixedPoint,}}
{{operatorOptimizationRuleSet: _*)}}
{quote}
The conversion of joins such as Left Outer to Inner happens in the
{{operatorOptimizationRuleSet}} batch of rules.
After that, {{InferFiltersFromConstraints}} is called, which can create new
constraints such as IsNotNull to be pushed to either side of the Inner join.
Notice that {{operatorOptimizationRuleSet}} is called twice, before and after
inferring filters.
In TPCDS Q5, at least, the conversion of LeftOuter to Inner for one of the
joins happens in {color:#4c9aff}*step3*{color}.
Because InferFiltersFromConstraints is not called again after that, the
IsNotNull constraint is never generated for the left leg of the join.
IMHO the rules of the "Infer Filters" batch should be made part of
{{operatorOptimizationRuleSet}}, and there should be
{color:#4c9aff}*NO step3*{color}.
Will be opening a PR with a possible targeted test.
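The ordering problem described above can be sketched with a toy model. This is not Spark's optimizer: the Plan class, the rule functions, and the predicate strings are all hypothetical stand-ins, and the predicates are opaque tokens that the toy never analyzes. It also assumes that the conversion lands in step3 because a filter inferred in step2 is what makes the outer join convertible; the issue itself does not state the cause.

```scala
object InferFiltersOrderingSketch {
  sealed trait JoinType
  case object LeftOuter extends JoinType
  case object Inner extends JoinType

  // Toy plan: one join plus the set of filter tokens attached to it.
  final case class Plan(joinType: JoinType, filters: Set[String])

  type Rule = Plan => Plan

  // Hypothetical stand-in for the outer-to-inner conversion. It fires only
  // when a null-rejecting predicate on the right side is present.
  val convertOuterToInner: Rule = p =>
    if (p.joinType == LeftOuter && p.filters.contains("nullRejecting(right)"))
      p.copy(joinType = Inner)
    else p

  // Hypothetical stand-in for filter inference. Assumption: it always derives
  // the enabling predicate and the right-side IsNotNull, but derives the
  // left-side IsNotNull only when the join is already Inner.
  val inferFilters: Rule = p => {
    val common = Set("nullRejecting(right)", "isnotnull(right.k)")
    val leftLeg =
      if (p.joinType == Inner) Set("isnotnull(left.k)") else Set.empty[String]
    p.copy(filters = p.filters ++ common ++ leftLeg)
  }

  // Apply every rule exactly once, in order.
  def once(rules: Seq[Rule])(p: Plan): Plan =
    rules.foldLeft(p)((acc, rule) => rule(acc))

  // Re-apply the rules until the plan stops changing or the cap is reached.
  def fixedPoint(rules: Seq[Rule], maxIterations: Int = 100)(start: Plan): Plan = {
    var current = start
    var changed = true
    var i = 0
    while (changed && i < maxIterations) {
      val next = once(rules)(current)
      changed = next != current
      current = next
      i += 1
    }
    current
  }

  val operatorOptimizationRuleSet: Seq[Rule] = Seq(convertOuterToInner)

  // step1, step2, step3 as listed in the description.
  def currentOrder(p: Plan): Plan = {
    val afterStep1 = fixedPoint(operatorOptimizationRuleSet)(p)
    val afterStep2 = once(Seq(inferFilters))(afterStep1)
    fixedPoint(operatorOptimizationRuleSet)(afterStep2)
  }

  // Proposal: inference joins the fixed-point rule set, and there is no step3.
  def proposedOrder(p: Plan): Plan =
    fixedPoint(operatorOptimizationRuleSet :+ inferFilters)(p)

  def main(args: Array[String]): Unit = {
    val start = Plan(LeftOuter, Set.empty)
    println(s"current:  ${currentOrder(start)}")
    println(s"proposed: ${proposedOrder(start)}")
  }
}
```

In this toy, both orderings end with an Inner join, but only the proposed ordering gives inference another chance after the conversion, so only it produces the left-leg IsNotNull token. Whether folding inference into the fixed-point batch is safe in the real optimizer (for example, whether it still converges) is outside what this sketch shows.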
was:
Presently, the optimization rules execute in this order:
{quote}{color:#4c9aff}*{{step1}}*{color}
{{Batch("Operator Optimization before Inferring Filters", fixedPoint,}}
{{operatorOptimizationRuleSet: _*),}}
{color:#4c9aff}*{{step2}}*{color}
{{Batch("Infer Filters", Once,}}
{{InferFiltersFromGenerate,}}
{{InferFiltersFromConstraints),}}
{color:#4c9aff}*{{step3}}*{color}
{{Batch("Operator Optimization after Inferring Filters", fixedPoint,}}
{{operatorOptimizationRuleSet: _*)}}{quote}
The conversion of joins such as Left Outer to Inner happens in the
{{operatorOptimizationRuleSet}} batch of rules.
After that, {{InferFiltersFromConstraints}} is called, which can create new
constraints such as IsNotNull to be pushed to either side of the Inner join.
Notice that {{operatorOptimizationRuleSet}} is called twice, before and after
inferring filters.
In TPCDS Q5, at least, the conversion of LeftOuter to Inner for one of the
joins happens in {color:#4c9aff}*step3*{color}.
Because InferFiltersFromConstraints is not called again after that, the
IsNotNull constraint generation is missed.
IMHO the rules of the "Infer Filters" batch should be made part of
{{operatorOptimizationRuleSet}}, and there should be
{color:#4c9aff}*NO step3*{color}.
Will be opening a PR with a possible targeted test.
> Inferring new Constraint misses IsNotNull on Left Leg, when an Outer Join
> gets converted into inner Join
> --------------------------------------------------------------------------------------------------------
>
> Key: SPARK-55072
> URL: https://issues.apache.org/jira/browse/SPARK-55072
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 4.2.0, 4.1.1
> Reporter: Asif
> Priority: Major
--
This message was sent by Atlassian Jira
(v8.20.10#820010)