Jaromir Vanek created SPARK-53996:
-------------------------------------

             Summary: InferFiltersFromConstraints rule does not infer filter 
conditions for complex join expressions
                 Key: SPARK-53996
                 URL: https://issues.apache.org/jira/browse/SPARK-53996
             Project: Spark
          Issue Type: Improvement
          Components: Optimizer
    Affects Versions: 4.0.1
            Reporter: Jaromir Vanek


The Spark optimizer's {{InferFiltersFromConstraints}} rule is currently unable 
to infer filter conditions when join constraints involve complex expressions. 
While it works for simple attribute equalities ({{{}a = b{}}}), it can't infer 
constraint from anything more complex than that.

*Example (works as expected):*
{code:sql}
SELECT *
FROM t1
JOIN t2 ON t1.a = t2.b
WHERE t2.b = 1
{code}
In this case, the optimizer correctly infers the additional constraint {{{}t1.a 
= 1{}}}.

*Example (does not work):*
{code:sql}
SELECT *
FROM t1
JOIN right ON t1.a = t2.b + 2
WHERE t2.b = 1
{code}
In this case, it is clear that {{t1.a = 3}} (since {{b = 1}} and {{{}a = b + 
2{}}}), but the optimizer does not infer this constraint.

*How to Reproduce:*
{code:scala}
spark.sql("CREATE TABLE t1(a INT)")
spark.sql("CREATE TABLE t2(b INT)")

spark.sql("""
SELECT * 
FROM t1 
INNER JOIN t2 ON t2.b = t1.a + 2 
WHERE t1.a = 1
""").explain
{code}
{code:java}
== Physical Plan ==
AdaptiveSparkPlan
+- BroadcastHashJoin [(a#2 + 2)], [b#3], Inner, BuildRight, false
   :- Filter (isnotnull(a#2) AND (a#2 = 1))
   :  +- FileScan spark_catalog.default.t1[a#2]
      +- Filter isnotnull(b#3)
         +- FileScan spark_catalog.default.t2[b#3]
{code}
*Expected Behavior:*
The optimizer should be able to statically evaluate and infer that {{t2.b = 3}} 
given the join condition and the filter on {{{}t1.a{}}}.

*Impact:*
This limits the optimizer's ability to push down filters and optimize query 
execution plans for queries with complex join conditions.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to