juntaozhang created SPARK-54725:
-----------------------------------

             Summary: Add inferring transitive join conditions in 
CostBasedJoinReorder
                 Key: SPARK-54725
                 URL: https://issues.apache.org/jira/browse/SPARK-54725
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.3.1
            Reporter: juntaozhang
             Fix For: 3.3.1


Currently, *CostBasedJoinReorder* only considers explicitly specified join 
conditions when evaluating join orders. However, it misses optimization 
opportunities where transitive equality relationships exist between attributes 
from different plans that could form additional join conditions.

Consider the following scenario:
{code}
SELECT o.order_id, o.user_id, o.region_id, order_num, user_name, region_name
FROM order o
JOIN user u ON o.user_id = u.user_id AND o.region_id = u.region_id
JOIN region r ON o.region_id = r.region_id
{code}

When JoinReorderDP considers joining user and region tables:
- It recognizes o.user_id = u.user_id (o ↔ u) 
- It recognizes o.region_id = u.region_id (o ↔ u) 
- It recognizes o.region_id = r.region_id (o ↔ r) 
- But it misses: u.region_id = r.region_id (u ↔ r) 





--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to