juntaozhang created SPARK-54725:
-----------------------------------
Summary: Add inferring transitive join conditions in
CostBasedJoinReorder
Key: SPARK-54725
URL: https://issues.apache.org/jira/browse/SPARK-54725
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 3.3.1
Reporter: juntaozhang
Fix For: 3.3.1
Currently, *CostBasedJoinReorder* only considers explicitly specified join
conditions when evaluating join orders. However, it misses optimization
opportunities where transitive equality relationships exist between attributes
from different plans that could form additional join conditions.
Consider the following scenario:
{code}
SELECT o.order_id, o.user_id, o.region_id, order_num, user_name, region_name
FROM order o
JOIN user u ON o.user_id = u.user_id AND o.region_id = u.region_id
JOIN region r ON o.region_id = r.region_id
{code}
When JoinReorderDP considers joining user and region tables:
- It recognizes o.user_id = u.user_id (o ↔ u)
- It recognizes o.region_id = u.region_id (o ↔ u)
- It recognizes o.region_id = r.region_id (o ↔ r)
- But it misses: u.region_id = r.region_id (u ↔ r)
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]