xiedeyantu opened a new issue, #22369:
URL: https://github.com/apache/datafusion/issues/22369

   ### Is your feature request related to a problem or challenge?
   
   Currently, DataFusion handles conjunctive join conditions well. For example, 
conditions like:
   ```
   a.id = b.id AND a.v > 10
   ```
   can already be optimized by extracting equijoin keys while preserving the 
remaining predicates as join filters.
   
   However, for queries such as:
   ```
   SELECT *
   FROM t1
   INNER JOIN t2
     ON t1.id = t2.id OR t1.age = t2.age
   ```
   the presence of an OR condition prevents join key extraction. As a result, 
the optimizer typically falls back to a single inner join with a complex filter 
condition, making it difficult for later optimization stages and the execution 
engine to take advantage of equijoin-based optimizations. Therefore, this type 
of join will be executed using nested loop join.
   
   This proposal introduces a new logical optimizer rule that rewrites:
   ```
   t1 INNER JOIN t2 ON cond1 OR cond2 OR ... OR condN
   ```
   into:
   ```
   SELECT * FROM (
     (t1 INNER JOIN t2 ON cond1)
     UNION ALL
     (t1 INNER JOIN t2 ON cond2 AND NOT cond1)
     UNION ALL
     (t1 INNER JOIN t2 ON cond3 AND NOT cond1 AND NOT cond2)
     ...
   )
   ```
   Each branch becomes mutually exclusive, preserving the original semantics 
while exposing simpler join conditions that can be optimized independently.
   
   Initial Scope
   
   The first implementation only targets the following cases:
   
   - Only INNER JOIN
   - Only when the join condition contains a top-level splittable OR
   - Logical plan rewrite only; no execution-layer changes
   - No support for outer joins, semi/anti joins, or full joins
   - No join reordering across multi-level join trees; only rewrite the current 
join node
   
   Rewrite Example
   
   Input:
   ```
   SELECT *
   FROM t1
   INNER JOIN t2
     ON t1.id = t2.id OR t1.age = t2.age
   ```
   Rewritten as:
   ```
   SELECT *
   FROM (
     SELECT * FROM t1 INNER JOIN t2 ON t1.id = t2.id
     UNION ALL
     SELECT * FROM t1 INNER JOIN t2
       ON t1.age = t2.age AND NOT (t1.id = t2.id)
   )
   ```
   
   ### Describe the solution you'd like
   
   _No response_
   
   ### Describe alternatives you've considered
   
   _No response_
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to