alamb opened a new issue, #4877:
URL: https://github.com/apache/arrow-datafusion/issues/4877

   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   Follow on to https://github.com/apache/arrow-datafusion/issues/4844
   
   The fix for incorrect answers in 
https://github.com/apache/arrow-datafusion/pull/4869 was to skip the 
optimization if non-equijoins were present. 
   
   This ticket tracks adding real support for removing cross joins when 
non-equijoin filters are present.
   
   Currently DataFusion loses the `join filter` of an inner join when the 
`EliminateCrossJoin` rule runs. The following query reproduces the problem:
   ```sql
   explain verbose select t1.t1_id,t2.t2_id,t3.t3_id 
                    from t1 
                    inner join t2 on t1.t1_id > t2.t2_id 
                    cross join t3 
                    where t3.t3_int > t1.t1_int and t1.t1_int > t2.t2_int;
   ```
   
   
   This is because `EliminateCrossJoin` only considers equijoin predicates.
   
   The idea is to rewrite `EliminateCrossJoin` so that it chooses the right input 
of the join based on both equijoin and non-equijoin predicates. After this PR, 
the logical plan will be:
   ```text
         Projection: t1.t1_id, t2.t2_id, t3.t3_id
           Inner Join:  Filter: t3.t3_int > t1.t1_int
             Inner Join:  Filter: t1.t1_int > t2.t2_int AND t1.t1_id > t2.t2_id
               TableScan: t1 projection=[t1_id, t1_int]
               TableScan: t2 projection=[t2_id, t2_int]
             TableScan: t3 projection=[t3_id, t3_int]
   ```
   
   The join filter should not be lost.
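
The proposed behavior can be illustrated with a small, self-contained Python model. This is only a sketch of the idea and not DataFusion's implementation: `Predicate` and `order_joins` are hypothetical names, and each predicate is reduced to its text plus the set of tables it references. The sketch picks the next join input by looking at every pending predicate, equijoin or not, and attaches each predicate as a join filter as soon as all of its tables are available, so nothing is dropped.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Predicate:
    """Hypothetical model of a join predicate: its text and the tables it uses."""
    expr: str
    tables: frozenset


def order_joins(tables, predicates):
    """Greedily build a left-deep join order starting from the first table.

    Returns (steps, leftover): steps is a list of (table, filters) pairs,
    one per join; leftover holds predicates that could never be attached.
    """
    remaining = list(tables)
    joined = {remaining.pop(0)}
    pending = list(predicates)
    steps = []
    while remaining:
        # A table "connects" if some pending predicate (of any kind) would be
        # fully covered once that table is added to the joined set.
        def connects(t):
            return any(t in p.tables and p.tables <= joined | {t} for p in pending)

        # Prefer a connected input; otherwise fall back to a true cross join.
        nxt = next((t for t in remaining if connects(t)), remaining[0])
        remaining.remove(nxt)
        joined.add(nxt)
        # Attach every predicate that is now fully covered, so none is lost.
        applied = [p for p in pending if p.tables <= joined]
        pending = [p for p in pending if p not in applied]
        steps.append((nxt, [p.expr for p in applied]))
    return steps, [p.expr for p in pending]


# The predicates from the query above: one ON clause and two WHERE conjuncts.
tables = ["t1", "t2", "t3"]
predicates = [
    Predicate("t1.t1_id > t2.t2_id", frozenset({"t1", "t2"})),
    Predicate("t3.t3_int > t1.t1_int", frozenset({"t1", "t3"})),
    Predicate("t1.t1_int > t2.t2_int", frozenset({"t1", "t2"})),
]
steps, leftover = order_joins(tables, predicates)
for table, filters in steps:
    print(f"Inner Join with {table}: Filter: {' AND '.join(filters)}")
```

In this model the `t1`/`t2` join carries both `t1`/`t2` predicates and the join with `t3` carries `t3.t3_int > t1.t1_int`, which matches the shape of the expected plan; the order of conjuncts within a filter is not meant to be significant.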
   

