alamb opened a new issue, #4877: URL: https://github.com/apache/arrow-datafusion/issues/4877
**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**

Follow on to https://github.com/apache/arrow-datafusion/issues/4844

The fix for incorrect answers in https://github.com/apache/arrow-datafusion/pull/4869 was to skip the optimization if non-equijoin predicates were present. This ticket tracks actually supporting the removal of cross joins when such filters are present.

Currently, DataFusion loses the `join filter` of an inner join when it runs the `EliminateCrossJoin` rule. The following query triggers the problem:

```sql
explain verbose
select t1.t1_id, t2.t2_id, t3.t3_id
from t1
inner join t2 on t1.t1_id > t2.t2_id
cross join t3
where t3.t3_int > t1.t1_int and t1.t1_int > t2.t2_int;
```

This happens because `EliminateCrossJoin` only considers equijoin predicates. The idea is to rewrite `EliminateCrossJoin` so that it chooses the right input of each join based on both equijoin and non-equijoin predicates. After this PR, the logical plan will be:

```
Projection: t1.t1_id, t2.t2_id, t3.t3_id
  Inner Join: Filter: t3.t3_int > t1.t1_int
    Inner Join: Filter: t1.t1_int > t2.t2_int AND t1.t1_id > t2.t2_id
      TableScan: t1 projection=[t1_id, t1_int]
      TableScan: t2 projection=[t2_id, t2_int]
    TableScan: t3 projection=[t3_id, t3_int]
```

The join filter should not be lost.
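A minimal sketch of the proposed idea, written in Rust to match DataFusion but deliberately not using DataFusion's APIs. `Predicate`, `Plan`, `eliminate_cross_joins` and `render` are hypothetical stand-ins: predicates are modeled only by their text and the tables they reference, and the rule that equijoin links are preferred over non-equijoin links when picking the next input is an assumed heuristic, not necessarily what the rewritten rule will do.

```rust
use std::collections::BTreeSet;

/// Hypothetical predicate model (not DataFusion's `Expr`): SQL text,
/// referenced tables, and whether it is an equijoin key.
#[derive(Debug, Clone)]
struct Predicate { sql: String, tables: BTreeSet<String>, is_equi: bool }

impl Predicate {
    fn new(sql: &str, tables: &[&str], is_equi: bool) -> Self {
        let tables = tables.iter().map(|t| t.to_string()).collect();
        Predicate { sql: sql.to_string(), tables, is_equi }
    }
}

/// Hypothetical plan with only the nodes this sketch needs.
#[derive(Debug)]
enum Plan {
    Scan(String),
    Join { left: Box<Plan>, right: Box<Plan>, on: Vec<String>, filter: Vec<String> },
}

/// Greedily turn cross-joined `inputs` (assumed non-empty) into inner joins.
/// The next input is the first one linked to the already-joined tables by an
/// equijoin predicate, else by any predicate, else the first remaining one.
/// Each predicate is attached to the first join covering all of its tables,
/// so non-equijoin filters are kept; leftovers are returned to the caller.
fn eliminate_cross_joins(inputs: &[&str], preds: Vec<Predicate>) -> (Plan, Vec<Predicate>) {
    let mut remaining: Vec<String> = inputs.iter().map(|s| s.to_string()).collect();
    let mut unused = preds;
    let first = remaining.remove(0);
    let mut joined: BTreeSet<String> = BTreeSet::from([first.clone()]);
    let mut plan = Plan::Scan(first);

    while !remaining.is_empty() {
        // `p` links `t` to the joined side if it mentions `t`, mentions at
        // least one joined table, and mentions nothing else.
        let links = |t: &String, p: &Predicate, equi_only: bool| {
            (!equi_only || p.is_equi)
                && p.tables.contains(t)
                && p.tables.iter().any(|x| joined.contains(x))
                && p.tables.iter().all(|x| x == t || joined.contains(x))
        };
        let pick = |equi_only: bool| {
            remaining.iter().position(|t| unused.iter().any(|p| links(t, p, equi_only)))
        };
        // No link at all: fall back to a genuine cross join.
        let idx = pick(true).or_else(|| pick(false)).unwrap_or(0);
        let next = remaining.remove(idx);
        joined.insert(next.clone());

        // Attach every predicate whose tables are now all joined.
        let (covered, rest): (Vec<Predicate>, Vec<Predicate>) = unused
            .into_iter()
            .partition(|p| p.tables.iter().all(|x| joined.contains(x)));
        unused = rest;
        let (on, filter): (Vec<Predicate>, Vec<Predicate>) =
            covered.into_iter().partition(|p| p.is_equi);
        plan = Plan::Join {
            left: Box::new(plan),
            right: Box::new(Plan::Scan(next)),
            on: on.into_iter().map(|p| p.sql).collect(),
            filter: filter.into_iter().map(|p| p.sql).collect(),
        };
    }
    (plan, unused)
}

/// Indented text rendering (this sketch's own format, not DataFusion's).
fn render(plan: &Plan, depth: usize, out: &mut String) {
    let pad = "  ".repeat(depth);
    match plan {
        Plan::Scan(t) => out.push_str(&format!("{pad}TableScan: {t}\n")),
        Plan::Join { left, right, on, filter } => {
            let kind = if on.is_empty() && filter.is_empty() { "CrossJoin" } else { "Inner Join" };
            out.push_str(&format!("{pad}{kind}: on=[{}] filter=[{}]\n", on.join(", "), filter.join(" AND ")));
            render(left, depth + 1, out);
            render(right, depth + 1, out);
        }
    }
}

fn main() {
    // The query from this issue, with the ON clause and WHERE conjuncts
    // collected into one predicate list.
    let preds = vec![
        Predicate::new("t1.t1_id > t2.t2_id", &["t1", "t2"], false),
        Predicate::new("t3.t3_int > t1.t1_int", &["t1", "t3"], false),
        Predicate::new("t1.t1_int > t2.t2_int", &["t1", "t2"], false),
    ];
    let (plan, leftover) = eliminate_cross_joins(&["t1", "t2", "t3"], preds);
    let mut out = String::new();
    render(&plan, 0, &mut out);
    print!("{out}");
    println!("leftover predicates: {}", leftover.len());
}
```

In this model `t3.t3_int > t1.t1_int` ends up on the join that introduces `t3` instead of disappearing, and an input that is only reachable through another table (for example `FROM t1, t3, t2` with predicates `t1-t2` and `t2-t3`) is deferred until a predicate connects it. A real rule would also push single-table predicates down and work on DataFusion expressions rather than table sets, which this sketch ignores.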
