ygf11 commented on code in PR #4826: URL: https://github.com/apache/arrow-datafusion/pull/4826#discussion_r1070255065
########## datafusion/optimizer/src/decorrelate_where_in.rs: ########## @@ -554,14 +636,12 @@ mod tests { .project(vec![col("customer.c_custkey")])? .build()?; - // Query will fail, but we can still transform the plan let expected = "Projection: customer.c_custkey [c_custkey:Int64]\ - \n LeftSemi Join: customer.c_custkey = __correlated_sq_1.o_custkey [c_custkey:Int64, c_name:Utf8]\ + \n LeftSemi Join: Filter: customer.c_custkey = __correlated_sq_1.o_custkey AND customer.c_custkey = customer.c_custkey [c_custkey:Int64, c_name:Utf8]\ Review Comment: Yes, it is better to do by other rule. But I find our `predicate pushdown logic` does not support these predicates(semi-join-on) now. https://github.com/apache/arrow-datafusion/blob/dee0dd8be6745f6cb798ba56dca6c1b936d90fd6/datafusion/optimizer/src/push_down_filter.rs#L103-L115 Maybe we should support `predicate push down` for `semi-join-on`, or we need move these predicates to the outer filter, then `predicate push down` can work(Like following). ```rust // for sql: select * from t1 where t1.t1_int in(select t2.t2_int from t2 where t1.t1_id > 10); // the output of this rule: Projection: t1.t1_id, t1.t1_name, t1.t1_int Filter: CAST(t1.t1_id AS Int64) > Int64(10) LeftSemi Join: Filter: t1.t1_int = __correlated_sq_2.t2_int TableScan: t1 SubqueryAlias: __correlated_sq_2 Projection: t2.t2_int AS t2_int TableScan: t2 ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org