ygf11 commented on code in PR #4826: URL: https://github.com/apache/arrow-datafusion/pull/4826#discussion_r1070255065
##########
datafusion/optimizer/src/decorrelate_where_in.rs:
##########
@@ -554,14 +636,12 @@ mod tests {
.project(vec![col("customer.c_custkey")])?
.build()?;
- // Query will fail, but we can still transform the plan
let expected = "Projection: customer.c_custkey [c_custkey:Int64]\
- \n LeftSemi Join: customer.c_custkey = __correlated_sq_1.o_custkey [c_custkey:Int64, c_name:Utf8]\
+ \n LeftSemi Join: Filter: customer.c_custkey = __correlated_sq_1.o_custkey AND customer.c_custkey = customer.c_custkey [c_custkey:Int64, c_name:Utf8]\
Review Comment:
Yes, it is better to handle this in a separate rule.
However, our predicate pushdown logic does not currently support these
predicates (the semi-join `on` predicates):
https://github.com/apache/arrow-datafusion/blob/dee0dd8be6745f6cb798ba56dca6c1b936d90fd6/datafusion/optimizer/src/push_down_filter.rs#L103-L115
Maybe we should support predicate pushdown for semi-join `on` predicates, or
move these predicates outside the join so that predicate pushdown can work,
like the following:
```rust
// for sql: select * from t1 where t1.t1_int in (select t2.t2_int from t2 where t1.t1_id > 10);
// the output of this rule:
Projection: t1.t1_id, t1.t1_name, t1.t1_int
  Filter: CAST(t1.t1_id AS Int64) > Int64(10)
    LeftSemi Join: Filter: t1.t1_int = __correlated_sq_2.t2_int
      TableScan: t1
      SubqueryAlias: __correlated_sq_2
        Projection: t2.t2_int AS t2_int
          TableScan: t2
```
```
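To make the second option concrete, here is a rough, standalone sketch of the split I have in mind. It does not use the DataFusion API: `Predicate` and `split_semi_join_predicates` are hypothetical names for illustration only, and a predicate is modeled as its text plus the set of relations it references. The assumption is that, for a LeftSemi join, a conjunct that references only the left relation can be evaluated above the join without changing the result (this would not hold for an anti join).

```rust
use std::collections::BTreeSet;

/// Toy stand-in for an expression (hypothetical, not a DataFusion type):
/// the predicate text plus the relations its columns come from.
#[derive(Debug, Clone, PartialEq)]
struct Predicate {
    text: String,
    relations: BTreeSet<String>,
}

impl Predicate {
    fn new(text: &str, relations: &[&str]) -> Self {
        Predicate {
            text: text.to_string(),
            relations: relations.iter().map(|r| r.to_string()).collect(),
        }
    }
}

/// Splits the conjuncts of a LeftSemi join `on` clause into
/// (kept in the join, hoisted into a Filter above the join).
/// A conjunct is hoisted only when it references the left relation alone.
fn split_semi_join_predicates(
    conjuncts: Vec<Predicate>,
    left: &str,
) -> (Vec<Predicate>, Vec<Predicate>) {
    conjuncts.into_iter().partition(|p| {
        let left_only = p.relations.len() == 1 && p.relations.contains(left);
        // `partition` puts `true` items first, so `true` means "keep in join".
        !left_only
    })
}
```

For the query above, the conjuncts `t1.t1_int = __correlated_sq_2.t2_int` and `t1.t1_id > 10` would be split so that the first stays in the join and the second becomes the `Filter` node, which the existing pushdown rule can then move down to the `t1` scan.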
