ygf11 commented on code in PR #4826:
URL: https://github.com/apache/arrow-datafusion/pull/4826#discussion_r1070255065


##########
datafusion/optimizer/src/decorrelate_where_in.rs:
##########
@@ -554,14 +636,12 @@ mod tests {
             .project(vec![col("customer.c_custkey")])?
             .build()?;
 
-        // Query will fail, but we can still transform the plan
         let expected = "Projection: customer.c_custkey [c_custkey:Int64]\
-        \n  LeftSemi Join: customer.c_custkey = __correlated_sq_1.o_custkey 
[c_custkey:Int64, c_name:Utf8]\
+        \n  LeftSemi Join:  Filter: customer.c_custkey = 
__correlated_sq_1.o_custkey AND customer.c_custkey = customer.c_custkey 
[c_custkey:Int64, c_name:Utf8]\

Review Comment:
   Yes, it is better to do by other rule.
   But I find our `predicate pushdown logic` does not support these 
predicates(semi-join-on) now.
   
https://github.com/apache/arrow-datafusion/blob/dee0dd8be6745f6cb798ba56dca6c1b936d90fd6/datafusion/optimizer/src/push_down_filter.rs#L103-L115
   
   Maybe we should support `predicate push down` for `semi-join-on`, or we need 
move these predicates to the outer filter, then `predicate push down` can 
work(Like following).
   ```rust
   // for sql: select * from t1 where t1.t1_int in(select t2.t2_int from t2 
where t1.t1_id > 10);
   // the output of this rule:
   Projection: t1.t1_id, t1.t1_name, t1.t1_int                                  
                                                                                
                                                                                
                                        
     Filter: CAST(t1.t1_id AS Int64) > Int64(10)                                
                                                                                
                                                                                
                                          
        LeftSemi Join:  Filter: t1.t1_int = __correlated_sq_2.t2_int            
                                                                                
                                                                                
                                              
          TableScan: t1                                                         
                                                                                
                                                                                
                                              
          SubqueryAlias: __correlated_sq_2                                      
                                                                                
                                                                                
                                                
            Projection: t2.t2_int AS t2_int                                     
                                                                                
                                                                                
                                                  
              TableScan: t2                                                     
                                                                                
                                                                                
                                       
   ```
   
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to