alamb commented on a change in pull request #1339:
URL: https://github.com/apache/arrow-datafusion/pull/1339#discussion_r758304029



##########
File path: datafusion/src/optimizer/filter_push_down.rs
##########
@@ -409,10 +420,15 @@ fn optimize(plan: &LogicalPlan, mut state: State) -> 
Result<LogicalPlan> {
             //
             // Join clauses with `Using` constraints also take advantage of 
this logic to make sure
             // predicates reference the shared join columns are pushed to both 
sides.

Review comment:
       I think you are on the right track here that the filter_push_down is not 
correct for outer joins 
   
   However, I don't think the ability to push filters is related to if there is 
an `IS NULL` predicate. The code should NEVER try to push predicates for outer 
join
   
   So schematically, given this input:
   
   ```
   Filter: A.a > 5
     Inner Join(A.a = B.b)
       Scan A
       Scan B
   ```
   
   The code is effectively both pushing down `A.a > 5` as well as adding a 
*new* predicate, `B.b > 5` during the scan:
   ```
   Filter: A.a > 5
     Inner Join(A.a = B.b)
       Scan A (A.a > 5) 
       Scan B (B.b > 5)
   ```
   
   However, this transformation is not correct for outer joins where only the 
predicates on the preserved side (A in this case) can be pushed down. Thus for
   
   ```
   Filter: A.a > 5
     LEFT OUTER Join(A.a = B.b)
       Scan A
       Scan B
   ```
   
   We should still push `A.a > 5` but not introduce `B.b > 5`:
   ```
   Filter: A.a > 5
     LEFT OUTER Join(A.a = B.b)
       Scan A (A.a > 5)
       Scan B
   ```
   
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to