alamb commented on code in PR #17518:
URL: https://github.com/apache/datafusion/pull/17518#discussion_r2359292682


##########
datafusion/common/src/join_type.rs:
##########
@@ -74,6 +74,12 @@ pub enum JoinType {
     RightMark,
 }
 
+const LEFT_PRESERVING: &[JoinType] =
+    &[JoinType::Left, JoinType::Full, JoinType::LeftMark];

Review Comment:
   > Semi and Anti joins purposely aren’t listed in LEFT_PRESERVING: they only 
return the subset of left rows that either match (Semi) or don’t match (Anti), 
so they don’t preserve all left input rows. The right-side analogue has the 
same behaviour, which is why only the outer/mark variants appear in those 
preservation tables.
   
   You are right, sorry about that. Since I think this logic is trying to 
figure out when to push down predicates, I was thinking it should also 
consider SEMI and ANTI joins. 
   
   For example, I was thinking about these two joins
   
   ```sql
   a LEFT JOIN b ON (...)
   WHERE a.x < 5 AND b.y < 10
   ```
   
   
   ```sql
   a ANTI JOIN b ON (...)
   WHERE a.x < 5 AND b.y < 10
   ```
   
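   In both cases, I believe it is correct to push the predicate on `a.x` below 
the join on the `a` side. However, in both cases I don't think it is correct 
to push the `b.y` predicate below the join.
   
   For the `LEFT JOIN`, pushing `b.y` can re-introduce rows from `a` that 
should be filtered after the join: an `a` row whose only matches in `b` fail 
`b.y < 10` loses those matches and comes back padded with NULLs. The same 
applies to the `ANTI JOIN`: filtering `b` first removes matches, so more `a` 
rows pass the anti join.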
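   
   To make the argument concrete, here is a minimal sketch using toy in-memory 
tables. Everything in it is an assumption for illustration and not a DataFusion 
API: the row types `A` and `B`, the key column `k`, the equi-join condition 
`a.k = b.k`, and all helper names. "Pushing" a predicate here means moving it 
below the join and dropping it above. For the `ANTI JOIN`, the reference result 
is the join against unfiltered `b`, since the anti join output carries no `b` 
columns.

```rust
// Toy rows: `a` has (k, x) and `b` has (k, y). Hypothetical names throughout.
#[derive(Debug, Clone, Copy, PartialEq)]
struct A { k: i32, x: i32 }
#[derive(Debug, Clone, Copy, PartialEq)]
struct B { k: i32, y: i32 }

fn x_ok(ra: &A) -> bool { ra.x < 5 }  // a.x < 5
fn y_ok(rb: &B) -> bool { rb.y < 10 } // b.y < 10

// Predicates applied below the join (on the inputs).
fn filt_a(a: &[A]) -> Vec<A> { a.iter().copied().filter(x_ok).collect() }
fn filt_b(b: &[B]) -> Vec<B> { b.iter().copied().filter(y_ok).collect() }

/// a LEFT JOIN b ON a.k = b.k; unmatched `a` rows are padded with `None`.
fn left_join(a: &[A], b: &[B]) -> Vec<(A, Option<B>)> {
    let mut out = Vec::new();
    for &ra in a {
        let matches: Vec<B> = b.iter().copied().filter(|rb| rb.k == ra.k).collect();
        if matches.is_empty() {
            out.push((ra, None));
        } else {
            out.extend(matches.into_iter().map(|rb| (ra, Some(rb))));
        }
    }
    out
}

/// a ANTI JOIN b ON a.k = b.k; keeps `a` rows that have no match in `b`.
fn anti_join(a: &[A], b: &[B]) -> Vec<A> {
    a.iter().copied().filter(|ra| !b.iter().any(|rb| rb.k == ra.k)).collect()
}

/// WHERE a.x < 5 AND b.y < 10 applied above a left join.
/// A NULL-padded `b` side never satisfies `b.y < 10`, as in SQL.
fn where_after_left(rows: Vec<(A, Option<B>)>) -> Vec<(A, Option<B>)> {
    rows.into_iter()
        .filter(|(ra, rb)| x_ok(ra) && rb.as_ref().map_or(false, y_ok))
        .collect()
}

/// Sample data: a(k=2) matches only a `b` row that fails `b.y < 10`.
fn sample() -> (Vec<A>, Vec<B>) {
    (
        vec![A { k: 1, x: 1 }, A { k: 2, x: 2 }, A { k: 3, x: 9 }, A { k: 4, x: 0 }],
        vec![B { k: 1, y: 3 }, B { k: 2, y: 50 }],
    )
}
```

   With this data, `where_after_left(left_join(&filt_a(&a), &b))` equals 
`where_after_left(left_join(&a, &b))`, so pushing `a.x` is safe. In contrast, 
`left_join(&filt_a(&a), &filt_b(&b))` also returns the `k = 2` row of `a` 
padded with `None`. Likewise, `anti_join(&filt_a(&a), &filt_b(&b))` returns the 
`k = 2` row, which `anti_join(&filt_a(&a), &b)` does not.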
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

