alamb commented on a change in pull request #1618:
URL: https://github.com/apache/arrow-datafusion/pull/1618#discussion_r790020079
##########
File path: datafusion/src/optimizer/filter_push_down.rs
##########
@@ -253,33 +176,107 @@ fn split_members<'a>(predicate: &'a Expr, predicates: &mut Vec<&'a Expr>) {
 }
 }
+// For a given JOIN logical plan, determine whether each side of the join is preserved.
+// We say a join side is preserved if the join returns all or a subset of the rows from
+// the relevant side - i.e. the side of the join cannot provide nulls. Returns a tuple
+// of booleans - (left_preserved, right_preserved).
+fn lr_is_preserved(plan: &LogicalPlan) -> (bool, bool) {
+    match plan {
+        LogicalPlan::Join(Join { join_type, .. }) => match join_type {
+            JoinType::Inner => (true, true),
+            JoinType::Left => (true, false),
+            JoinType::Right => (false, true),
+            JoinType::Full => (false, false),
Review comment:
I think there are two different things making this confusing:
1. For predicates that are `null preserving` (like `t1 <= 5`) in the
`WHERE` clause, it is actually fine to create a predicate like `t2 < 5` even for
outer joins. (Try it with Postgres: you'll find that the predicate filters out any
rows that didn't have matches on either side, so it is OK.)
2. For predicates that are not `null preserving` (like `t1 is NULL`), you
*CAN'T* push predicates down (which I think is the core of this PR).
To make things even more confusing, the rules for when you can push / not
push are different if the predicate is in the `ON` clause.
Also, I don't think DataFusion currently tracks the `null preserving`
property for `Exprs`, so the safe (correct) thing is to treat all predicates
as if they may not be `null preserving` and not create/push them for outer
joins. A follow-on exercise would be to allow more predicates to be pushed down
by checking `null preserving`.
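To illustrate the two cases, here is a minimal, self-contained Rust sketch. It is a toy model and not DataFusion code: the row types and the `left_join`, `lt5`, `is_null`, `filter_after` and `filter_pushed` names are all hypothetical. It models `t1 LEFT JOIN t2 ON t1.a = t2.a` followed by a `WHERE` filter on `t2.b`, and compares filtering only above the join with additionally pushing the same filter below the join onto `t2`.

```rust
// Toy model (hypothetical, not DataFusion's API).
// A t2 row is (t2.a, t2.b); a joined row is (t1.a, t2.b), where t2.b is
// None either because the value was NULL or because t1.a had no match.
type RightRow = (i32, Option<i32>);
type Joined = (i32, Option<i32>);

// t1 LEFT JOIN t2 ON t1.a = t2.a, projecting (t1.a, t2.b).
fn left_join(t1: &[i32], t2: &[RightRow]) -> Vec<Joined> {
    let mut out = Vec::new();
    for &a in t1 {
        let mut matched = false;
        for &(k, b) in t2 {
            if k == a {
                out.push((a, b));
                matched = true;
            }
        }
        if !matched {
            // Unmatched left rows are padded with NULL on the right side.
            out.push((a, None));
        }
    }
    out
}

// `t2.b < 5`: never true for a NULL input.
fn lt5(b: Option<i32>) -> bool {
    matches!(b, Some(v) if v < 5)
}

// `t2.b IS NULL`: true for a NULL input.
fn is_null(b: Option<i32>) -> bool {
    b.is_none()
}

// Reference plan: apply the WHERE filter above the join only.
fn filter_after(t1: &[i32], t2: &[RightRow], pred: fn(Option<i32>) -> bool) -> Vec<Joined> {
    left_join(t1, t2)
        .into_iter()
        .filter(|&(_, b)| pred(b))
        .collect()
}

// Candidate plan: also apply the same filter to t2 below the join.
fn filter_pushed(t1: &[i32], t2: &[RightRow], pred: fn(Option<i32>) -> bool) -> Vec<Joined> {
    let pushed: Vec<RightRow> = t2.iter().copied().filter(|&(_, b)| pred(b)).collect();
    left_join(t1, &pushed)
        .into_iter()
        .filter(|&(_, b)| pred(b))
        .collect()
}
```

With `t1 = [1, 2, 3]` and `t2 = [(1, Some(3)), (2, Some(9))]`, the two plans agree for `lt5` but disagree for `is_null`: the pushed-down plan removes the `t2` rows for keys 1 and 2, so the join pads them with NULL and the `IS NULL` filter above then keeps them.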