RatulDawar commented on code in PR #23106:
URL: https://github.com/apache/datafusion/pull/23106#discussion_r3476185541
##########
datafusion/physical-plan/src/joins/hash_join/shared_bounds.rs:
##########
@@ -685,12 +699,49 @@ impl SharedBuildAccumulator {
)?) as Arc<dyn PhysicalExpr>
};
- self.dynamic_filter.update(filter_expr)?;
+ self.dynamic_filter
+ .update(self.preserve_probe_nulls(filter_expr))?;
}
}
Ok(())
}
+
+ /// Keeps probe rows with a NULL key when the join semantics need them.
+ ///
+ /// The build-side predicate drops probe rows whose key is NULL. A null-aware anti join
+ /// (`NOT IN`) needs that NULL to reach the join so three-valued logic can collapse the
+ /// result, and a null-equal join needs it to match a build-side NULL. OR-ing `key IS NULL`
+ /// keeps those rows while preserving the filter's selectivity for the rest; the join refines
+ /// whatever the widened filter lets through.
+ fn preserve_probe_nulls(
+ &self,
+ filter_expr: Arc<dyn PhysicalExpr>,
+ ) -> Arc<dyn PhysicalExpr> {
+ if self.null_equality != NullEquality::NullEqualsNull && !self.null_aware {
+ return filter_expr;
+ }
+ // Only a key that can actually be NULL needs the disjunct; a NOT NULL key never widens.
+ // Null-aware joins are single-key; null-equal joins can be multi-key, so OR every nullable
+ // key. If every key is NOT NULL the filter is left untouched, at full selectivity.
+ let any_key_is_null = self
+ .on_right
+ .iter()
+ .filter(|key| key.nullable(&self.probe_schema).unwrap_or(true))
Review Comment:
I was wondering how an invalid state, if it is somehow reached, should be
handled here. Instead of silently handling it, shouldn't we propagate the
error further?
The fail-safe check was added here:
https://github.com/apache/datafusion/pull/3238
I am not sure what the consensus is for cases like this, so a committer's
input would be helpful.
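
To make the two options concrete, here is a minimal standalone sketch of the
trade-off. The `Key` type and both helper functions are hypothetical
stand-ins for illustration, not DataFusion's actual API; the only thing
taken from the diff is the `unwrap_or(true)` fallback on the nullability
lookup.

```rust
// Hypothetical stand-in for a join key; NOT DataFusion's PhysicalExpr.
#[derive(Debug)]
struct Key {
    name: &'static str,
    // Err models a key whose nullability cannot be resolved against the schema.
    nullable: Result<bool, String>,
}

impl Key {
    fn nullable(&self) -> Result<bool, String> {
        self.nullable.clone()
    }
}

/// Lenient variant: a failed lookup is treated as "nullable", which is what
/// `unwrap_or(true)` does. The error itself is discarded.
fn nullable_keys_lenient(keys: &[Key]) -> Vec<&'static str> {
    keys.iter()
        .filter(|k| k.nullable().unwrap_or(true))
        .map(|k| k.name)
        .collect()
}

/// Strict variant: the first failed lookup is returned to the caller via `?`
/// instead of being replaced by a default.
fn nullable_keys_strict(keys: &[Key]) -> Result<Vec<&'static str>, String> {
    let mut out = Vec::new();
    for k in keys {
        if k.nullable()? {
            out.push(k.name);
        }
    }
    Ok(out)
}
```

In the lenient variant an unresolvable key only costs selectivity, because
the filter is widened for it; in the strict variant the caller sees the
failure and has to decide what to do with it.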
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]