mdashti commented on code in PR #23106:
URL: https://github.com/apache/datafusion/pull/23106#discussion_r3470181133
##########
datafusion/physical-plan/src/joins/hash_join/shared_bounds.rs:
##########
@@ -685,12 +699,49 @@ impl SharedBuildAccumulator {
)?) as Arc<dyn PhysicalExpr>
};
- self.dynamic_filter.update(filter_expr)?;
+ self.dynamic_filter
+ .update(self.preserve_probe_nulls(filter_expr))?;
}
}
Ok(())
}
+
+ /// Keeps probe rows with a NULL key when the join semantics need them.
+ ///
+ /// The build-side predicate drops probe rows whose key is NULL. A null-aware anti join
+ /// (`NOT IN`) needs that NULL to reach the join so three-valued logic can collapse the
+ /// result, and a null-equal join needs it to match a build-side NULL. OR-ing `key IS NULL`
+ /// keeps those rows while preserving the filter's selectivity for the rest; the join refines
+ /// whatever the widened filter lets through.
+ fn preserve_probe_nulls(
+ &self,
+ filter_expr: Arc<dyn PhysicalExpr>,
+ ) -> Arc<dyn PhysicalExpr> {
+ if self.null_equality != NullEquality::NullEqualsNull && !self.null_aware {
+ return filter_expr;
+ }
+ // Only a key that can actually be NULL needs the disjunct; a NOT NULL key never widens.
+ // Null-aware joins are single-key; null-equal joins can be multi-key, so OR every nullable
+ // key. If every key is NOT NULL the filter is left untouched, at full selectivity.
+ let any_key_is_null = self
+ .on_right
+ .iter()
+ .filter(|key| key.nullable(&self.probe_schema).unwrap_or(true))
Review Comment:
This is a should-never-happen case (as you said, it means the keys are out of
sync with the probe schema), so I kept `unwrap_or(true)` as the safe
degradation: over-widening only loses a little selectivity, while `false`
could drop a NULL the join needs. Documented it in `9620b97`.
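
To make the trade-off concrete, here is a minimal standalone sketch of the two fallbacks. It does not use DataFusion types: `Key`, `KeyNullability`, and `keys_to_widen` are hypothetical stand-ins for the join keys and for the `.filter(...)` step in the hunk above, and the error string is invented for illustration.

```rust
/// Hypothetical model of what a nullability lookup can report for a key.
enum KeyNullability {
    NotNull,
    Nullable,
    /// The key could not be resolved against the probe schema
    /// (the should-never-happen case discussed above).
    Unresolved,
}

/// Hypothetical stand-in for a probe-side join key.
struct Key {
    name: &'static str,
    nullability: KeyNullability,
}

impl Key {
    /// Mirrors the shape of a fallible nullability lookup:
    /// `Ok(bool)` when the key resolves, `Err` when it does not.
    fn nullable(&self) -> Result<bool, String> {
        match self.nullability {
            KeyNullability::NotNull => Ok(false),
            KeyNullability::Nullable => Ok(true),
            KeyNullability::Unresolved => {
                Err(format!("column {} not found in probe schema", self.name))
            }
        }
    }
}

/// Returns the names of the keys that would get a `key IS NULL` disjunct.
/// `fallback` is the value used when the lookup fails.
fn keys_to_widen(keys: &[Key], fallback: bool) -> Vec<&'static str> {
    keys.iter()
        .filter(|key| key.nullable().unwrap_or(fallback))
        .map(|key| key.name)
        .collect()
}
```

With keys `a` (NOT NULL), `b` (nullable), and `c` (unresolved), a fallback of `true` widens `b` and `c`, while a fallback of `false` widens only `b`. If `c` can in fact be NULL, the `false` variant is the one that silently filters out rows the join needs; the `true` variant at worst lets a few extra rows through for the join to reject.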
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]