Dandandan commented on code in PR #19635:
URL: https://github.com/apache/datafusion/pull/19635#discussion_r2675462938
##########
datafusion/physical-optimizer/src/join_selection.rs:
##########
@@ -232,19 +237,33 @@ pub(crate) fn partitioned_hash_join(
) -> Result<Arc<dyn ExecutionPlan>> {
let left = hash_join.left();
let right = hash_join.right();
- if hash_join.join_type().supports_swap() && should_swap_join_order(&**left, &**right)?
+ // Don't swap null-aware anti joins as they have specific side requirements
+ if hash_join.join_type().supports_swap()
+ && !hash_join.null_aware
+ && should_swap_join_order(&**left, &**right)?
{
hash_join.swap_inputs(PartitionMode::Partitioned)
} else {
+ // Null-aware anti joins must use CollectLeft mode because they track probe-side state
+ // (probe_side_non_empty, probe_side_has_null) per-partition, but need global knowledge
+ // for correct null handling. With partitioning, a partition might not see probe rows
+ // even if the probe side is globally non-empty, leading to incorrect NULL row handling.
+ let partition_mode = if hash_join.null_aware {
Review Comment:
Can we avoid falling back to `CollectLeft` when the join keys are not
nullable, or is that already done?
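
To make the question concrete, here is a minimal sketch of the check being asked about. It rests on an assumption that is not confirmed in this thread: when no join key on either side can be NULL, a null-aware anti join behaves like a regular anti join, so the partitioned path would be safe. The `PartitionMode` enum below is a local stand-in for the DataFusion type, and `JoinKey` and `choose_partition_mode` are hypothetical names used only for illustration; in the real optimizer, nullability would have to be derived from the key expressions and the input schemas.

```rust
/// Local stand-in for DataFusion's `PartitionMode`; only the two
/// variants discussed in this comment are modeled.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PartitionMode {
    Partitioned,
    CollectLeft,
}

/// Hypothetical description of a single join key: only its nullability.
#[derive(Debug, Clone, Copy)]
pub struct JoinKey {
    pub nullable: bool,
}

/// Hypothetical helper: force `CollectLeft` only when the join is
/// null-aware AND at least one key on either side may contain NULLs.
/// Assumption: with no nullable keys, the null-aware bookkeeping
/// (probe_side_has_null) can never fire, so partitioning is safe.
pub fn choose_partition_mode(
    null_aware: bool,
    left_keys: &[JoinKey],
    right_keys: &[JoinKey],
) -> PartitionMode {
    let any_nullable = left_keys
        .iter()
        .chain(right_keys.iter())
        .any(|key| key.nullable);

    if null_aware && any_nullable {
        PartitionMode::CollectLeft
    } else {
        PartitionMode::Partitioned
    }
}
```

With this shape, a null-aware join over non-nullable keys would stay `Partitioned`, while a single nullable key on either side would still trigger the `CollectLeft` fallback.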