adriangb commented on code in PR #23104:
URL: https://github.com/apache/datafusion/pull/23104#discussion_r3465270114


##########
datafusion/physical-plan/src/joins/hash_join/shared_bounds.rs:
##########
@@ -685,12 +691,35 @@ impl SharedBuildAccumulator {
                     )?) as Arc<dyn PhysicalExpr>
                 };
 
-                self.dynamic_filter.update(filter_expr)?;
+                self.dynamic_filter
+                    .update(self.null_aware_filter(filter_expr))?;
             }
         }
 
         Ok(())
     }
+
+    /// Wraps a pushdown filter so a null-aware anti join keeps its probe-side NULL rows.
+    ///
+    /// The build-side predicate drops probe rows whose key is NULL, but `NOT IN` three-valued
+    /// logic needs that NULL to reach the join. OR-ing `probe_key IS NULL` preserves the dynamic
+    /// filter's selectivity for non-NULL rows while letting the NULL through.
+    fn null_aware_filter(
+        &self,
+        filter_expr: Arc<dyn PhysicalExpr>,
+    ) -> Arc<dyn PhysicalExpr> {
+        if !self.null_aware {
+            return filter_expr;
+        }
+        // A null-aware anti join is validated to a single probe key.
+        let probe_key_is_null: Arc<dyn PhysicalExpr> =
+            Arc::new(IsNullExpr::new(Arc::clone(&self.on_right[0])));
+        Arc::new(BinaryExpr::new(
+            filter_expr,
+            Operator::Or,

Review Comment:
   Can we flip the order? We already have issues with `filter_expr` being too expensive. `probe_key_is_null` is almost certainly very cheap, so if we put it first we might, on balance, end up with better performance in cases where it filters out a lot of rows?


