kosiew opened a new issue, #17206:
URL: https://github.com/apache/datafusion/issues/17206

   Summary
   -------
   When the build side of a hash join contains only NULL join keys and the join is configured with
   `datafusion_common::NullEquality::NullEqualsNothing`, the dynamic filter generated for the probe side
   is produced as range comparisons against `NULL` (e.g. `a >= NULL AND a <= NULL`). In the current
   optimizer/test code this is treated as a no-op (a tautology) instead of being treated as either
   unsatisfiable or as an explicit "no matches" condition.
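
   To make the ambiguity concrete, here is a minimal sketch of how a range check against NULL bounds evaluates under standard SQL three-valued logic. It is a generic model, not DataFusion code: `Tri`, `and3`, `ge`, `le`, `in_bounds`, and `keeps_row` are hypothetical names introduced for illustration. Under this model the predicate evaluates to NULL rather than TRUE, which is why reading it as a tautology is a deliberate interpretation rather than something the expression states on its own.

```rust
/// SQL-style three-valued boolean: `None` stands for NULL (unknown).
type Tri = Option<bool>;

/// Kleene AND: FALSE dominates; otherwise any NULL operand yields NULL.
fn and3(a: Tri, b: Tri) -> Tri {
    match (a, b) {
        (Some(false), _) | (_, Some(false)) => Some(false),
        (Some(true), Some(true)) => Some(true),
        _ => None,
    }
}

/// `value >= bound`; the result is NULL if either side is NULL.
fn ge(value: Option<i64>, bound: Option<i64>) -> Tri {
    Some(value? >= bound?)
}

/// `value <= bound`; the result is NULL if either side is NULL.
fn le(value: Option<i64>, bound: Option<i64>) -> Tri {
    Some(value? <= bound?)
}

/// Evaluates `a >= min AND a <= max` with three-valued logic.
fn in_bounds(a: Option<i64>, min: Option<i64>, max: Option<i64>) -> Tri {
    and3(ge(a, min), le(a, max))
}

/// A conventional SQL filter keeps a row only when the predicate is TRUE.
fn keeps_row(predicate: Tri) -> bool {
    predicate == Some(true)
}
```

   With both bounds NULL, `in_bounds(Some(5), None, None)` is NULL, so a conventional filter (`keeps_row`) would drop the row, whereas the behavior described in this issue keeps it.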
   
   Why this matters
   ----------------
   - This is a surprising corner case and may hide regressions if semantics or simplification rules change.
   - If NullEquality semantics change (or the filter simplifier is updated), behavior and test expectations
     could silently diverge.
   - We should monitor and decide whether this should be:
     - left as a tautology (no-op),
     - treated as unsatisfiable (prune everything),
     - or canonicalized/annotated to make intent explicit.
   
   Where to look / repro
   ---------------------
   Test: `datafusion/core/tests/physical_optimizer/filter_pushdown/mod.rs`
   Specifically the test `test_hashjoin_dynamic_filter_pushdown_null_keys` in #17090
   
   Reproduction steps (test harness)
   1. Run that single async test (or the optimizer tests), passing the test name as the filter argument:
      - cargo test --test <appropriate test binary> test_hashjoin_dynamic_filter_pushdown_null_keys
   2. Observe that the plan printed by `format_plan_for_test(&plan)` contains:
      `DynamicFilterPhysicalExpr [ a@0 >= NULL AND a@0 <= NULL AND b@1 >= NULL AND b@1 <= NULL ]`
   
   Observed behavior
   -----------------
   The optimizer generates a dynamic filter with min/max bounds set to NULL; the code interprets this as
   a tautology (no filtering). The current test documents this and asserts the presence of the
   `>= NULL` / `<= NULL` pattern.
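
   One plausible way to see where NULL bounds come from is a min/max aggregation that skips NULL keys: with no non-NULL key there is nothing to aggregate, so both bounds come out as NULL. The sketch below assumes that model and is not taken from the DataFusion implementation; `key_bounds`, `render_bound`, and `render_filter` are hypothetical helpers, and the rendered string only mimics the plan text quoted above.

```rust
/// Computes (min, max) over the non-NULL keys; `None` models a NULL bound.
fn key_bounds(keys: &[Option<i64>]) -> (Option<i64>, Option<i64>) {
    // `flatten` skips the `None` (NULL) entries, as SQL MIN/MAX would.
    let min = keys.iter().flatten().copied().min();
    let max = keys.iter().flatten().copied().max();
    (min, max)
}

/// Renders one bound, printing `NULL` when the bound is missing.
fn render_bound(bound: Option<i64>) -> String {
    bound.map_or_else(|| "NULL".to_string(), |v| v.to_string())
}

/// Renders `column >= min AND column <= max` for the given bounds.
fn render_filter(column: &str, bounds: (Option<i64>, Option<i64>)) -> String {
    format!(
        "{column} >= {} AND {column} <= {}",
        render_bound(bounds.0),
        render_bound(bounds.1)
    )
}
```

   For an all-NULL build side, `render_filter("a@0", key_bounds(&[None, None]))` yields `a@0 >= NULL AND a@0 <= NULL`, the same shape the test asserts on.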
   
   Expected / options
   ------------------
   We need a decision on intended semantics. Options:
   - Keep the current behavior (tautology/no-op) and document it clearly in code/tests.
   - Treat a NULL-only build side as an unsatisfiable filter (drop all probe rows).
   - Change filter generation to avoid producing `>= NULL` / `<= NULL` and instead produce an explicit
     marker (e.g., `unsatisfiable`) so optimizer simplification can handle it deterministically.
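
   The options above can be sketched as a small decision function. This is only an illustration of the design space under stated assumptions, not a proposal for the actual API: `NullOnlyPolicy`, `ProbeFilter`, and `build_probe_filter` are hypothetical names that do not exist in DataFusion. The point is that an explicit variant replaces the `>= NULL` / `<= NULL` encoding, so the chosen semantics are visible in the type rather than inferred from NULL comparisons.

```rust
/// Hypothetical policy for a build side whose join keys are all NULL.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum NullOnlyPolicy {
    /// Keep today's behavior: the filter is a no-op.
    Tautology,
    /// Treat the filter as unsatisfiable: no probe row can match.
    Unsatisfiable,
}

/// Hypothetical explicit filter representation (no NULL comparisons).
#[derive(Debug, Clone, PartialEq, Eq)]
enum ProbeFilter {
    /// Keep every probe row.
    AlwaysTrue,
    /// Drop every probe row.
    AlwaysFalse,
    /// Keep rows whose key lies in `min..=max`.
    Range { min: i64, max: i64 },
}

/// Builds the probe-side filter from (min, max) bounds, where `None`
/// models a NULL bound. Any missing bound is handled by the policy
/// instead of being emitted as a comparison against NULL.
fn build_probe_filter(
    bounds: (Option<i64>, Option<i64>),
    policy: NullOnlyPolicy,
) -> ProbeFilter {
    match bounds {
        (Some(min), Some(max)) => ProbeFilter::Range { min, max },
        _ => match policy {
            NullOnlyPolicy::Tautology => ProbeFilter::AlwaysTrue,
            NullOnlyPolicy::Unsatisfiable => ProbeFilter::AlwaysFalse,
        },
    }
}
```

   Whichever policy is chosen, a test could then assert on `AlwaysTrue` or `AlwaysFalse` directly instead of matching the `>= NULL` / `<= NULL` text.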
   
   
   
   References
   ----------
   File: `datafusion/core/tests/physical_optimizer/filter_pushdown/mod.rs`
   Test: `test_hashjoin_dynamic_filter_pushdown_null_keys`
   
   



---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to