kosiew opened a new issue, #17206: URL: https://github.com/apache/datafusion/issues/17206
Summary
-------

When the build side of a hash join contains only NULL join keys and the join is configured with `datafusion_common::NullEquality::NullEqualsNothing`, the dynamic filter generated for the probe side is produced as range comparisons against `NULL` (e.g. `a >= NULL AND a <= NULL`). In the current optimizer/test code this is treated as a no-op (a tautology) instead of being treated as either unsatisfiable or as an explicit "no matches" condition.

Why this matters
----------------

- This is a surprising corner case and may hide regressions if semantics or simplification rules change.
- If NullEquality semantics change (or the filter simplifier is updated), behavior and test expectations could silently diverge.
- We should monitor and decide whether this should be:
  - left as a tautology (no-op),
  - treated as unsatisfiable (prune everything),
  - or canonicalized/annotated to make intent explicit.

Where to look / repro
---------------------

Test: `datafusion/core/tests/physical_optimizer/filter_pushdown/mod.rs`

Specifically the test `test_hashjoin_dynamic_filter_pushdown_null_keys` in #17090

Reproduction steps (test harness)

1. Run that single async test (or the optimizer tests):
   - `cargo test --test <appropriate test binary> test_hashjoin_dynamic_filter_pushdown_null_keys`
2. Observe that the plan printed by `format_plan_for_test(&plan)` contains:
   `DynamicFilterPhysicalExpr [ a@0 >= NULL AND a@0 <= NULL AND b@1 >= NULL AND b@1 <= NULL ]`

Observed behavior
-----------------

The optimizer generates a dynamic filter with min/max bounds set to NULL; the code interprets this as a tautology (no filtering). The current test documents this and asserts the presence of the `>= NULL` / `<= NULL` pattern.

Expected / options
------------------

We need a decision on intended semantics. Options:

- Keep current behavior (tautology/no-op). Document it clearly in code/tests.
- Treat a NULL-only build side as an unsatisfiable filter (drop all probe rows).
- Change filter generation to avoid producing `>= NULL` / `<= NULL` and instead produce an explicit marker (e.g., `unsatisfiable`) so optimizer simplification can handle it deterministically.

References
----------

File: `datafusion/core/tests/physical_optimizer/filter_pushdown/mod.rs`

Test: `test_hashjoin_dynamic_filter_pushdown_null_keys`
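
To make the options under "Expected / options" concrete, here is a minimal standalone sketch. It is a toy model and not DataFusion code: `NullBoundsPolicy`, `keep_row`, `ge`, `le` and `and3` are hypothetical names invented for illustration, and `Option<i64>` stands in for a nullable column value. It assumes SQL-style three-valued logic, under which `a >= NULL AND a <= NULL` evaluates to NULL for every row, and then shows how the "tautology" and "unsatisfiable" readings differ for a filter whose bounds are both NULL.

```rust
/// SQL-style three-valued `>=`: any NULL operand yields NULL (`None`).
fn ge(a: Option<i64>, b: Option<i64>) -> Option<bool> {
    Some(a? >= b?)
}

/// SQL-style three-valued `<=`: any NULL operand yields NULL (`None`).
fn le(a: Option<i64>, b: Option<i64>) -> Option<bool> {
    Some(a? <= b?)
}

/// SQL-style three-valued AND: false dominates, otherwise NULL propagates.
fn and3(l: Option<bool>, r: Option<bool>) -> Option<bool> {
    match (l, r) {
        (Some(false), _) | (_, Some(false)) => Some(false),
        (Some(true), Some(true)) => Some(true),
        _ => None,
    }
}

/// Hypothetical policies for a bounds filter whose min and max are both NULL.
#[derive(Clone, Copy, Debug, PartialEq)]
enum NullBoundsPolicy {
    /// Treat the filter as a no-op (the behavior the issue describes today).
    Tautology,
    /// Treat the filter as matching nothing (drop all probe rows).
    Unsatisfiable,
}

/// Decide whether a probe-side value passes `a >= min AND a <= max`.
/// Assumption: the all-NULL-bounds case is special-cased by `policy`.
fn keep_row(
    a: Option<i64>,
    min: Option<i64>,
    max: Option<i64>,
    policy: NullBoundsPolicy,
) -> bool {
    if min.is_none() && max.is_none() {
        return policy == NullBoundsPolicy::Tautology;
    }
    // Ordinary case: only a definite `true` keeps the row; NULL drops it.
    and3(ge(a, min), le(a, max)) == Some(true)
}

fn main() {
    let probe: [Option<i64>; 3] = [Some(1), Some(5), None];

    // The raw predicate against NULL bounds is NULL for every probe row.
    for a in probe {
        assert_eq!(and3(ge(a, None), le(a, None)), None);
    }

    let kept = |p| probe.iter().filter(|a| keep_row(**a, None, None, p)).count();
    println!("tautology keeps {} rows", kept(NullBoundsPolicy::Tautology));
    println!("unsatisfiable keeps {} rows", kept(NullBoundsPolicy::Unsatisfiable));
}
```

The third option (an explicit marker) would correspond to making the all-NULL case a distinct, named state as `NullBoundsPolicy` does here, instead of leaving it to be inferred from `>= NULL` / `<= NULL` comparisons.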