kosiew opened a new issue, #17206:
URL: https://github.com/apache/datafusion/issues/17206
Summary
-------
When the build side of a hash join contains only NULL join keys and the join
is configured with `datafusion_common::NullEquality::NullEqualsNothing`, the
dynamic filter generated for the probe side is produced as range comparisons
against `NULL` (e.g. `a >= NULL AND a <= NULL`). In the current optimizer/test
code this is treated as a no-op (a tautology) instead of being treated as
either unsatisfiable or as an explicit "no matches" condition.
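To illustrate why the `>= NULL` / `<= NULL` form is ambiguous, here is a minimal stand-alone sketch of SQL three-valued logic. It is not DataFusion code: `Option` stands in for NULL, and the helper names (`sql_ge`, `sql_le`, `sql_and`, `range_predicate`) are hypothetical. Under these assumptions the predicate evaluates to NULL for every probe value, so it is neither true nor false by itself, and the consumer has to decide what NULL means.

```rust
// Stand-alone model of SQL three-valued logic; `None` plays the role of NULL.
// Nothing here is DataFusion API; all names are illustrative.

/// `a >= b` under SQL semantics: NULL if either operand is NULL.
fn sql_ge(a: Option<i64>, b: Option<i64>) -> Option<bool> {
    Some(a? >= b?)
}

/// `a <= b` under SQL semantics: NULL if either operand is NULL.
fn sql_le(a: Option<i64>, b: Option<i64>) -> Option<bool> {
    Some(a? <= b?)
}

/// SQL `AND`: false dominates; otherwise any NULL operand yields NULL.
fn sql_and(l: Option<bool>, r: Option<bool>) -> Option<bool> {
    match (l, r) {
        (Some(false), _) | (_, Some(false)) => Some(false),
        (Some(true), Some(true)) => Some(true),
        _ => None,
    }
}

/// The range predicate `a >= min AND a <= max`.
fn range_predicate(a: Option<i64>, min: Option<i64>, max: Option<i64>) -> Option<bool> {
    sql_and(sql_ge(a, min), sql_le(a, max))
}

fn main() {
    // Build side had only NULL keys, so both bounds are NULL:
    // the predicate is NULL for every probe value, including NULL itself.
    for a in [Some(-1), Some(0), Some(42), None] {
        assert_eq!(range_predicate(a, None, None), None);
    }
    // With real bounds the predicate is decided for non-NULL inputs.
    assert_eq!(range_predicate(Some(5), Some(1), Some(10)), Some(true));
    assert_eq!(range_predicate(Some(11), Some(1), Some(10)), Some(false));
}
```

A plain SQL `WHERE` clause discards rows whose predicate is NULL, while the behavior reported here keeps them, which is why the intended semantics need an explicit decision.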
Why this matters
----------------
- This is a surprising corner case and may hide regressions if semantics or
  simplification rules change.
- If NullEquality semantics change (or the filter simplifier is updated),
  behavior and test expectations could silently diverge.
- We should monitor and decide whether this should be:
  - left as a tautology (no-op),
  - treated as unsatisfiable (prune everything),
  - or canonicalized/annotated to make intent explicit.
Where to look / repro
---------------------
Test: `datafusion/core/tests/physical_optimizer/filter_pushdown/mod.rs`
Specifically the test `test_hashjoin_dynamic_filter_pushdown_null_keys` in #17090
Reproduction steps (test harness)
1. Run that single async test (or the optimizer tests). `cargo test` takes the
   test-name filter as a positional argument, not a `--filter` flag:
   - `cargo test --test <appropriate test binary> test_hashjoin_dynamic_filter_pushdown_null_keys`
2. Observe that the plan printed by `format_plan_for_test(&plan)` contains:
   `DynamicFilterPhysicalExpr [ a@0 >= NULL AND a@0 <= NULL AND b@1 >= NULL AND b@1 <= NULL ]`
Observed behavior
-----------------
The optimizer generates a dynamic filter with min/max bounds set to NULL; the
code interprets this as a tautology (no filtering). The current test documents
this and asserts the presence of the `>= NULL` / `<= NULL` pattern.
Expected / options
------------------
We need a decision on intended semantics. Options:
- Keep current behavior (tautology/no-op). Document it clearly in code/tests.
- Treat NULL-only build-side as unsatisfiable filter (drop all probe rows).
- Change filter generation to avoid producing `>= NULL` / `<= NULL` and
  instead produce an explicit marker (e.g., `unsatisfiable`) so optimizer
  simplification can handle it deterministically.
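The three options can be compared with a small stand-alone sketch. Everything in it is hypothetical: `ProbeFilter`, `NullOnlyPolicy`, `make_filter`, and `keeps` are illustrative names, not DataFusion types or functions, and keys are simplified to `i64`. The idea is that filter generation emits an explicit variant when the build side has no non-NULL keys, so the chosen policy is visible in the plan instead of being implied by a comparison against NULL.

```rust
// Hypothetical types for illustration only; none of these exist in DataFusion.

/// Filter handed to the probe side after the build side is consumed.
#[derive(Debug, PartialEq)]
enum ProbeFilter {
    /// At least one non-NULL key was seen: keep rows with min <= a <= max.
    Range { min: i64, max: i64 },
    /// Only NULL keys were seen and the policy is "no-op": keep every row.
    PassThrough,
    /// Only NULL keys were seen and the policy is "no matches": keep no row.
    Unsatisfiable,
}

/// How to interpret a build side that produced no non-NULL bounds.
#[derive(Clone, Copy)]
enum NullOnlyPolicy {
    KeepAll,
    PruneAll,
}

/// Build the filter from optional bounds; `None` models a NULL bound.
fn make_filter(min: Option<i64>, max: Option<i64>, policy: NullOnlyPolicy) -> ProbeFilter {
    match (min, max) {
        (Some(min), Some(max)) => ProbeFilter::Range { min, max },
        _ => match policy {
            NullOnlyPolicy::KeepAll => ProbeFilter::PassThrough,
            NullOnlyPolicy::PruneAll => ProbeFilter::Unsatisfiable,
        },
    }
}

/// Whether a probe row with key `a` survives the filter.
/// Assumption: a NULL probe key never passes a range filter.
fn keeps(filter: &ProbeFilter, a: Option<i64>) -> bool {
    match filter {
        ProbeFilter::Range { min, max } => a.map_or(false, |v| *min <= v && v <= *max),
        ProbeFilter::PassThrough => true,
        ProbeFilter::Unsatisfiable => false,
    }
}

fn main() {
    // Option 1 (current behavior as described): NULL-only build side is a no-op.
    let noop = make_filter(None, None, NullOnlyPolicy::KeepAll);
    assert_eq!(noop, ProbeFilter::PassThrough);
    assert!(keeps(&noop, Some(3)));

    // Options 2 and 3: NULL-only build side prunes everything, via an explicit marker.
    let none = make_filter(None, None, NullOnlyPolicy::PruneAll);
    assert_eq!(none, ProbeFilter::Unsatisfiable);
    assert!(!keeps(&none, Some(3)));

    // Ordinary case is unaffected by the policy.
    let range = make_filter(Some(1), Some(10), NullOnlyPolicy::PruneAll);
    assert!(keeps(&range, Some(5)));
    assert!(!keeps(&range, Some(11)));
}
```

With an explicit variant, a test could assert on the marker itself rather than on the textual `>= NULL` / `<= NULL` pattern, which would make a change of policy show up as a deliberate test update.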
References
----------
File: `datafusion/core/tests/physical_optimizer/filter_pushdown/mod.rs`
Test: `test_hashjoin_dynamic_filter_pushdown_null_keys`