kosiew opened a new pull request, #22870:
URL: https://github.com/apache/datafusion/pull/22870
## Which issue does this PR close?
* Part of #22686
## Rationale for this change
`EliminateOuterJoin` makes join conversion decisions using side-level
null-rejection information (`left_non_nullable` and `right_non_nullable`), but
the existing implementation first collects null-rejecting columns and then
derives side-level evidence by scanning that collection.
This intermediate column-based representation adds indirection and
complicates reasoning about `AND`/`OR` null-rejection semantics. This change
refactors the analysis to track null-rejection evidence directly at the
join-side level while preserving existing behavior.
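
For context, the two side-level flags feed a conversion decision roughly like the one below. This is a minimal sketch under stated assumptions, not the optimizer's actual code: the `JoinType` enum is a local stand-in, `eliminate_outer` is a hypothetical name, and the mapping follows standard outer-join semantics (a filter that rejects NULLs from the null-padded side removes the unmatched rows).

```rust
/// Local stand-in for a join type enum (not DataFusion's own `JoinType`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum JoinType {
    Inner,
    Left,
    Right,
    Full,
}

/// Hypothetical helper: pick the narrowest join type that is still correct,
/// given which sides a filter above the join rejects NULLs from.
pub fn eliminate_outer(
    join_type: JoinType,
    left_non_nullable: bool,
    right_non_nullable: bool,
) -> JoinType {
    match join_type {
        // A LEFT join null-pads the right side; if the filter rejects those
        // padded rows, only matched rows survive.
        JoinType::Left if right_non_nullable => JoinType::Inner,
        // Mirror case: a RIGHT join null-pads the left side.
        JoinType::Right if left_non_nullable => JoinType::Inner,
        // A FULL join pads both sides, so each flag removes one kind of
        // unmatched row.
        JoinType::Full => match (left_non_nullable, right_non_nullable) {
            (true, true) => JoinType::Inner,
            (true, false) => JoinType::Left,
            (false, true) => JoinType::Right,
            (false, false) => JoinType::Full,
        },
        // No usable evidence, or already an inner join: leave unchanged.
        other => other,
    }
}
```

Only the two booleans matter to this decision, which is why tracking evidence per side rather than per column is sufficient.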
## What changes are included in this PR?
* Replaced column-based null-rejection tracking with a new private
`NullRejectingSides` helper type containing `left` and `right` boolean flags.
* Added helper methods for combining side-level evidence:
  * `union`
  * `intersection`
* Refactored null-rejection extraction to return side-level evidence
directly via `extract_null_rejecting_sides`.
* Simplified `try_simplify_join` by removing the intermediate `Vec<Column>`
collection and schema scan used to derive side-level information.
* Preserved existing null-rejection semantics for:
  * top-level `AND` chains
  * `OR` expressions
  * nested `AND` expressions
  * null-propagating operators
  * `IS NOT NULL`, `IS TRUE`, `IS FALSE`, and `IS NOT UNKNOWN` top-level
    handling
* Added unit tests for:
  * `NullRejectingSides::union`
  * `NullRejectingSides::intersection`
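
As an illustration of the helper type described above, here is a minimal sketch. The type name, the `left`/`right` fields, and the method names come from this description. The signatures, derives, and the comments about which combinator applies to which expression shape are assumptions, not the code in this PR.

```rust
/// Sketch of a side-level evidence type. Field names follow the PR text;
/// the derives and method signatures are assumptions.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
struct NullRejectingSides {
    /// The predicate rejects rows where the left side is NULL.
    left: bool,
    /// The predicate rejects rows where the right side is NULL.
    right: bool,
}

impl NullRejectingSides {
    /// Evidence from either operand is enough.
    /// Assumed use: top-level `AND`, where failing either conjunct
    /// filters the row out.
    fn union(self, other: Self) -> Self {
        Self {
            left: self.left || other.left,
            right: self.right || other.right,
        }
    }

    /// Both operands must reject NULLs from the same side.
    /// Assumed use: `OR` and nested `AND`, where one branch alone
    /// proves nothing about the whole expression.
    fn intersection(self, other: Self) -> Self {
        Self {
            left: self.left && other.left,
            right: self.right && other.right,
        }
    }
}
```

Under these assumptions, a filter such as `a.x > 1 AND b.y > 2` would yield evidence for both sides, while `a.x > 1 OR b.y > 2` would yield none, because a row with a NULL `a.x` can still pass through the other branch.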
## Are these changes tested?
Yes.
Added unit tests covering:
* `null_rejecting_sides_union`
* `null_rejecting_sides_intersection`
The existing `eliminate_outer_join` test suite is expected to pass without
modification and to keep validating optimizer behavior, since this is intended
to be a behavior-preserving refactor.
## Are there any user-facing changes?
No.
This is an internal optimizer refactor intended to preserve existing
join-elimination behavior and does not change public APIs or SQL semantics.
## LLM-generated code disclosure
This PR includes LLM-generated code and comments. All LLM-generated content
has been manually reviewed and tested.