nathanb9 opened a new pull request, #22865: URL: https://github.com/apache/datafusion/pull/22865
## Which issue does this PR close? - Closes #22808. ## Rationale for this change Follow-up to #22791, as suggested in review by @2010YOUY01. That PR fixed a double-decrement bug where `EmitLeftUnmatched` did two jobs at once — deciding whether a partition emits unmatched-left rows (which decrements the shared `probe_threads_counter`) and performing the emit. Because the state is re-enterable (a ready batch can be flushed before the state advances to `Done`), the counter could be decremented twice, driving it to zero before all partitions finished probing and emitting spurious NULL-padded rows. #22791 patched this with a `probe_completed_reported` guard flag. This refactor makes "decrement exactly once per probe stream" a structural property of the state graph rather than a runtime guard, so the inner logic is easier to follow and the bug is harder to reintroduce. ## What changes are included in this PR? Restructures the state machine from `FetchingRight → EmitLeftUnmatched` to `FetchingRight → ProbeEnd → EmitLeftUnmatched`: - Adds a dedicated `ProbeEnd` state, entered exactly once per left chunk when the right side is exhausted. It owns the single `report_probe_completed()` call and records whether this stream is the unmatched-left emitter. - Replaces the `probe_completed_reported` guard flag with an `is_unmatched_left_emitter` field that `EmitLeftUnmatched` only reads. - Removes the per-chunk flag reset in the memory-limited path (the decision is recomputed in `ProbeEnd` for each chunk) and reverts the `Arc::clone` workaround #22791 needed in `process_left_unmatched`. - Updates the state-transition doc graph and arm comments. No behavior change is expected. ## Are these changes tested? Yes — covered by existing tests: - All 42 `nested_loop_join` unit tests and the full `datafusion-physical-plan` suite pass. - `joins.slt` sqllogictests pass (including the multi-partition LEFT JOIN regression test added in #22791). - 41 `join_fuzz` tests (`cargo test --features extended_tests`) comparing `NestedLoopJoinExec` against `HashJoinExec` across every join type, filtered and unfiltered, with a multi-partition probe side — the exact scenario class of the original bug — pass. - `cargo fmt` and `cargo clippy --all-targets --all-features -- -D warnings` are clean. ## Are there any user-facing changes? No. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
