nathanb9 opened a new pull request, #22865:
URL: https://github.com/apache/datafusion/pull/22865

   ## Which issue does this PR close?
   
   - Closes #22808.
   
   ## Rationale for this change
   
   Follow-up to #22791, as suggested in review by @2010YOUY01.
   
   That PR fixed a double-decrement bug where `EmitLeftUnmatched` did two jobs 
at once — deciding whether a partition emits unmatched-left rows (which 
decrements the shared `probe_threads_counter`) and performing the emit. Because 
the state is re-enterable (a ready batch can be flushed before the state 
advances to `Done`), the counter could be decremented twice, driving it to zero 
before all partitions finished probing and emitting spurious NULL-padded rows. 
#22791 patched this with a `probe_completed_reported` guard flag.
   
   This refactor makes "decrement exactly once per probe stream" a structural 
property of the state graph rather than a runtime guard, so the inner logic is 
easier to follow and the bug is harder to reintroduce.
   
   ## What changes are included in this PR?
   
   Restructures the state machine from `FetchingRight → EmitLeftUnmatched` to 
`FetchingRight → ProbeEnd → EmitLeftUnmatched`:
   
   - Adds a dedicated `ProbeEnd` state, entered exactly once per left chunk 
when the right side is exhausted. It owns the single `report_probe_completed()` 
call and records whether this stream is the unmatched-left emitter.
   - Replaces the `probe_completed_reported` guard flag with an 
`is_unmatched_left_emitter` field that `EmitLeftUnmatched` only reads.
   - Removes the per-chunk flag reset in the memory-limited path (the decision 
is recomputed in `ProbeEnd` for each chunk) and reverts the `Arc::clone` 
workaround #22791 needed in `process_left_unmatched`.
   - Updates the state-transition doc graph and arm comments.
   
   No behavior change is expected.
   
   ## Are these changes tested?
   
   Yes — covered by existing tests:
   
   - All 42 `nested_loop_join` unit tests and the full 
`datafusion-physical-plan` suite pass.
   - `joins.slt` sqllogictests pass (including the multi-partition LEFT JOIN 
regression test added in #22791).
   - 41 `join_fuzz` tests (`cargo test --features extended_tests`) comparing 
`NestedLoopJoinExec` against `HashJoinExec` across every join type, filtered 
and unfiltered, with a multi-partition probe side — the exact scenario class of 
the original bug — pass.
   - `cargo fmt` and `cargo clippy --all-targets --all-features -- -D warnings` 
are clean.
   
   ## Are there any user-facing changes?
   
   No.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to