kosiew opened a new pull request, #22623:
URL: https://github.com/apache/datafusion/pull/22623
### Which issue does this PR close?
* Closes #22622.
### Rationale for this change
The build-report lifecycle for hash join partitions was previously spread
across `HashJoinStream`, `OnceFut` handling, and drop-time cancellation logic.
Although correctness around scheduled vs. delivered reports had already been
addressed, the lifecycle responsibilities remained fragmented, making the code
harder to reason about and increasing the risk of regressions.
This change centralizes lifecycle ownership in a dedicated abstraction that
encodes state transitions and terminal outcomes explicitly, making the behavior
more deterministic and easier to maintain.
### What changes are included in this PR?
* Introduce a new `BuildReportHandle` type to own the lifecycle of a
partition's build-data report.
* Replace stream-level lifecycle tracking (`build_waiter` and
`build_report_state`) with `BuildReportHandle`.
* Consolidate report lifecycle transitions into explicit methods:
  * `schedule`
  * `wait_for_delivery`
  * `cancel_if_pending`
  * `finalize`
* Expand lifecycle state tracking to:
  * `NotReported`
  * `Scheduled`
  * `Delivered`
  * `Canceled`
  * `Finalized`
* Move drop-time cancellation behavior into the `Drop` implementation of
`BuildReportHandle`, so that a partition report that is still pending when
the handle is dropped is always canceled in one place.
* Simplify `HashJoinStream` by delegating build-report lifecycle decisions
to the new handle.
* Extract reusable test helpers from `shared_bounds.rs` for constructing and
inspecting partitioned accumulators in tests.
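For readers unfamiliar with the pattern, the lifecycle described above can be sketched roughly as follows. This is a simplified, synchronous illustration and not the code in this PR: `MockAccumulator`, the field names, and the method signatures are hypothetical, `report_canceled_partition` is borrowed from the test names below with an assumed signature, and treating only `Scheduled` as "pending" is an assumption inferred from those test names.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Lifecycle states as listed in the PR description.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BuildReportState {
    NotReported,
    Scheduled,
    Delivered,
    Canceled,
    Finalized,
}

/// Hypothetical stand-in for the shared accumulator; it only counts
/// how many times a partition was reported as canceled.
#[derive(Default)]
struct MockAccumulator {
    canceled: Cell<usize>,
}

impl MockAccumulator {
    /// Assumed signature; the real method likely takes a partition id.
    fn report_canceled_partition(&self) {
        self.canceled.set(self.canceled.get() + 1);
    }
}

/// Sketch of a handle that owns one partition's report lifecycle.
struct BuildReportHandle {
    state: BuildReportState,
    accumulator: Rc<MockAccumulator>,
}

impl BuildReportHandle {
    fn new(accumulator: Rc<MockAccumulator>) -> Self {
        Self { state: BuildReportState::NotReported, accumulator }
    }

    /// NotReported -> Scheduled; any other state is left unchanged.
    fn schedule(&mut self) {
        if self.state == BuildReportState::NotReported {
            self.state = BuildReportState::Scheduled;
        }
    }

    /// Scheduled -> Delivered. The real method presumably awaits a
    /// future; this sketch completes immediately.
    fn wait_for_delivery(&mut self) {
        if self.state == BuildReportState::Scheduled {
            self.state = BuildReportState::Delivered;
        }
    }

    /// Scheduled -> Canceled, notifying the accumulator exactly once.
    /// Calling it again (or from `drop`) is a no-op.
    fn cancel_if_pending(&mut self) {
        if self.state == BuildReportState::Scheduled {
            self.accumulator.report_canceled_partition();
            self.state = BuildReportState::Canceled;
        }
    }

    /// Delivered -> Finalized.
    fn finalize(&mut self) {
        if self.state == BuildReportState::Delivered {
            self.state = BuildReportState::Finalized;
        }
    }
}

impl Drop for BuildReportHandle {
    /// Drop-time cancellation lives with the handle, not the stream.
    fn drop(&mut self) {
        self.cancel_if_pending();
    }
}
```

Because every transition checks the current state first, a handle dropped after delivery never cancels, and repeated calls to `cancel_if_pending` notify the accumulator at most once. The stream then only has to call the handle's methods rather than track the state itself.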
### Are these changes tested?
Yes.
Added tests covering the new lifecycle handle behavior:
* `build_report_handle_cancels_scheduled_partition_on_drop`
* `build_report_handle_does_not_cancel_delivered_partition_on_drop`
* `build_report_handle_cancel_if_pending_is_idempotent`
Existing shared-bounds tests were also updated to use the extracted test
helpers:
* `report_canceled_partition_is_noop_after_report`
* `report_canceled_partition_marks_pending_partition_canceled`
### Are there any user-facing changes?
No. This is an internal refactoring and maintainability improvement for hash
join build-report lifecycle management. No user-facing behavior or public APIs
are changed.
### LLM-generated code disclosure
This PR includes LLM-generated code and comments. All LLM-generated content
has been manually reviewed and tested.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]