kosiew opened a new pull request, #22623:
URL: https://github.com/apache/datafusion/pull/22623

   ### Which issue does this PR close?
   
   * Closes #22622.
   
   ### Rationale for this change
   
   The build-report lifecycle for hash join partitions was previously spread 
across `HashJoinStream`, `OnceFut` handling, and drop-time cancellation logic. 
Although correctness around scheduled vs. delivered reports had already been 
addressed, the lifecycle responsibilities remained fragmented, making the code 
harder to reason about and increasing the risk of regressions.
   
   This change centralizes lifecycle ownership in a dedicated abstraction that 
encodes state transitions and terminal outcomes explicitly, making the behavior 
more deterministic and easier to maintain.
   
   ### What changes are included in this PR?
   
   * Introduce a new `BuildReportHandle` type to own the lifecycle of a 
partition's build-data report.
   * Replace stream-level lifecycle tracking (`build_waiter` and 
`build_report_state`) with `BuildReportHandle`.
   * Consolidate report lifecycle transitions into explicit methods:
   
     * `schedule`
     * `wait_for_delivery`
     * `cancel_if_pending`
     * `finalize`
   * Expand lifecycle state tracking to:
   
     * `NotReported`
     * `Scheduled`
     * `Delivered`
     * `Canceled`
     * `Finalized`
   * Move drop-time cancellation into the `Drop` implementation for 
`BuildReportHandle`, so that a partition report still pending when the handle is 
dropped is canceled consistently.
   * Simplify `HashJoinStream` by delegating build-report lifecycle decisions 
to the new handle.
   * Extract reusable test helpers from `shared_bounds.rs` for constructing and 
inspecting partitioned accumulators in tests.
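   
   The lifecycle described in the list above can be illustrated with a small, 
self-contained sketch. This is not the code from the PR: `ReportHandleSketch`, 
`ReportStateSketch`, `MockAccumulator`, `mark_delivered`, and `demo` are 
hypothetical names, the real `wait_for_delivery` is presumably asynchronous 
(here it is replaced by a synchronous stand-in), and the exact transition rules, 
especially for `finalize`, are assumptions made for illustration only.

```rust
use std::sync::{Arc, Mutex};

/// Lifecycle states named in the PR description (the type name is hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ReportStateSketch {
    NotReported,
    Scheduled,
    Delivered,
    Canceled,
    Finalized,
}

/// Hypothetical stand-in for the shared accumulator: it only records
/// which partitions were reported as canceled.
#[derive(Default)]
struct MockAccumulator {
    canceled: Mutex<Vec<usize>>,
}

/// Illustrative handle that owns one partition's report lifecycle.
struct ReportHandleSketch {
    partition: usize,
    state: ReportStateSketch,
    accumulator: Arc<MockAccumulator>,
}

impl ReportHandleSketch {
    fn new(partition: usize, accumulator: Arc<MockAccumulator>) -> Self {
        Self { partition, state: ReportStateSketch::NotReported, accumulator }
    }

    fn state(&self) -> ReportStateSketch {
        self.state
    }

    /// NotReported -> Scheduled; any other state is left unchanged.
    fn schedule(&mut self) {
        if self.state == ReportStateSketch::NotReported {
            self.state = ReportStateSketch::Scheduled;
        }
    }

    /// Synchronous stand-in for `wait_for_delivery`: Scheduled -> Delivered.
    fn mark_delivered(&mut self) {
        if self.state == ReportStateSketch::Scheduled {
            self.state = ReportStateSketch::Delivered;
        }
    }

    /// Cancels only a scheduled-but-undelivered report, so repeated calls
    /// record the cancellation at most once.
    fn cancel_if_pending(&mut self) {
        if self.state == ReportStateSketch::Scheduled {
            self.accumulator.canceled.lock().unwrap().push(self.partition);
            self.state = ReportStateSketch::Canceled;
        }
    }

    /// Assumption: finalizing cancels a pending report, then marks the
    /// handle terminal so later calls (including Drop) do nothing.
    fn finalize(&mut self) {
        self.cancel_if_pending();
        self.state = ReportStateSketch::Finalized;
    }
}

impl Drop for ReportHandleSketch {
    /// Drop-time cancellation lives in the handle, not in the stream.
    fn drop(&mut self) {
        self.cancel_if_pending();
    }
}

/// Exercises the three scenarios named by the new tests and returns the
/// partitions that ended up canceled.
fn demo() -> Vec<usize> {
    let acc = Arc::new(MockAccumulator::default());

    // Partition 0: scheduled, dropped before delivery, so Drop cancels it.
    let mut pending = ReportHandleSketch::new(0, Arc::clone(&acc));
    pending.schedule();
    drop(pending);

    // Partition 1: delivered before the drop, so Drop leaves it alone.
    let mut delivered = ReportHandleSketch::new(1, Arc::clone(&acc));
    delivered.schedule();
    delivered.mark_delivered();
    drop(delivered);

    // Partition 2: canceled explicitly twice, then dropped; recorded once.
    let mut twice = ReportHandleSketch::new(2, Arc::clone(&acc));
    twice.schedule();
    twice.cancel_if_pending();
    twice.cancel_if_pending();
    drop(twice);

    let canceled = acc.canceled.lock().unwrap().clone();
    canceled
}
```

   In this sketch, calling `demo()` yields `[0, 2]`: only the partitions whose 
reports were still pending are canceled, and the idempotent `cancel_if_pending` 
records partition 2 a single time. Keeping the state check inside the handle is 
what lets the stream delegate these decisions instead of tracking them itself.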
   
   ### Are these changes tested?
   
   Yes.
   
   Added tests covering the new lifecycle handle behavior:
   
   * `build_report_handle_cancels_scheduled_partition_on_drop`
   * `build_report_handle_does_not_cancel_delivered_partition_on_drop`
   * `build_report_handle_cancel_if_pending_is_idempotent`
   
   Existing shared-bounds tests were also updated to use the extracted test 
helpers:
   
   * `report_canceled_partition_is_noop_after_report`
   * `report_canceled_partition_marks_pending_partition_canceled`
   
   
   
   ### Are there any user-facing changes?
   
   No. This is an internal refactoring and maintainability improvement for hash 
join build-report lifecycle management. No user-facing behavior or public APIs 
are changed.
   
   ### LLM-generated code disclosure
   
   This PR includes LLM-generated code and comments. All LLM-generated content 
has been manually reviewed and tested.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

