xanderbailey opened a new pull request, #20672: URL: https://github.com/apache/datafusion/pull/20672
## Which issue does this PR close? - Closes #. ## Rationale for this change In non-preserve-order repartitioning mode, all input partition tasks share clones of the same `SpillPoolWriter` for each output partition. `SpillPoolWriter` used `#[derive(Clone)]` but its `Drop` implementation unconditionally set `writer_dropped = true` and finalized the current spill file. This meant that when the **first** input task finished and its clone was dropped, the `SpillPoolReader` would see `writer_dropped = true` on an empty queue and return EOF — silently discarding every batch subsequently written by the still-running input tasks. This bug requires three conditions to trigger: 1. Non-preserve-order repartitioning (so spill writers are cloned across input tasks) 2. Memory pressure causing batches to spill to disk 3. Input tasks finishing at different times (the common case with varying partition sizes) ## What changes are included in this PR? **`datafusion/physical-plan/src/spill/spill_pool.rs`:** - Added `active_writer_count: usize` to `SpillPoolShared` to track the number of live writer clones. - Replaced `#[derive(Clone)]` on `SpillPoolWriter` with a manual `Clone` impl that increments `active_writer_count` under the shared lock. - Updated `Drop` to decrement `active_writer_count` and only finalize the current file / set `writer_dropped = true` when the count reaches zero (i.e. the last clone is dropped). Non-last clones now return immediately from `Drop`. - Added regression test `test_clone_drop_does_not_signal_eof_prematurely` that reproduces the exact failure: writer1 writes and drops, the reader drains the queue, then writer2 (still alive) writes. Without the fix the reader returns premature EOF and the assertion fails; with the fix the reader waits and reads both batches. ## Are these changes tested? Yes. A new unit test (`test_clone_drop_does_not_signal_eof_prematurely`) directly reproduces the bug. It was verified to **fail without the fix** and **pass with the fix**. All 41 existing repartition tests and all 15 existing spill pool tests continue to pass. ## Are there any user-facing changes? No API changes. Queries that repartition under memory pressure will no longer silently lose rows. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
