xanderbailey opened a new pull request, #20672:
URL: https://github.com/apache/datafusion/pull/20672

   ## Which issue does this PR close?
   
   - Closes #.
   
   ## Rationale for this change
   
   In non-preserve-order repartitioning mode, all input partition tasks share 
clones of the same `SpillPoolWriter` for each output partition. 
`SpillPoolWriter` used `#[derive(Clone)]` but its `Drop` implementation 
unconditionally set `writer_dropped = true` and finalized the current spill 
file. This meant that when the **first** input task finished and its clone was 
dropped, the `SpillPoolReader` would see `writer_dropped = true` on an empty 
queue and return EOF — silently discarding every batch subsequently written by 
the still-running input tasks.
   
   This bug requires three conditions to trigger:
   1. Non-preserve-order repartitioning (so spill writers are cloned across 
input tasks)
   2. Memory pressure causing batches to spill to disk
   3. Input tasks finishing at different times (the common case with varying 
partition sizes)
   
   ## What changes are included in this PR?
   
   **`datafusion/physical-plan/src/spill/spill_pool.rs`:**
   
   - Added `active_writer_count: usize` to `SpillPoolShared` to track the 
number of live writer clones.
   - Replaced `#[derive(Clone)]` on `SpillPoolWriter` with a manual `Clone` 
impl that increments `active_writer_count` under the shared lock.
   - Updated `Drop` to decrement `active_writer_count` and only finalize the 
current file / set `writer_dropped = true` when the count reaches zero (i.e. 
the last clone is dropped). Non-last clones now return immediately from `Drop`.
   - Added regression test `test_clone_drop_does_not_signal_eof_prematurely` 
that reproduces the exact failure: writer1 writes and drops, the reader drains 
the queue, then writer2 (still alive) writes. Without the fix the reader 
returns premature EOF and the assertion fails; with the fix the reader waits 
and reads both batches.
   
   ## Are these changes tested?
   
   Yes. A new unit test (`test_clone_drop_does_not_signal_eof_prematurely`) 
directly reproduces the bug. It was verified to **fail without the fix** and 
**pass with the fix**. All 41 existing repartition tests and all 15 existing 
spill pool tests continue to pass.
   
   ## Are there any user-facing changes?
   
   No API changes. Queries that repartition under memory pressure will no 
longer silently lose rows.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to