hareshkh opened a new issue, #20683: URL: https://github.com/apache/datafusion/issues/20683
### Describe the bug

In non-preserve-order repartitioning mode, all input partition tasks share clones of the same `SpillPoolWriter` for each output partition. `SpillPoolWriter` used `#[derive(Clone)]` but its `Drop` implementation unconditionally set `writer_dropped = true` and finalized the current spill file. This meant that when the first input task finishes and its clone is dropped, the `SpillPoolReader` sees `writer_dropped = true` on an empty queue and returns EOF, silently discarding every batch subsequently written by the still-running input tasks.

This bug requires three conditions to trigger:

1. Non-preserve-order repartitioning (so spill writers are cloned across input tasks)
2. Memory pressure causing batches to spill to disk
3. Input tasks finishing at different times (the common case with varying partition sizes)

### To Reproduce

_No response_

### Expected behavior

_No response_

### Additional context

_No response_

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
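The failure mode described under "Describe the bug" can be modeled with a small, standalone sketch. This is not DataFusion code: `Shared`, `NaiveWriter`, `CountedWriter`, `Poll`, and `poll_next` are hypothetical stand-ins that use only the standard library, with a `u32` in place of a spilled batch. The counted variant shows one possible way to make `Drop` clone-aware; whether the actual fix takes this shape is an assumption.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};

/// Hypothetical state shared by all writer clones and the reader.
#[derive(Default)]
struct Shared {
    queue: VecDeque<u32>, // stands in for spilled batches
    writer_dropped: bool, // reader treats "empty queue + this flag" as EOF
    live_writers: usize,  // used only by the counted variant
}

/// What the reader observes on one poll.
#[derive(Debug, PartialEq)]
enum Poll {
    Batch(u32),
    Pending,
    Eof,
}

fn poll_next(shared: &Arc<Mutex<Shared>>) -> Poll {
    let mut s = shared.lock().unwrap();
    match s.queue.pop_front() {
        Some(batch) => Poll::Batch(batch),
        None if s.writer_dropped => Poll::Eof,
        None => Poll::Pending,
    }
}

/// Models the reported behavior: every clone's Drop marks the writer side finished.
#[derive(Clone)]
struct NaiveWriter(Arc<Mutex<Shared>>);

impl NaiveWriter {
    fn push(&self, batch: u32) {
        self.0.lock().unwrap().queue.push_back(batch);
    }
}

impl Drop for NaiveWriter {
    fn drop(&mut self) {
        self.0.lock().unwrap().writer_dropped = true;
    }
}

/// One possible alternative (an assumption): only the last live clone sets the flag.
struct CountedWriter(Arc<Mutex<Shared>>);

impl CountedWriter {
    fn new(shared: Arc<Mutex<Shared>>) -> Self {
        shared.lock().unwrap().live_writers += 1;
        CountedWriter(shared)
    }

    fn push(&self, batch: u32) {
        self.0.lock().unwrap().queue.push_back(batch);
    }
}

impl Clone for CountedWriter {
    fn clone(&self) -> Self {
        CountedWriter::new(Arc::clone(&self.0))
    }
}

impl Drop for CountedWriter {
    fn drop(&mut self) {
        let mut s = self.0.lock().unwrap();
        s.live_writers -= 1;
        if s.live_writers == 0 {
            s.writer_dropped = true;
        }
    }
}

/// The first "input task" finishes early; the reader polls before the second writes.
fn naive_scenario() -> Poll {
    let shared = Arc::new(Mutex::new(Shared::default()));
    let first = NaiveWriter(Arc::clone(&shared));
    let second = first.clone();
    drop(first);
    let seen = poll_next(&shared);
    second.push(1); // written after the reader has already observed EOF
    seen
}

/// Same sequence with the counted writer, polling before, during, and after.
fn counted_scenario() -> (Poll, Poll, Poll) {
    let shared = Arc::new(Mutex::new(Shared::default()));
    let first = CountedWriter::new(Arc::clone(&shared));
    let second = first.clone();
    drop(first);
    let before = poll_next(&shared);
    second.push(1);
    let batch = poll_next(&shared);
    drop(second);
    let end = poll_next(&shared);
    (before, batch, end)
}
```

In this model, `naive_scenario()` returns `Poll::Eof` even though a second writer is still alive, which mirrors the premature EOF in the report. `counted_scenario()` returns `(Poll::Pending, Poll::Batch(1), Poll::Eof)`, so EOF is signaled only after the last clone is dropped.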
