hareshkh opened a new issue, #20683:
URL: https://github.com/apache/datafusion/issues/20683

   ### Describe the bug
   
   In non-preserve-order repartitioning mode, all input partition tasks share 
clones of the same `SpillPoolWriter` for each output partition. 
`SpillPoolWriter` used `#[derive(Clone)]`, but its `Drop` implementation 
unconditionally set `writer_dropped = true` and finalized the current spill 
file. As a result, when the first input task finished and its clone was 
dropped, the `SpillPoolReader` saw `writer_dropped = true` on an empty queue and 
returned EOF, silently discarding every batch subsequently written by the 
still-running input tasks.
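
The failure mode described above can be modelled in a few lines. This is a simplified sketch, not the DataFusion code: `Shared`, `Writer`, `Reader`, `Read`, and `batches_lost_after_first_drop` are hypothetical stand-ins, and `u32` values stand in for spilled batches. Only the `writer_dropped` flag and the derive-`Clone`-plus-unconditional-`Drop` combination are taken from the report.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};

// Hypothetical state shared by all writer clones and the reader.
#[derive(Default)]
struct Shared {
    queue: VecDeque<u32>, // stand-in for spilled batches
    writer_dropped: bool,
}

// Every clone points at the same shared state.
#[derive(Clone)]
struct Writer {
    shared: Arc<Mutex<Shared>>,
}

impl Writer {
    fn push(&self, batch: u32) {
        self.shared.lock().unwrap().queue.push_back(batch);
    }
}

impl Drop for Writer {
    fn drop(&mut self) {
        // The problematic part: this runs for *every* clone,
        // not only for the last one still alive.
        self.shared.lock().unwrap().writer_dropped = true;
    }
}

#[derive(Debug, PartialEq)]
enum Read {
    Batch(u32),
    Pending,
    Eof,
}

struct Reader {
    shared: Arc<Mutex<Shared>>,
}

impl Reader {
    fn poll(&self) -> Read {
        let mut s = self.shared.lock().unwrap();
        match s.queue.pop_front() {
            Some(b) => Read::Batch(b),
            // Empty queue plus writer_dropped is treated as end of stream.
            None if s.writer_dropped => Read::Eof,
            None => Read::Pending,
        }
    }
}

// Returns what the reader observed after the first clone was dropped,
// and how many batches were queued afterwards with nobody left to read them.
fn batches_lost_after_first_drop() -> (Read, usize) {
    let shared = Arc::new(Mutex::new(Shared::default()));
    let reader = Reader { shared: Arc::clone(&shared) };
    let fast = Writer { shared: Arc::clone(&shared) };
    let slow = fast.clone();

    drop(fast); // the first input task finishes while the queue is empty
    let seen = reader.poll(); // the reader concludes the stream is over

    slow.push(1); // the still-running task keeps writing
    slow.push(2);
    let stranded = shared.lock().unwrap().queue.len();
    (seen, stranded)
}
```

In this model the reader gets `Read::Eof` as soon as the first clone is dropped, and both batches pushed afterwards stay in the queue unread.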
   
   This bug requires three conditions to trigger:
   1. Non-preserve-order repartitioning (so spill writers are cloned across 
input tasks)
   2. Memory pressure causing batches to spill to disk
   3. Input tasks finishing at different times (the common case with varying 
partition sizes)
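
One possible way to avoid the premature EOF, shown only as an illustration and not necessarily what DataFusion does, is to count live writer clones and set `writer_dropped` only when the last one goes away. In the sketch below `CountedShared`, `CountedWriter`, `live_writers`, and `eof_flag_after_each_drop` are hypothetical names, and `u32` values again stand in for batches.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};

// Hypothetical shared state with an explicit count of live writer clones.
struct CountedShared {
    queue: VecDeque<u32>, // stand-in for spilled batches
    live_writers: usize,
    writer_dropped: bool,
}

struct CountedWriter {
    shared: Arc<Mutex<CountedShared>>,
}

impl CountedWriter {
    fn new() -> Self {
        let shared = CountedShared {
            queue: VecDeque::new(),
            live_writers: 1,
            writer_dropped: false,
        };
        Self { shared: Arc::new(Mutex::new(shared)) }
    }

    fn push(&self, batch: u32) {
        self.shared.lock().unwrap().queue.push_back(batch);
    }
}

// Manual Clone instead of #[derive(Clone)] so the count stays accurate.
impl Clone for CountedWriter {
    fn clone(&self) -> Self {
        self.shared.lock().unwrap().live_writers += 1;
        Self { shared: Arc::clone(&self.shared) }
    }
}

impl Drop for CountedWriter {
    fn drop(&mut self) {
        let mut s = self.shared.lock().unwrap();
        s.live_writers -= 1;
        // Only the last clone to be dropped signals end of stream.
        if s.live_writers == 0 {
            s.writer_dropped = true;
        }
    }
}

// Returns (flag after the first drop, flag after the last drop, queued batches).
fn eof_flag_after_each_drop() -> (bool, bool, usize) {
    let fast = CountedWriter::new();
    let slow = fast.clone();
    let shared = Arc::clone(&slow.shared);

    drop(fast); // an early-finishing task no longer ends the stream
    let after_first = shared.lock().unwrap().writer_dropped;

    slow.push(1);
    slow.push(2);
    drop(slow); // the last writer is gone, so the flag may now be set

    let s = shared.lock().unwrap();
    (after_first, s.writer_dropped, s.queue.len())
}
```

With this shape, input tasks finishing at different times no longer matters: the flag stays `false` until every clone has been dropped, so batches written by slower tasks remain visible to a reader.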
   
   
   ### To Reproduce
   
   _No response_
   
   ### Expected behavior
   
   _No response_
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

