Re: [PR] feat: Add sort-based shuffle implementation [datafusion-ballista]

via GitHub Sun, 18 Jan 2026 09:57:21 -0800


milenkovicm commented on PR #1389:
URL: 
https://github.com/apache/datafusion-ballista/pull/1389#issuecomment-3765548374


   Thanks @andygrove, I will try to have a look later.
   
   One question, though I'm not sure whether it makes sense:
   
   - What if, instead of spilling to a temporary file, the spill goes to the 
output file directly, and the index keeps more than one partition id -> offset 
mapping per partition? 
   - Reads would need a few more file seeks, as batches for the same partition 
would be scattered around the file, but that should not be too bad, since each 
read should still cover many batches at once (batches are buffered before being 
written). This would save the spill batch reconciliation at the end. 
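
To make the idea above concrete, here is a minimal sketch of what such an index could look like. It is only an illustration under stated assumptions, not the PR's implementation: `Segment`, `MultiSegmentIndex`, `DirectSpillWriter` and `read_partition` are hypothetical names, raw bytes stand in for serialized record batches, and any `Write` / `Read + Seek` value stands in for the output file.

```rust
use std::collections::BTreeMap;
use std::io::{self, Read, Seek, SeekFrom, Write};

/// One contiguous run of bytes in the output file that belongs to one partition.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Segment {
    pub offset: u64,
    pub len: u64,
}

/// Hypothetical index: each partition id maps to several segments
/// instead of a single offset.
#[derive(Debug, Default)]
pub struct MultiSegmentIndex {
    segments: BTreeMap<u32, Vec<Segment>>,
}

impl MultiSegmentIndex {
    /// Segments for `partition`, in the order they were written.
    pub fn segments(&self, partition: u32) -> &[Segment] {
        match self.segments.get(&partition) {
            Some(v) => v.as_slice(),
            None => &[],
        }
    }
}

/// Writes every spill straight to the final output, so no temporary
/// spill files have to be merged at the end.
pub struct DirectSpillWriter<W: Write> {
    out: W,
    pos: u64,
    index: MultiSegmentIndex,
}

impl<W: Write> DirectSpillWriter<W> {
    pub fn new(out: W) -> Self {
        Self { out, pos: 0, index: MultiSegmentIndex::default() }
    }

    /// Append the buffered bytes of one partition and record where they landed.
    pub fn spill(&mut self, partition: u32, buffered: &[u8]) -> io::Result<()> {
        if buffered.is_empty() {
            return Ok(());
        }
        self.out.write_all(buffered)?;
        let segment = Segment { offset: self.pos, len: buffered.len() as u64 };
        self.index.segments.entry(partition).or_default().push(segment);
        self.pos += segment.len;
        Ok(())
    }

    /// Flush and hand back the output together with the index.
    pub fn finish(mut self) -> io::Result<(W, MultiSegmentIndex)> {
        self.out.flush()?;
        Ok((self.out, self.index))
    }
}

/// Read all data of one partition: one seek plus one read per segment.
pub fn read_partition<R: Read + Seek>(
    src: &mut R,
    index: &MultiSegmentIndex,
    partition: u32,
) -> io::Result<Vec<u8>> {
    let mut data = Vec::new();
    for seg in index.segments(partition) {
        src.seek(SeekFrom::Start(seg.offset))?;
        let mut buf = vec![0u8; seg.len as usize];
        src.read_exact(&mut buf)?;
        data.extend_from_slice(&buf);
    }
    Ok(data)
}
```

In this sketch the read cost is one seek per spill of a partition, rather than one seek per batch, which matches the expectation that buffering before the write keeps the number of segments small. An in-memory `std::io::Cursor<Vec<u8>>` is enough to try it out in place of a real file.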


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


