Re: [PR] feat: Add sort-based shuffle implementation [datafusion-ballista]

via GitHub Sun, 18 Jan 2026 10:05:12 -0800


andygrove commented on PR #1389:
URL: https://github.com/apache/datafusion-ballista/pull/1389#issuecomment-3765556652


   > Thanks @andygrove will try to have a look later.
   > 
   > One question, I'm not sure if it makes sense.
   > 
   > * What if, instead of spilling to a temporary file, the spill goes to the output file 
directly, and the index keeps more than one partition id -> offset mapping?
   > * Reads would need a few more file seeks, as batches for the same partition 
are scattered around, but that should not be too bad, since reads should be able to fetch 
many batches together (batches are buffered before they are written). This would save 
the spill batch reconciliation at the end.
   
   That's an interesting idea. I will experiment with that in a separate PR.
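
For illustration only, here is a rough sketch of the quoted idea: every spill is appended straight to the output, and the index keeps a list of (offset, length) segments per partition id instead of a single offset. None of this is taken from the PR. `DirectSpillWriter`, `Segment`, and `read_partition` are hypothetical names, the payload is raw bytes standing in for serialized batches, and the writer is assumed to start at offset 0.

```rust
use std::collections::BTreeMap;
use std::io::{self, Read, Seek, SeekFrom, Write};

/// One contiguous run of bytes for a single partition in the output.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Segment {
    pub offset: u64,
    pub len: u64,
}

/// Hypothetical writer: spills are appended directly to the final output,
/// and the index records every segment written for each partition id.
pub struct DirectSpillWriter<W: Write> {
    out: W,
    pos: u64, // bytes written so far (assumes `out` starts at offset 0)
    index: BTreeMap<u32, Vec<Segment>>,
}

impl<W: Write> DirectSpillWriter<W> {
    pub fn new(out: W) -> Self {
        Self { out, pos: 0, index: BTreeMap::new() }
    }

    /// Append the buffered bytes for `partition` and remember where they landed.
    pub fn spill(&mut self, partition: u32, buffered: &[u8]) -> io::Result<()> {
        if buffered.is_empty() {
            return Ok(());
        }
        self.out.write_all(buffered)?;
        let len = buffered.len() as u64;
        self.index
            .entry(partition)
            .or_default()
            .push(Segment { offset: self.pos, len });
        self.pos += len;
        Ok(())
    }

    /// Flush and hand back the output together with the multi-segment index.
    /// No reconciliation pass is needed because data is already in place.
    pub fn finish(mut self) -> io::Result<(W, BTreeMap<u32, Vec<Segment>>)> {
        self.out.flush()?;
        Ok((self.out, self.index))
    }
}

/// Read one partition back: one seek per segment, in write order.
pub fn read_partition<R: Read + Seek>(
    input: &mut R,
    index: &BTreeMap<u32, Vec<Segment>>,
    partition: u32,
) -> io::Result<Vec<u8>> {
    let mut data = Vec::new();
    if let Some(segments) = index.get(&partition) {
        for seg in segments {
            input.seek(SeekFrom::Start(seg.offset))?;
            let mut buf = vec![0u8; seg.len as usize];
            input.read_exact(&mut buf)?;
            data.extend_from_slice(&buf);
        }
    }
    Ok(data)
}
```

The trade-off matches the one described in the quoted comment: the write path skips the final merge of spill files, while the read path pays one seek per segment rather than one per partition.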


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


