andygrove commented on PR #1389: URL: https://github.com/apache/datafusion-ballista/pull/1389#issuecomment-3765556652
> Thanks @andygrove will try to have a look later.
>
> One question, I'm not sure if it makes sense.
>
> * What if instead spilling to temporary file spill goes to output file directly, and index to keep more than one partition id -> offset mapping.
> * Read would need to do few more file seeks, as batches for same partition are scattered around, but should not be too bad as reads should be able to read many batches together (as batches are buffered before write). This would save spill batch reconciliation at the end.

That's an interesting idea. I will experiment with that in a separate PR.
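For readers following the thread, here is a rough sketch of the idea in the quoted question. It is not code from this PR or from Ballista. `SegmentIndex`, `DirectSpillWriter`, and `read_partition` are hypothetical names, and it assumes each spill is an opaque byte buffer rather than an Arrow IPC batch. Every spill is appended straight to the output, and the index keeps a list of (offset, length) entries per partition id instead of a single offset.

```rust
use std::collections::HashMap;
use std::io::{self, Read, Seek, SeekFrom, Write};

/// Hypothetical index: partition id -> (offset, length) of every segment
/// written for that partition, in write order.
#[derive(Default)]
pub struct SegmentIndex {
    segments: HashMap<usize, Vec<(u64, u64)>>,
}

/// Hypothetical writer that appends each spill directly to the output
/// instead of writing it to a temporary spill file first.
pub struct DirectSpillWriter<W: Write + Seek> {
    out: W,
    index: SegmentIndex,
}

impl<W: Write + Seek> DirectSpillWriter<W> {
    pub fn new(out: W) -> Self {
        Self { out, index: SegmentIndex::default() }
    }

    /// Append one buffered chunk for `partition` and record where it landed.
    pub fn spill(&mut self, partition: usize, bytes: &[u8]) -> io::Result<()> {
        let offset = self.out.seek(SeekFrom::End(0))?;
        self.out.write_all(bytes)?;
        self.index
            .segments
            .entry(partition)
            .or_default()
            .push((offset, bytes.len() as u64));
        Ok(())
    }

    /// Hand back the output and the index; no reconciliation pass is needed.
    pub fn finish(self) -> (W, SegmentIndex) {
        (self.out, self.index)
    }
}

/// Read all data for one partition: one seek per recorded segment.
pub fn read_partition<R: Read + Seek>(
    input: &mut R,
    index: &SegmentIndex,
    partition: usize,
) -> io::Result<Vec<u8>> {
    let mut data = Vec::new();
    if let Some(segments) = index.segments.get(&partition) {
        for &(offset, len) in segments {
            input.seek(SeekFrom::Start(offset))?;
            let mut buf = vec![0u8; len as usize];
            input.read_exact(&mut buf)?;
            data.extend_from_slice(&buf);
        }
    }
    Ok(data)
}
```

In this sketch the read side pays one seek per segment, which is the trade-off the question describes: fewer, larger buffered spills mean fewer segments per partition and therefore fewer seeks.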
