pantShrey commented on issue #23247: URL: https://github.com/apache/datafusion/issues/23247#issuecomment-4842630354
One constraint worth flagging: Arrow's `StreamWriter` requires a `std::io::Write` implementor, so the synchronous boundary isn't just in DataFusion's `SpillWriter`; it's baked into the Arrow IPC writer itself. Any async adapter built on top of the current Arrow API would need to either buffer batches before handing them to an async upload, or chunk synchronous writes into an async stream. Both approaches reintroduce a copy, which defeats part of the purpose. A cleaner fix would likely require Arrow itself to expose an async IPC writer (writing against `AsyncWrite` or similar), rather than DataFusion working around the sync constraint from the outside.
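
To make the "chunk synchronous writes into a stream" workaround concrete, here is a minimal std-only sketch. It does not use Arrow or an async runtime. `ChunkingWriter`, `serialize_sync`, and `demo` are hypothetical names invented for illustration, `serialize_sync` stands in for any serializer that only accepts a `std::io::Write` sink (as Arrow's `StreamWriter` does), and an `mpsc` channel stands in for the async stream an uploader would consume. The point is the `extend_from_slice` call, which is the extra copy described above.

```rust
use std::io::{self, Write};
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical adapter: exposes the synchronous `Write` interface and
/// forwards fixed-size chunks over a channel. In a real system the receiving
/// end would be drained by an async uploader; here it is a plain std channel.
pub struct ChunkingWriter {
    buf: Vec<u8>,
    chunk_size: usize,
    tx: Sender<Vec<u8>>,
}

impl ChunkingWriter {
    pub fn new(chunk_size: usize) -> (Self, Receiver<Vec<u8>>) {
        assert!(chunk_size > 0, "chunk_size must be non-zero");
        let (tx, rx) = channel();
        let writer = Self {
            buf: Vec::with_capacity(chunk_size),
            chunk_size,
            tx,
        };
        (writer, rx)
    }

    /// Hand the staged bytes to the consumer and start a fresh staging buffer.
    fn send_chunk(&mut self) -> io::Result<()> {
        let chunk = std::mem::replace(&mut self.buf, Vec::with_capacity(self.chunk_size));
        self.tx
            .send(chunk)
            .map_err(|e| io::Error::new(io::ErrorKind::BrokenPipe, e.to_string()))
    }
}

impl Write for ChunkingWriter {
    fn write(&mut self, data: &[u8]) -> io::Result<usize> {
        // The extra copy: bytes the serializer already produced are copied
        // into the staging buffer before they can leave the sync boundary.
        let room = self.chunk_size - self.buf.len();
        let n = room.min(data.len());
        self.buf.extend_from_slice(&data[..n]);
        if self.buf.len() == self.chunk_size {
            self.send_chunk()?;
        }
        Ok(n)
    }

    fn flush(&mut self) -> io::Result<()> {
        // Emit any trailing partial chunk.
        if !self.buf.is_empty() {
            self.send_chunk()?;
        }
        Ok(())
    }
}

/// Stand-in for a synchronous serializer that only knows how to push bytes
/// into a `Write` sink (an assumption for illustration, not Arrow's API).
pub fn serialize_sync<W: Write>(mut sink: W, payload: &[u8]) -> io::Result<()> {
    sink.write_all(payload)?;
    sink.flush()
}

/// Serialize ten bytes through a 4-byte chunking writer and collect the chunks.
pub fn demo() -> io::Result<Vec<Vec<u8>>> {
    let (writer, rx) = ChunkingWriter::new(4);
    // `writer` is moved into `serialize_sync` and dropped when it returns,
    // which closes the channel so the iteration below terminates.
    serialize_sync(writer, b"0123456789")?;
    Ok(rx.into_iter().collect())
}
```

Calling `demo()` yields three chunks (`0123`, `4567`, `89`), each of which was copied once into the staging buffer. An async IPC writer in Arrow itself could in principle avoid that staging step, which is the argument for fixing this upstream.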
