clbarnes commented on PR #5281: URL: https://github.com/apache/arrow-rs/pull/5281#issuecomment-1878504210
> 2\. found the Rust implementation here is relatively slow

This could be in the file writing step. I believe Python does some buffering under the hood, whereas Rust's `File` is unbuffered, so every write goes straight to the operating system. Wrapping the `File` in a `BufWriter` might help.

It's also worth noting that the `upload` and `download` functions both run in serial (albeit concurrently with other async functions): you wait to read bytes from the file, then you wait to upload them, then you wait to read the next bytes, wait to upload them, and so on. I suspect it's possible to have one thread pre-loading a queue with chunks read from the file (a queue you can limit in size to prevent RAM explosion for large files) and another thread reading from that queue to handle the upload (or vice versa, downloading chunks from the store and writing them to the file), so that your web IO and local IO happen at the same time.
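
A minimal sketch of the bounded-queue idea, using only the standard library and plain threads rather than the async code in the PR. The function name `pipelined_copy` and its parameters `chunk_size` and `max_queued` are hypothetical, and the generic `writer` merely stands in for whatever does the upload or the file write; treat this as an illustration under those assumptions, not the PR's implementation.

```rust
use std::io::{self, Read, Write};
use std::sync::mpsc::sync_channel;
use std::thread;

/// Copy `reader` to `writer` through a bounded queue of chunks.
/// A producer thread reads while the calling thread writes, so the two
/// kinds of IO can overlap. At most `max_queued` chunks wait in the queue,
/// which caps memory use for large inputs.
fn pipelined_copy<R, W>(
    mut reader: R,
    mut writer: W,
    chunk_size: usize,
    max_queued: usize,
) -> io::Result<u64>
where
    R: Read + Send + 'static,
    W: Write,
{
    // `sync_channel` is bounded: `send` blocks once `max_queued` chunks are pending.
    let (tx, rx) = sync_channel::<io::Result<Vec<u8>>>(max_queued);

    let producer = thread::spawn(move || loop {
        let mut buf = vec![0u8; chunk_size];
        match reader.read(&mut buf) {
            Ok(0) => break, // end of input
            Ok(n) => {
                buf.truncate(n);
                if tx.send(Ok(buf)).is_err() {
                    break; // consumer has gone away
                }
            }
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => {
                // Forward the read error to the consumer, then stop.
                let _ = tx.send(Err(e));
                break;
            }
        }
    });

    let mut total = 0u64;
    // The loop ends when the producer drops its sender.
    for chunk in rx {
        let chunk = chunk?;
        writer.write_all(&chunk)?;
        total += chunk.len() as u64;
    }
    producer.join().expect("reader thread panicked");
    writer.flush()?;
    Ok(total)
}
```

For the download direction, passing `BufWriter::new(File::create(path)?)` as `writer` would combine this with the buffering suggestion. In async code, a bounded channel such as `tokio::sync::mpsc::channel` could play the same role as `sync_channel` does here.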
