clbarnes commented on PR #5281:
URL: https://github.com/apache/arrow-rs/pull/5281#issuecomment-1878504210

   > 2\. found the Rust implementation here is relatively slow
   
   This could be in the file-writing step; I believe Python does some buffering 
under the hood, whereas Rust's `File` is unbuffered and makes a system call for 
every write. Wrapping the `File` in a `BufWriter` might help.
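
A minimal sketch of the `BufWriter` suggestion, using only the standard library. The function name `write_chunks` and the in-memory `chunks` slice are hypothetical stand-ins for whatever the download loop actually produces; this is not the code from the PR.

```rust
use std::fs::File;
use std::io::{self, BufWriter, Write};
use std::path::Path;

/// Writes every chunk to `path` through a `BufWriter`, so that many small
/// writes are coalesced into fewer system calls. Returns the bytes written.
/// (`write_chunks` is a hypothetical helper, not part of any library.)
fn write_chunks(path: &Path, chunks: &[Vec<u8>]) -> io::Result<u64> {
    let file = File::create(path)?;
    let mut writer = BufWriter::new(file);
    let mut total = 0u64;
    for chunk in chunks {
        writer.write_all(chunk)?;
        total += chunk.len() as u64;
    }
    // Flush explicitly: an error during the implicit flush on drop is ignored.
    writer.flush()?;
    Ok(total)
}
```

Whether this helps depends on how large the individual writes already are; if each chunk is already several megabytes, the buffering gain is likely small.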
   
   It's also worth noting that the `upload` and `download` functions each run 
their steps serially (albeit concurrently with other async functions): you wait 
to read bytes from the file, then wait to upload them, then wait to read the 
next bytes, then wait to upload those, and so on. I suspect it's possible to 
have one thread pre-loading a queue with chunks read from the file (with the 
queue limited in size to prevent RAM explosion for large files) and another 
thread reading from that queue to handle the upload (or, for downloads, one 
fetching chunks from the store and another writing them), so that web IO and 
local IO happen at the same time.
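
The two-thread idea could look roughly like the sketch below. It uses plain standard-library threads and a bounded `sync_channel` rather than async tasks, and the names `pipelined_upload` and `upload` are hypothetical: `upload` is a closure standing in for whatever call actually sends a chunk to the store.

```rust
use std::io::{self, Read};
use std::sync::mpsc::sync_channel;
use std::thread;

/// Reads `reader` in chunks of up to `chunk_size` bytes on a background
/// thread and passes each chunk to `upload` on the calling thread.
/// At most `queue_len` chunks wait in the queue, which bounds memory use.
/// (`pipelined_upload` and `upload` are hypothetical names.)
fn pipelined_upload<R, F>(
    mut reader: R,
    chunk_size: usize,
    queue_len: usize,
    mut upload: F,
) -> io::Result<u64>
where
    R: Read + Send + 'static,
    F: FnMut(Vec<u8>) -> io::Result<()>,
{
    // Bounded channel: `send` blocks once `queue_len` chunks are pending.
    let (tx, rx) = sync_channel::<io::Result<Vec<u8>>>(queue_len);

    // Producer: local IO.
    let producer = thread::spawn(move || loop {
        let mut buf = vec![0u8; chunk_size];
        match reader.read(&mut buf) {
            Ok(0) => break, // end of input
            Ok(n) => {
                buf.truncate(n);
                if tx.send(Ok(buf)).is_err() {
                    break; // consumer stopped early
                }
            }
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => {
                let _ = tx.send(Err(e));
                break;
            }
        }
    });

    // Consumer: web IO, overlapping with the producer's next read.
    let mut total = 0u64;
    for item in rx {
        let chunk = item?;
        total += chunk.len() as u64;
        upload(chunk)?;
    }
    producer.join().expect("reader thread panicked");
    Ok(total)
}
```

The queue length is the knob that trades memory for overlap: peak buffered data is roughly `chunk_size * queue_len` plus the chunks currently being read and uploaded. The download direction would be the mirror image, with the producer fetching from the store and the consumer writing to disk.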


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

