tustvold commented on issue #5458:
URL: https://github.com/apache/arrow-rs/issues/5458#issuecomment-1978128759

   > The current design requires users to call flush when the buffer gets too large, because we can't perform IO during write or put. This complicates things, as users can no longer write continuously like before.
   
   In practice they have to do this anyway because of 
https://github.com/tokio-rs/tokio/issues/4296 and 
https://github.com/apache/arrow-rs/issues/5366. In general, the previous API was 
very difficult to use correctly, especially with the kind of 
long-running synchronous operations that characterize arrow/parquet workloads.
   
   > Also, I'm unclear about how max_concurrency functions. Does this mean that 
flush could operate asynchronously in the background?
   
   The idea is that, if Upload has accumulated enough data, it could upload 
multiple chunks in parallel, much like WriteMultipart does currently.
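   A minimal std-only sketch of that bounded-parallelism idea, using threads and a channel in place of real async IO. Everything here is hypothetical (`upload_chunk` stands in for a PUT of one part; `max_concurrency` caps how many parts are in flight at once); it illustrates the scheduling shape, not the object_store implementation:

```rust
use std::sync::mpsc;
use std::thread;

// Stand-in for uploading one part; returns (part number, bytes uploaded).
fn upload_chunk(part: usize, data: Vec<u8>) -> (usize, usize) {
    (part, data.len())
}

// Upload buffered chunks with at most `max_concurrency` in flight,
// roughly what a `max_concurrency` knob on Upload could do.
fn flush_parallel(chunks: Vec<Vec<u8>>, max_concurrency: usize) -> Vec<(usize, usize)> {
    let (tx, rx) = mpsc::channel();
    let mut results = Vec::new();
    let mut in_flight = 0;

    for (part, data) in chunks.into_iter().enumerate() {
        // Wait for one part to finish before exceeding the concurrency cap.
        if in_flight == max_concurrency {
            results.push(rx.recv().unwrap());
            in_flight -= 1;
        }
        let tx = tx.clone();
        thread::spawn(move || {
            tx.send(upload_chunk(part, data)).unwrap();
        });
        in_flight += 1;
    }

    // Drop our sender so recv() returns Err once all workers are done.
    drop(tx);
    while let Ok(r) = rx.recv() {
        results.push(r);
    }
    results.sort(); // restore part order for the multipart completion step
    results
}
```

   Completing the multipart upload requires the parts in order, which is why results are collected and sorted rather than applied as they arrive.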
   
   Effectively, aside from moving away from the problematic AsyncWrite 
abstraction, this doesn't materially alter the way IO is performed, other than 
removing the async backpressure mechanism that makes AsyncWrite such a pain to 
integrate with predominantly sync workloads.

