tustvold commented on issue #5458: URL: https://github.com/apache/arrow-rs/issues/5458#issuecomment-1978128759
> Currently the design requires users to call flush when the buffer gets too large, because we can't perform IO during write or put. This complicates things, as users can no longer write continuously like before.

In practice they have to do this anyway because of https://github.com/tokio-rs/tokio/issues/4296 and https://github.com/apache/arrow-rs/issues/5366. In general the previous API was very difficult to use correctly, especially with the kind of long-running synchronous operations that characterize arrow/parquet workloads.

> Also, I'm unclear about how max_concurrency functions. Does this mean that flush could operate asynchronously in the background?

The idea is that if `Upload` has accumulated enough data, it could upload multiple chunks in parallel, much like `WriteMultipart` does currently. Aside from moving away from the problematic `AsyncWrite` abstraction, this doesn't materially alter the way IO is performed, other than removing the async backpressure mechanism that makes `AsyncWrite` such a pain to integrate with predominantly sync workloads.
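To make the shape of the proposal concrete, here is a minimal sketch of the pattern described above: `write` is purely synchronous and only buffers, and IO happens exclusively at an explicit `flush`, which may upload several chunks in parallel. The names (`Upload`, `flush`, `finish`, `CHUNK_SIZE`) and the use of plain threads are illustrative assumptions, not the real `object_store` API.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Illustrative chunk size; real multipart uploads use much larger parts.
const CHUNK_SIZE: usize = 8;

// Hypothetical stand-in for the proposed Upload type.
struct Upload {
    buffer: Vec<u8>,
    // Chunks "uploaded" so far; a stand-in for completed multipart parts.
    parts: Arc<Mutex<Vec<Vec<u8>>>>,
}

impl Upload {
    fn new() -> Self {
        Upload {
            buffer: Vec::new(),
            parts: Arc::new(Mutex::new(Vec::new())),
        }
    }

    // Purely synchronous: no IO, so long-running sync workloads can keep
    // writing without touching an async runtime or hitting backpressure.
    fn write(&mut self, data: &[u8]) {
        self.buffer.extend_from_slice(data);
    }

    // The explicit IO point: drain full chunks and "upload" them in
    // parallel, analogous to WriteMultipart's concurrent part uploads.
    fn flush(&mut self) {
        let mut handles = Vec::new();
        while self.buffer.len() >= CHUNK_SIZE {
            let chunk: Vec<u8> = self.buffer.drain(..CHUNK_SIZE).collect();
            let parts = Arc::clone(&self.parts);
            handles.push(thread::spawn(move || {
                parts.lock().unwrap().push(chunk);
            }));
        }
        for h in handles {
            h.join().unwrap();
        }
    }

    // Upload any buffered tail and return the completed parts.
    fn finish(mut self) -> Vec<Vec<u8>> {
        if !self.buffer.is_empty() {
            let tail = std::mem::take(&mut self.buffer);
            self.parts.lock().unwrap().push(tail);
        }
        Arc::try_unwrap(self.parts).unwrap().into_inner().unwrap()
    }
}

fn main() {
    let mut upload = Upload::new();
    upload.write(&[1u8; 20]);
    upload.flush(); // two full 8-byte chunks upload in parallel; 4 bytes stay buffered
    upload.write(&[2u8; 4]);
    let parts = upload.finish();
    let total: usize = parts.iter().map(|p| p.len()).sum();
    println!("parts={} total={}", parts.len(), total);
}
```

The key property the sketch demonstrates is that `write` can never block on IO; the caller decides when to pay the IO cost by calling `flush`, which is where a `max_concurrency` limit would naturally apply.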
