tustvold commented on issue #7251:
URL: https://github.com/apache/arrow-rs/issues/7251#issuecomment-2708211530

   Thank you for starting this discussion, I think we should definitely provide 
more utilities/primitives in this space.
   
   > The 
[ThrottledStore](https://docs.rs/object_store/latest/object_store/throttle/struct.ThrottledStore.html)
 and 
[LimitStore](https://docs.rs/object_store/latest/object_store/limit/struct.LimitStore.html)
 provided with the object store crate
   
   FWIW these should probably be deprecated and re-implemented at the 
HttpClient level.
   
   > Collect statistics / traces and report metrics (see 
[ObjectStoreMetrics](https://github.com/influxdata/influxdb3_core/tree/main/object_store_metrics)
 in influxdb3_core)
   > Runs on a different tokio runtime (such as the 
[DeltaIOStorageBackend](https://github.com/delta-io/delta-rs/blob/e30ab7e366eb209718c87acb6974a815503181bc/crates/core/src/storage/mod.rs#L116-L120)
 in delta-rs from @ion-elgreco)
   > Visualization of object store requests over time
   
   Now that we have the HttpClient abstraction, I think that is the level at 
which I would encourage implementing most of these.
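
   As a rough illustration of what wrapping at the client level could look 
like, here is a minimal std-only sketch of a metrics decorator. The `Service` 
trait, `Metered` and `Echo` are hypothetical stand-ins invented for this 
example, not the actual object_store HttpClient API; the point is only that 
the wrapper needs no knowledge of the store built on top of it.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::Instant;

/// Hypothetical stand-in for an HTTP-level client (NOT the object_store API).
pub trait Service {
    fn call(&self, request: &str) -> Result<Vec<u8>, String>;
}

/// Wraps any `Service` and records request count, response bytes and time.
pub struct Metered<S> {
    inner: S,
    requests: AtomicU64,
    bytes: AtomicU64,
    nanos: AtomicU64,
}

impl<S: Service> Metered<S> {
    pub fn new(inner: S) -> Self {
        Self {
            inner,
            requests: AtomicU64::new(0),
            bytes: AtomicU64::new(0),
            nanos: AtomicU64::new(0),
        }
    }

    /// Number of calls made so far, successful or not.
    pub fn requests(&self) -> u64 {
        self.requests.load(Ordering::Relaxed)
    }

    /// Total size of all successful response bodies.
    pub fn bytes(&self) -> u64 {
        self.bytes.load(Ordering::Relaxed)
    }

    /// Total wall-clock time spent inside the wrapped service.
    pub fn elapsed_nanos(&self) -> u64 {
        self.nanos.load(Ordering::Relaxed)
    }
}

impl<S: Service> Service for Metered<S> {
    fn call(&self, request: &str) -> Result<Vec<u8>, String> {
        let start = Instant::now();
        let result = self.inner.call(request);
        self.requests.fetch_add(1, Ordering::Relaxed);
        if let Ok(body) = &result {
            self.bytes.fetch_add(body.len() as u64, Ordering::Relaxed);
        }
        self.nanos
            .fetch_add(start.elapsed().as_nanos() as u64, Ordering::Relaxed);
        result
    }
}

/// Toy backend that echoes the request back, used only to exercise the wrapper.
pub struct Echo;

impl Service for Echo {
    fn call(&self, request: &str) -> Result<Vec<u8>, String> {
        Ok(request.as_bytes().to_vec())
    }
}
```

   Because `Metered<S>` itself implements `Service`, wrappers of this shape 
can be stacked (e.g. throttling around metrics) without touching the store 
implementations.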
   
   > Limit the total size of any individual request (e.g. the 
LimitedRequestSizeObjectStore from
   https://github.com/apache/datafusion/issues/15067)
   > Break single large requests into multiple concurrent small requests 
("chunking") - @crepererum is working on this I think in influx
   
   This feels like something better built into some sort of TransferManager 
that sits on top of the ObjectStore API, as opposed to baking it in at the 
ObjectStore level. Perhaps in a similar vein to 
[BufWriter](https://docs.rs/object_store/latest/object_store/buffered/struct.BufWriter.html).
   
   This would, for example, allow registering a single ObjectStore, but then 
having different IO configurations for different areas of the stack. It would 
also potentially allow for greater concurrency, as the ObjectStore API has no 
mechanism by which chunks fetched in parallel could be returned out of order. 
This would be especially useful when downloading files to disk, as it avoids 
needing to hold chunks in memory unnecessarily.
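
   To make the out-of-order point concrete, here is a minimal std-only sketch 
of the idea, not a proposed design. `RangeSource`, `split_range` and 
`download` are hypothetical names made up for this example; `RangeSource` 
merely stands in for a ranged read such as `ObjectStore::get_range`. Each 
chunk is written at its own offset as soon as it arrives, so nothing has to be 
buffered waiting for earlier chunks.

```rust
use std::io::{self, Seek, SeekFrom, Write};
use std::ops::Range;
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for a store that supports ranged reads.
pub trait RangeSource: Sync {
    fn get_range(&self, range: Range<u64>) -> io::Result<Vec<u8>>;
}

/// In-memory "store" so the sketch runs without any network access.
impl RangeSource for Vec<u8> {
    fn get_range(&self, range: Range<u64>) -> io::Result<Vec<u8>> {
        self.get(range.start as usize..range.end as usize)
            .map(|s| s.to_vec())
            .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "range out of bounds"))
    }
}

/// Splits `range` into consecutive chunks of at most `chunk_size` bytes.
pub fn split_range(range: Range<u64>, chunk_size: u64) -> Vec<Range<u64>> {
    assert!(chunk_size > 0, "chunk_size must be non-zero");
    let mut chunks = Vec::new();
    let mut start = range.start;
    while start < range.end {
        let end = range.end.min(start.saturating_add(chunk_size));
        chunks.push(start..end);
        start = end;
    }
    chunks
}

/// Fetches `range` in parallel chunks and writes each chunk to `sink` at its
/// offset in whatever order the chunks complete.
///
/// One thread per chunk keeps the sketch short; a real transfer manager would
/// bound concurrency and most likely be async.
pub fn download<S, W>(
    source: &S,
    range: Range<u64>,
    chunk_size: u64,
    sink: &mut W,
) -> io::Result<()>
where
    S: RangeSource,
    W: Write + Seek,
{
    let base = range.start;
    let (tx, rx) = mpsc::channel();
    thread::scope(|scope| {
        for chunk in split_range(range, chunk_size) {
            let tx = tx.clone();
            scope.spawn(move || {
                let result = source.get_range(chunk.clone());
                // The receiver may already be gone after an earlier error.
                let _ = tx.send((chunk.start, result));
            });
        }
        // Drop the original sender so the loop below ends once all workers finish.
        drop(tx);
        for (offset, result) in rx {
            let bytes = result?;
            sink.seek(SeekFrom::Start(offset - base))?;
            sink.write_all(&bytes)?;
        }
        Ok(())
    })
}
```

   With a `std::fs::File` as the sink this writes straight to disk; the 
chunk size and degree of parallelism are then per-call settings rather than 
properties of the registered store.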
   
   See #5277 for some prior discussion.
   
   > Add additional policies to provided implementations
   
   FWIW all the first-party implementations share a lot of the same underlying 
logic, e.g. things like GetClient, so adding policies to them may actually not 
be all that bad.
   

