alamb commented on issue #7171: URL: https://github.com/apache/arrow-rs/issues/7171#issuecomment-2676833199
> However, over time the ObjectStore API has grown, and now has 8 required methods and a further 10 methods with default implementations. This creates a number of challenges for this wrapper based approach for composition.

I basically agree with this statement of challenge, though I am not sure how hard it actually is in practice (having done it myself and seen various different versions of it)

> _**Interface Creep**_
>
> In many places the ObjectStore interface gets used as the abstraction for components that don't actually require the full breadth of ObjectStore functionality. There is no need, for example, for a parquet reader to depend on more than the ability to fetch ranges of bytes.

I think the parquet reader also needs to be able to get the total file sizes as well (at least unless the negative ranges to fetch end bytes is supported). Adding support to

> This leads to perverse "ObjectStore" implementations, that actually only implement say get functionality. Similarly in contexts like [apache/datafusion#14286](https://github.com/apache/datafusion/pull/14286) it creates complexities around how to shim the full ObjectStore interface, despite the actual operators in question only using a very small subset of this functionality.
>
> I personally think we should encourage a move away from this wrapper based form of composition and instead do the following:
>
> * Encourage use of specialized traits like parquet's [AsyncFileReader](https://docs.rs/parquet/latest/parquet/arrow/async_reader/trait.AsyncFileReader.html) that reflect what a given component actually needs, and can evolve independently of ObjectStore

I would say "perverse" is somewhat subjective. It is certainly complex but that also needs to be measured against

1. The complexity of other alternatives
2. The complexity of the problem being solved
3. The complexity of using the API vs implementing the API.

For example, the `AsyncFileReader` in parquet specifically adds non trivial complexity to using the reader

> * Add additional functionality for injecting logic into the HTTP request path ([Decouple ObjectStore from Reqwest / Generic HTTP Client Support #6056](https://github.com/apache/arrow-rs/issues/6056)) allowing
>
>   * More accurate instrumentation
>   * More accurate concurrency limiting
>   * Potential sophistication w.r.t tokio runtime dispatch

This feels like a good idea to me

> I can't help feeling right now ObjectStore is stuck between trying to expose the functionality of ObjectStore's in a portable and ergonomic fashion, whilst also trying to provide some sort of generic all-purpose IO subsystem abstraction, which I'm not sure aren't incompatible goals....

I think OpenDAL https://github.com/apache/opendal is trying to provide generic all-purpose IO subsystem abstraction

So my personal recommendation is

1. Keep the object store API the same / don't expand it
2. Support https://github.com/apache/arrow-rs/issues/6056
3. Add additional documentation / examples for more advanced functionality
4. Point people at OpenDAL if they need more advanced features
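
To make the point about byte ranges and file size concrete, here is a rough sketch of what a narrow, component-specific trait could look like. The names `RangeReader`, `InMemoryFile`, and `read_tail` are hypothetical and exist only for illustration; this is not the API of `AsyncFileReader` or `ObjectStore`, and it is synchronous purely to keep the example dependency-free. It shows why a reader that starts from the end of the file needs the total size when suffix ranges are not available.

```rust
use std::io;
use std::ops::Range;

/// Hypothetical narrow trait: only what a footer-first file reader needs.
trait RangeReader {
    /// Total size of the file in bytes.
    fn size(&self) -> io::Result<u64>;
    /// Fetch the bytes in `range` (end exclusive).
    fn get_range(&self, range: Range<u64>) -> io::Result<Vec<u8>>;
}

/// Hypothetical in-memory implementation, standing in for a real store.
struct InMemoryFile {
    data: Vec<u8>,
}

impl RangeReader for InMemoryFile {
    fn size(&self) -> io::Result<u64> {
        Ok(self.data.len() as u64)
    }

    fn get_range(&self, range: Range<u64>) -> io::Result<Vec<u8>> {
        let (start, end) = (range.start as usize, range.end as usize);
        // `get` returns None for out-of-bounds or inverted ranges instead of panicking.
        self.data
            .get(start..end)
            .map(|s| s.to_vec())
            .ok_or_else(|| io::Error::new(io::ErrorKind::UnexpectedEof, "range out of bounds"))
    }
}

/// Read the last `len` bytes. Without suffix ranges, this needs `size()` first.
fn read_tail<R: RangeReader>(reader: &R, len: u64) -> io::Result<Vec<u8>> {
    let size = reader.size()?;
    let start = size.saturating_sub(len);
    reader.get_range(start..size)
}

fn main() -> io::Result<()> {
    // Parquet files end with the 4-byte magic "PAR1"; the rest is dummy data.
    let file = InMemoryFile { data: b"column data....PAR1".to_vec() };
    let tail = read_tail(&file, 4)?;
    println!("{}", String::from_utf8_lossy(&tail));
    Ok(())
}
```

Implementing a two-method trait like this is clearly simpler than implementing all of ObjectStore, but each consumer then has to adapt to it, which is the using vs implementing trade-off mentioned above.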