crepererum commented on issue #7171: URL: https://github.com/apache/arrow-rs/issues/7171#issuecomment-2677992820
# Thoughts ## Status Quo at Influx > As people wanted additional functionality, such as instrumentation, caching or concurrency limiting, this was implemented by creating ObjectStore implementations that wrap the existing ones. Again this worked well. (from https://github.com/apache/arrow-rs/issues/7171#issue-2868833643 ) I've just looked at our setup and this is how the hierarchy currently looks like (from outer to inner): 1. metrics 2. in-memory cache 3. metrics 4. disk cache 5. metrics 6. chunking of GET requests (that's also where #7155 comes in handy) 7. metrics 8. racing of multiple requests 9. metrics 10. actual `object_store` AWS S3 implementation Not saying that this is what people usually do, but I thought that insight might be helpful. ## Trait methods > However, over time the ObjectStore API has grown, and now has 8 required methods and a further 10 methods with default implementations. (from https://github.com/apache/arrow-rs/issues/7171#issue-2868833643 ) I think that by itself was a design mistake (in hindsight). There should be ONE single GET method (probably `get_opts`) and if that method is too complicated for API users, we should provide them with an extension trait (NOT part of the core trait) that maps simple API methods to the more complete ones. > As a wrapper must avoid "despecializing" methods, it must implement all 18 methods. Not only is this burdensome, but creates upgrade hazards as new methods are added, potentially in non-breaking versions. (from https://github.com/apache/arrow-rs/issues/7171#issue-2868833643 ) At Influx we use [`#[deny(clippy::missing_trait_methods)]`](https://rust-lang.github.io/rust-clippy/master/index.html#missing_trait_methods) to catch that, but you're right that adding new methods is therefore a perceived breaking change. ## Sans-IO / IO Abstraction > Add additional functionality for injecting logic into the HTTP request path (#6056) allowing > More accurate instrumentation (from https://github.com/apache/arrow-rs/issues/7171#issue-2868833643 ) Depends. I would say "more low-level", but now you're in a different pickle because you now longer see the high-level methods or at least you partially have to recover them (e.g. is it an upload/put, a copy, a move). So I don't think the instrumentation on that level is universally better. I do however agree that splitting the IO layer is the right move. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org