crepererum commented on issue #7171:
URL: https://github.com/apache/arrow-rs/issues/7171#issuecomment-2677992820

   # Thoughts
   
   ## Status Quo at Influx
   > As people wanted additional functionality, such as instrumentation, 
caching or concurrency limiting, this was implemented by creating ObjectStore 
implementations that wrap the existing ones. Again this worked well.
   
   (from https://github.com/apache/arrow-rs/issues/7171#issue-2868833643 )
   
   I've just looked at our setup and this is how the hierarchy currently looks 
like (from outer to inner):
   
   1. metrics
   2. in-memory cache
   3. metrics
   4. disk cache
   5. metrics
   6. chunking of GET requests (that's also where #7155 comes in handy)
   7. metrics
   8. racing of multiple requests
   9. metrics
   10. actual `object_store` AWS S3 implementation
   
   Not saying that this is what people usually do, but I thought that insight 
might be helpful.
   
   ## Trait methods
   
   > However, over time the ObjectStore API has grown, and now has 8 required 
methods and a further 10 methods with default implementations.
   
   (from https://github.com/apache/arrow-rs/issues/7171#issue-2868833643 )
   
   I think that by itself was a design mistake (in hindsight). There should be 
ONE single GET method (probably `get_opts`) and if that method is too 
complicated for API users, we should provide them with an extension trait (NOT 
part of the core trait) that maps simple API methods to the more complete ones.
   
   > As a wrapper must avoid "despecializing" methods, it must implement all 18 
methods. Not only is this burdensome, but creates upgrade hazards as new 
methods are added, potentially in non-breaking versions.
   
   (from https://github.com/apache/arrow-rs/issues/7171#issue-2868833643 )
   
   At Influx we use 
[`#[deny(clippy::missing_trait_methods)]`](https://rust-lang.github.io/rust-clippy/master/index.html#missing_trait_methods)
 to catch that, but you're right that adding new methods is therefore a 
perceived breaking change.
   
   ## Sans-IO / IO Abstraction
   > Add additional functionality for injecting logic into the HTTP request 
path (#6056) allowing
   > More accurate instrumentation
   
   (from https://github.com/apache/arrow-rs/issues/7171#issue-2868833643 )
   
   Depends. I would say "more low-level", but now you're in a different pickle 
because you now longer see the high-level methods or at least you partially 
have to recover them (e.g. is it an upload/put, a copy, a move). So I don't 
think the instrumentation on that level is universally better.
   
   I do however agree that splitting the IO layer is the right move.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to