Re: [I] [Discussion] Object Store Composition [arrow-rs]

via GitHub Mon, 24 Feb 2025 02:27:02 -0800


crepererum commented on issue #7171:
URL: https://github.com/apache/arrow-rs/issues/7171#issuecomment-2677992820

# Thoughts

## Status Quo at Influx
> As people wanted additional functionality, such as instrumentation,
caching or concurrency limiting, this was implemented by creating ObjectStore
implementations that wrap the existing ones. Again this worked well.

(from https://github.com/apache/arrow-rs/issues/7171#issue-2868833643 )

I've just looked at our setup and this is how the hierarchy currently looks
like (from outer to inner):

1. metrics
2. in-memory cache
3. metrics
4. disk cache
5. metrics
6. chunking of GET requests (that's also where #7155 comes in handy)
7. metrics
8. racing of multiple requests
9. metrics
10. actual `object_store` AWS S3 implementation

Not saying that this is what people usually do, but I thought that insight
might be helpful.

## Trait methods

> However, over time the ObjectStore API has grown, and now has 8 required
methods and a further 10 methods with default implementations.

(from https://github.com/apache/arrow-rs/issues/7171#issue-2868833643 )

I think that by itself was a design mistake (in hindsight). There should be
ONE single GET method (probably `get_opts`) and if that method is too
complicated for API users, we should provide them with an extension trait (NOT
part of the core trait) that maps simple API methods to the more complete ones.

> As a wrapper must avoid "despecializing" methods, it must implement all 18
methods. Not only is this burdensome, but creates upgrade hazards as new
methods are added, potentially in non-breaking versions.

(from https://github.com/apache/arrow-rs/issues/7171#issue-2868833643 )

At Influx we use
[`#[deny(clippy::missing_trait_methods)]`](https://rust-lang.github.io/rust-clippy/master/index.html#missing_trait_methods)
to catch that, but you're right that adding new methods is therefore a
perceived breaking change.

## Sans-IO / IO Abstraction
> Add additional functionality for injecting logic into the HTTP request
path (#6056) allowing
> More accurate instrumentation

(from https://github.com/apache/arrow-rs/issues/7171#issue-2868833643 )

Depends. I would say "more low-level", but now you're in a different pickle
because you now longer see the high-level methods or at least you partially
have to recover them (e.g. is it an upload/put, a copy, a move). So I don't
think the instrumentation on that level is universally better.

I do however agree that splitting the IO layer is the right move.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Re: [I] [Discussion] Object Store Composition [arrow-rs]

Reply via email to