alamb commented on issue #7171:
URL: https://github.com/apache/arrow-rs/issues/7171#issuecomment-2676833199

   > However, over time the ObjectStore API has grown, and now has 8 required 
methods and a further 10 methods with default implementations. This creates a 
number of challenges for this wrapper based approach for composition.
   
   I basically agree with this statement of the challenge, though I am not sure how hard it actually is in practice (having done it myself and having seen various different versions of it).
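
To make the wrapper challenge concrete, here is a minimal sketch using a hypothetical, simplified synchronous trait. `Store`, `MemStore` and `Logging` are illustrative names and are not the `object_store` API. The sketch shows one common way wrappers go wrong: a wrapper that forwards only the required methods silently falls back to the trait's default implementation of every provided method, which bypasses whatever the inner store did in its override.

```rust
use std::cell::Cell;
use std::ops::Range;

/// Hypothetical, simplified stand-in for an ObjectStore-like trait (not the
/// real `object_store` API): one required method plus one provided method.
trait Store {
    /// Required: fetch a byte range of an object.
    fn get_range(&self, path: &str, range: Range<usize>) -> Vec<u8>;

    /// Provided: by default issues one `get_range` call per range. Concrete
    /// stores may override this, e.g. to coalesce ranges into one request.
    fn get_ranges(&self, path: &str, ranges: &[Range<usize>]) -> Vec<Vec<u8>> {
        ranges.iter().map(|r| self.get_range(path, r.clone())).collect()
    }
}

/// In-memory store that counts simulated requests and overrides `get_ranges`.
struct MemStore {
    data: Vec<u8>,
    requests: Cell<usize>,
}

impl Store for MemStore {
    fn get_range(&self, _path: &str, range: Range<usize>) -> Vec<u8> {
        self.requests.set(self.requests.get() + 1);
        self.data[range].to_vec()
    }

    // Override: serve all ranges with a single (simulated) request.
    fn get_ranges(&self, _path: &str, ranges: &[Range<usize>]) -> Vec<Vec<u8>> {
        self.requests.set(self.requests.get() + 1);
        ranges.iter().map(|r| self.data[r.clone()].to_vec()).collect()
    }
}

/// Hypothetical logging wrapper that forwards only the required method.
struct Logging<S> {
    inner: S,
    calls: Cell<usize>,
}

impl<S: Store> Store for Logging<S> {
    fn get_range(&self, path: &str, range: Range<usize>) -> Vec<u8> {
        self.calls.set(self.calls.get() + 1);
        self.inner.get_range(path, range)
    }
    // `get_ranges` is not forwarded, so the trait's default runs and calls
    // `self.get_range` once per range, bypassing `MemStore`'s override.
}

fn main() {
    let store = Logging {
        inner: MemStore { data: (0u8..100).collect(), requests: Cell::new(0) },
        calls: Cell::new(0),
    };
    let parts = store.get_ranges("data.parquet", &[0..2, 10..12, 50..52]);
    println!("{:?}", parts);
    // Three simulated requests instead of the single one `MemStore` would make.
    println!("inner requests: {}", store.inner.requests.get());
}
```

A wrapper therefore has to override and forward every provided method, not just the required ones, to preserve the inner store's behavior, and that burden grows with each method added to the trait.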
   
   
   > _**Interface Creep**_
   > 
   > In many places the ObjectStore interface gets used as the abstraction for 
components that don't actually require the full breadth of ObjectStore 
functionality. There is no need, for example, for a parquet reader to depend on 
more than the ability to fetch ranges of bytes.
   
   I think the parquet reader also needs to be able to get the total file size (at least unless negative ranges, for fetching the last bytes of a file, are supported). Adding support to
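
To illustrate why a parquet reader needs more than ranged reads, here is a minimal sketch of a hypothetical narrow trait. `RangeRead` and `footer_metadata_range` are illustrative names, not parquet's `AsyncFileReader` API. It relies on the parquet file layout, which ends with the footer metadata, a 4-byte little-endian metadata length and the `PAR1` magic bytes; without suffix range requests, the reader needs the total size to locate those trailing 8 bytes.

```rust
use std::ops::Range;

/// Hypothetical narrow trait covering only what a parquet-style reader needs
/// (not parquet's actual `AsyncFileReader` API).
trait RangeRead {
    /// Total object size in bytes.
    fn size(&self) -> u64;
    /// Fetch the bytes in `range` (absolute offsets).
    fn get_range(&self, range: Range<u64>) -> Vec<u8>;
}

/// An in-memory buffer can implement the narrow trait trivially.
impl RangeRead for Vec<u8> {
    fn size(&self) -> u64 {
        self.len() as u64
    }
    fn get_range(&self, range: Range<u64>) -> Vec<u8> {
        self[range.start as usize..range.end as usize].to_vec()
    }
}

/// Locate the footer metadata. A parquet file ends with the metadata, then a
/// 4-byte little-endian metadata length, then the magic bytes `PAR1`. Without
/// suffix ("last N bytes") requests, the total size is needed to know where
/// those trailing 8 bytes start.
fn footer_metadata_range<R: RangeRead>(reader: &R) -> Result<Range<u64>, String> {
    let size = reader.size();
    if size < 8 {
        return Err("file too small for a parquet footer".to_string());
    }
    let tail = reader.get_range(size - 8..size);
    if &tail[4..8] != b"PAR1" {
        return Err("missing PAR1 magic".to_string());
    }
    let len = u64::from(u32::from_le_bytes([tail[0], tail[1], tail[2], tail[3]]));
    let end = size - 8;
    if len > end {
        return Err("metadata length exceeds file size".to_string());
    }
    Ok(end - len..end)
}

fn main() {
    // Fake file: leading magic, 4 bytes of "data", 5 bytes of "metadata", footer.
    let mut file = b"PAR1data".to_vec();
    file.extend_from_slice(b"META!");
    file.extend_from_slice(&5u32.to_le_bytes());
    file.extend_from_slice(b"PAR1");
    let range = footer_metadata_range(&file).unwrap();
    println!("metadata at {:?}", range);
}
```

If suffix requests were supported, the `size()` call could be dropped and the trailing 8 bytes fetched directly, which would shrink the trait to a single ranged-read method.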
   
   
   > This leads to perverse "ObjectStore" implementations, that actually only 
implement say get functionality. Similarly in contexts like 
[apache/datafusion#14286](https://github.com/apache/datafusion/pull/14286) it 
creates complexities around how to shim the full ObjectStore interface, despite 
the actual operators in question only using a very small subset of this 
functionality.
   >
   > I personally think we should encourage a move away from this wrapper based 
form of composition and instead do the following:
   > 
   > * Encourage use of specialized traits like parquet's 
[AsyncFileReader](https://docs.rs/parquet/latest/parquet/arrow/async_reader/trait.AsyncFileReader.html)
 that reflect what a given component actually needs, and can evolve 
independently of ObjectStore
   
   I would say "perverse" is somewhat subjective. It is certainly complex, but that also needs to be measured against:
   1. The complexity of other alternatives
   2. The complexity of the problem being solved
   3. The complexity of using the API vs. implementing the API. For example, the `AsyncFileReader` in parquet specifically adds non-trivial complexity to using the reader
   
   
   > * Add additional functionality for injecting logic into the HTTP request 
path ([Decouple ObjectStore from Reqwest / Generic HTTP Client Support 
#6056](https://github.com/apache/arrow-rs/issues/6056)) allowing
   >   
   >   * More accurate instrumentation
   >   * More accurate concurrency limiting
   >   * Potential sophistication w.r.t tokio runtime dispatch
   
   This feels like a good idea to me.
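
A rough sketch of why hooking the HTTP request path helps with instrumentation and concurrency limiting. `HttpClient`, `Instrumented`, `Flaky` and `get_with_retry` are hypothetical names for illustration only, not the interface proposed in #6056 or any reqwest type, and the sketch assumes retries happen inside the store, below any wrapper placed around it. Because the middleware sits where each request is actually sent, it counts every attempt, including retries, and bounds the number of requests in flight, whereas a wrapper around the store would only see one logical `get`.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Condvar, Mutex};

/// Hypothetical minimal request/response types (not reqwest or object_store types).
struct Request {
    url: String,
}
struct Response {
    status: u16,
}

/// Hypothetical pluggable HTTP client: every request attempt goes through `send`.
trait HttpClient {
    fn send(&self, req: &Request) -> Response;
}

/// Counting semaphore built from `Mutex` + `Condvar`.
struct Semaphore {
    permits: Mutex<usize>,
    cvar: Condvar,
}

/// RAII permit: returns the slot to the semaphore when dropped.
struct Permit<'a>(&'a Semaphore);

impl Semaphore {
    fn new(n: usize) -> Self {
        Semaphore { permits: Mutex::new(n), cvar: Condvar::new() }
    }
    fn acquire(&self) -> Permit<'_> {
        let mut p = self.permits.lock().unwrap();
        while *p == 0 {
            p = self.cvar.wait(p).unwrap();
        }
        *p -= 1;
        Permit(self)
    }
}

impl Drop for Permit<'_> {
    fn drop(&mut self) {
        *self.0.permits.lock().unwrap() += 1;
        self.0.cvar.notify_one();
    }
}

/// Middleware injected into the HTTP path: counts every attempt (including
/// retries) and caps the number of requests in flight.
struct Instrumented<C> {
    inner: C,
    attempts: AtomicUsize,
    limit: Semaphore,
}

impl<C: HttpClient> HttpClient for Instrumented<C> {
    fn send(&self, req: &Request) -> Response {
        let _permit = self.limit.acquire(); // released when `send` returns
        self.attempts.fetch_add(1, Ordering::Relaxed);
        self.inner.send(req)
    }
}

/// Fake transport that answers 503 while its failure budget lasts, then 200.
struct Flaky {
    failures: AtomicUsize,
}

impl HttpClient for Flaky {
    fn send(&self, _req: &Request) -> Response {
        // Decrement the failure budget; `Err` means it was already zero.
        let failed = self
            .failures
            .fetch_update(Ordering::SeqCst, Ordering::SeqCst, |n| n.checked_sub(1))
            .is_ok();
        Response { status: if failed { 503 } else { 200 } }
    }
}

/// Hypothetical store-level `get` with built-in retries: one logical call
/// can turn into several HTTP attempts.
fn get_with_retry<C: HttpClient>(client: &C, url: &str, max_attempts: usize) -> Option<Response> {
    for _ in 0..max_attempts {
        let resp = client.send(&Request { url: url.to_string() });
        if resp.status == 200 {
            return Some(resp);
        }
    }
    None
}

fn main() {
    let client = Instrumented {
        inner: Flaky { failures: AtomicUsize::new(2) },
        attempts: AtomicUsize::new(0),
        limit: Semaphore::new(4),
    };
    let resp = get_with_retry(&client, "https://example.com/data.parquet", 5);
    println!("status: {:?}", resp.map(|r| r.status));
    println!("HTTP attempts: {}", client.attempts.load(Ordering::Relaxed));
}
```

Here the middleware sees three HTTP attempts (two simulated 503s, then a 200) for a single logical `get`, which is the kind of detail an ObjectStore-level wrapper cannot observe.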
   
   > 
   > I can't help feeling right now ObjectStore is stuck between trying to 
expose the functionality of ObjectStore's in a portable and ergonomic fashion, 
whilst also trying to provide some sort of generic all-purpose IO subsystem 
abstraction, which I'm not sure aren't incompatible goals....
   
   I think OpenDAL (https://github.com/apache/opendal) is trying to provide a generic, all-purpose IO subsystem abstraction.
   
   So my personal recommendation is:
   1. Keep the object store API the same / don't expand it
   2. Support https://github.com/apache/arrow-rs/issues/6056 (generic HTTP client support)
   3. Add additional documentation / examples for more advanced functionality
   4. Point people at OpenDAL if they need more advanced features
   
   


-- 
This is an automated message from the Apache Git Service.