Xuanwo opened a new issue, #14854:
URL: https://github.com/apache/datafusion/issues/14854

   Hello everyone, I'm jumping here from [[Discussion] Object Store 
Composition](https://github.com/apache/arrow-rs/issues/7171). 
   
   ## Background
   
   DataFusion currently uses `ObjectStore` as its public storage interface. 
We have public APIs like 
[`register_object_store`](https://docs.rs/datafusion/45.0.0/datafusion/execution/context/struct.SessionContext.html#method.register_object_store):
   
   ```rust
   use std::sync::Arc;
   use datafusion::execution::object_store::ObjectStoreUrl;
   use datafusion::prelude::SessionContext;

   let object_store_url = ObjectStoreUrl::parse("file://").unwrap();
   let object_store = object_store::local::LocalFileSystem::new();
   let ctx = SessionContext::new();
   // All files with the file:// url prefix will be read from the local file system
   ctx.register_object_store(object_store_url.as_ref(), Arc::new(object_store));
   ```
   
   With the growth of DF, we have to continuously add more features to 
`object_store`, making it increasingly difficult to compose, as described in 
[[Discussion] Object Store 
Composition](https://github.com/apache/arrow-rs/issues/7171).
   
   The latest example is [adding Extensions to object store 
GetOptions](https://github.com/apache/arrow-rs/issues/7155) to allow passing 
tracing spans within the object store, as requested in [Improve use of tracing 
spans in query path](https://github.com/influxdata/influxdb/issues/25911).
   
   It's easy to predict that `ObjectStore` will move further and further away 
from its initial position:
   
   > Initially the ObjectStore API was relatively simple, consisting of a few 
methods to interact with object stores. As such many systems took this 
abstraction and used it as a generic IO abstraction, this is good and what the 
crate was designed for.
   
   ## Proposal
   
   So I propose building a `datafusion-storage` crate focused primarily on 
DataFusion's own needs while maintaining `datafusion-storage-object-store` and 
`datafusion-storage-opendal` separately. The benefit is that users can 
implement innovative features like `datafusion-storage-cudf` or 
`datafusion-storage-io_uring` without being constrained by the current I/O 
abstraction of object-store or OpenDAL.
   
   If this becomes a reality, DataFusion can design the abstraction based on 
its own requirements without having to push everything upstream to 
`object_store`. This would allow DataFusion to maintain useful features such as 
context management and add additional requirements to the trait while letting 
`datafusion-storage-object-store` and `datafusion-storage-opendal` handle the 
extra work.
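
To make the layering concrete, here is a minimal std-only sketch of what a DataFusion-owned trait with pluggable backends might look like. Every name in it (`Storage`, `ReadContext`, `InMemoryBackend`, `StorageRegistry`) is hypothetical and not an existing DataFusion API. The trait is synchronous only to keep the example free of dependencies; a real design would presumably be async.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical per-request context (for example a tracing span id) that
/// DataFusion could pass to every read without changing upstream crates.
#[derive(Debug, Clone, Default)]
pub struct ReadContext {
    pub span: Option<String>,
}

/// Hypothetical trait that a `datafusion-storage` crate might own.
pub trait Storage: Send + Sync {
    fn name(&self) -> &str;
    fn read(&self, path: &str, ctx: &ReadContext) -> Result<Vec<u8>, String>;
}

/// Stand-in for an adapter crate such as `datafusion-storage-object-store`.
/// It serves bytes from an in-memory map instead of a real object store.
pub struct InMemoryBackend {
    name: String,
    objects: HashMap<String, Vec<u8>>,
}

impl InMemoryBackend {
    pub fn new(name: &str) -> Self {
        Self { name: name.to_string(), objects: HashMap::new() }
    }

    pub fn with_object(mut self, path: &str, data: &[u8]) -> Self {
        self.objects.insert(path.to_string(), data.to_vec());
        self
    }
}

impl Storage for InMemoryBackend {
    fn name(&self) -> &str {
        &self.name
    }

    // A real adapter would forward `ctx` to its backend; this one ignores it.
    fn read(&self, path: &str, _ctx: &ReadContext) -> Result<Vec<u8>, String> {
        self.objects
            .get(path)
            .cloned()
            .ok_or_else(|| format!("{}: not found: {}", self.name, path))
    }
}

/// Hypothetical registry keyed by URL scheme, in the spirit of
/// `register_object_store`.
#[derive(Default)]
pub struct StorageRegistry {
    backends: HashMap<String, Arc<dyn Storage>>,
}

impl StorageRegistry {
    pub fn register(&mut self, scheme: &str, storage: Arc<dyn Storage>) {
        self.backends.insert(scheme.to_string(), storage);
    }

    /// Splits `scheme://path` and dispatches to the matching backend.
    pub fn read(&self, url: &str, ctx: &ReadContext) -> Result<Vec<u8>, String> {
        let (scheme, path) = url
            .split_once("://")
            .ok_or_else(|| format!("invalid url: {}", url))?;
        let backend = self
            .backends
            .get(scheme)
            .ok_or_else(|| format!("no storage registered for scheme: {}", scheme))?;
        backend.read(path, ctx)
    }
}
```

Under this shape, a new backend only has to implement `Storage`; it would not need to fit into the I/O abstraction of object-store or OpenDAL.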
   
   ## Implementation
   
   We can start by aliasing the `ObjectStore` trait within `datafusion-storage`. 
Given sufficient migration time, we can then fine-tune the trait to 
better align with DF's specific needs.
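
As a rough illustration of the aliasing step, the sketch below re-exports a trait under a new name. To stay std-only, the `object_store` module here is a simplified stand-in for the real crate (its `get` signature is invented for the example and does not match the real async API), and `datafusion_storage`, `Storage`, `Fixed`, and `read_via_upstream` are hypothetical names.

```rust
/// Simplified stand-in for the upstream `object_store` crate.
/// The real trait is async and has a different `get` signature.
mod object_store {
    pub trait ObjectStore: Send + Sync {
        fn get(&self, path: &str) -> Option<Vec<u8>>;
    }
}

/// Hypothetical `datafusion-storage` crate root.
mod datafusion_storage {
    // Step 1: expose the upstream trait under a DataFusion-owned path.
    // Later, this re-export could be swapped for a DataFusion-defined trait.
    pub use super::object_store::ObjectStore as Storage;
}

use datafusion_storage::Storage;

/// Toy store that returns the same bytes for every path.
struct Fixed(Vec<u8>);

// `Storage` is the same trait as `ObjectStore`, so this single impl
// satisfies both names.
impl Storage for Fixed {
    fn get(&self, _path: &str) -> Option<Vec<u8>> {
        Some(self.0.clone())
    }
}

/// Existing code written against the upstream trait keeps working.
fn read_via_upstream(store: &dyn object_store::ObjectStore, path: &str) -> Option<Vec<u8>> {
    store.get(path)
}
```

Because the alias names the same trait, downstream users can migrate their imports to `datafusion_storage::Storage` without changing behavior, which gives the migration window mentioned above.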


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

