Xuanwo opened a new issue, #14854: URL: https://github.com/apache/datafusion/issues/14854
Hello everyone, I'm jumping here from [[Discussion] Object Store Composition](https://github.com/apache/arrow-rs/issues/7171).

## Background

DataFusion currently uses `ObjectStore` as its public storage interface. We have public APIs like [`register_object_store`](https://docs.rs/datafusion/45.0.0/datafusion/execution/context/struct.SessionContext.html#method.register_object_store):

```rust
use std::sync::Arc;
use datafusion::execution::object_store::ObjectStoreUrl;
use datafusion::prelude::SessionContext;

let object_store_url = ObjectStoreUrl::parse("file://").unwrap();
let object_store = object_store::local::LocalFileSystem::new();
let ctx = SessionContext::new();
// All files with the file:// url prefix will be read from the local file system
ctx.register_object_store(object_store_url.as_ref(), Arc::new(object_store));
```

As DataFusion grows, we have to keep adding features to `object_store`, which makes it increasingly difficult to compose, as described in [[Discussion] Object Store Composition](https://github.com/apache/arrow-rs/issues/7171). The latest example is [adding Extensions to object store GetOptions](https://github.com/apache/arrow-rs/issues/7155) to allow passing tracing spans through the object store, as requested in [Improve use of tracing spans in query path](https://github.com/influxdata/influxdb/issues/25911).

It's easy to predict that `ObjectStore` will drift further and further away from its original position:

> Initially the ObjectStore API was relatively simple, consisting of a few methods to interact with object stores. As such many systems took this abstraction and used it as a generic IO abstraction, this is good and what the crate was designed for.

## Proposal

So I propose building `datafusion-storage`, focused primarily on DataFusion's own needs, while maintaining `datafusion-storage-object-store` and `datafusion-storage-opendal` as separate crates.
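To make the proposed crate split concrete, here is a minimal, std-only sketch under stated assumptions. Every name in it (`Storage`, `StorageContext`, `InMemoryStorage`, `TracingStorage`) is hypothetical and not part of DataFusion, `object_store`, or OpenDAL, and the trait is synchronous and far smaller than the real async `ObjectStore`. It shows a DataFusion-owned trait that requires a per-call context, an in-memory backend standing in for an adapter crate, and a wrapper layer that composes over any backend.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, RwLock};

/// Hypothetical request context that a DataFusion-owned trait could require on
/// every call (for example, to carry a tracing span). Not an existing API.
#[derive(Debug, Clone, Default)]
pub struct StorageContext {
    pub trace_id: Option<String>,
}

/// Hypothetical `datafusion-storage` trait, shaped by DataFusion's own needs.
/// Synchronous and simplified so the sketch runs with only `std`.
pub trait Storage: Send + Sync {
    fn get(&self, ctx: &StorageContext, path: &str) -> Result<Vec<u8>, String>;
    fn put(&self, ctx: &StorageContext, path: &str, data: Vec<u8>) -> Result<(), String>;
}

/// Stand-in for a backend adapter crate such as `datafusion-storage-object-store`
/// or `datafusion-storage-opendal`; here it is just an in-memory map.
#[derive(Default)]
pub struct InMemoryStorage {
    objects: RwLock<HashMap<String, Vec<u8>>>,
}

impl Storage for InMemoryStorage {
    fn get(&self, _ctx: &StorageContext, path: &str) -> Result<Vec<u8>, String> {
        self.objects
            .read()
            .unwrap()
            .get(path)
            .cloned()
            .ok_or_else(|| format!("not found: {path}"))
    }

    fn put(&self, _ctx: &StorageContext, path: &str, data: Vec<u8>) -> Result<(), String> {
        self.objects.write().unwrap().insert(path.to_string(), data);
        Ok(())
    }
}

/// A composable layer: wraps any `Storage` and records which trace id issued
/// each call, without the wrapped backend knowing anything about tracing.
pub struct TracingStorage {
    inner: Arc<dyn Storage>,
    pub log: Mutex<Vec<String>>,
}

impl TracingStorage {
    pub fn new(inner: Arc<dyn Storage>) -> Self {
        Self { inner, log: Mutex::new(Vec::new()) }
    }

    /// Appends "<trace id or none> <operation> <path>" to the log.
    fn record(&self, ctx: &StorageContext, op: &str, path: &str) {
        let trace = ctx.trace_id.as_deref().unwrap_or("none");
        self.log.lock().unwrap().push(format!("{trace} {op} {path}"));
    }
}

impl Storage for TracingStorage {
    fn get(&self, ctx: &StorageContext, path: &str) -> Result<Vec<u8>, String> {
        self.record(ctx, "get", path);
        self.inner.get(ctx, path)
    }

    fn put(&self, ctx: &StorageContext, path: &str, data: Vec<u8>) -> Result<(), String> {
        self.record(ctx, "put", path);
        self.inner.put(ctx, path, data)
    }
}
```

A caller would build the stack once, e.g. `TracingStorage::new(Arc::new(InMemoryStorage::default()))`, and pass a `StorageContext` on every call. Because the context is part of the DataFusion-owned trait in this sketch, it reaches every layer without anything being pushed upstream.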
The benefit is that users could implement innovative storage backends like `datafusion-storage-cudf` or `datafusion-storage-io_uring` without being constrained by the current I/O abstractions of `object_store` or OpenDAL. If this becomes a reality, DataFusion can design the abstraction around its own requirements instead of pushing everything upstream to `object_store`. This would allow DataFusion to keep useful features such as context management and to add further requirements to the trait, while `datafusion-storage-object-store` and `datafusion-storage-opendal` handle the extra adaptation work.

## Implementation

We can start by aliasing the `ObjectStore` trait within `datafusion-storage`. After a sufficient migration period, we can then fine-tune the trait to better align with DataFusion's specific needs.
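A minimal sketch of the first migration step, assuming a plain re-export is what "aliasing" means here. So that it compiles with only `std`, the real `object_store` crate is replaced by a tiny hypothetical module `object_store_standin`; `datafusion_storage`, `Storage`, `StaticStore`, and `read_via_storage` are likewise illustrative names, not existing APIs.

```rust
/// Hypothetical stand-in for the upstream `object_store` crate so this sketch
/// compiles with `std` only. The real `ObjectStore` trait is async and much larger.
mod object_store_standin {
    pub trait ObjectStore: Send + Sync {
        fn get(&self, path: &str) -> Option<Vec<u8>>;
    }
}

/// Hypothetical `datafusion-storage` crate, phase 1: a pure re-export, so every
/// existing `ObjectStore` implementation is accepted unchanged.
mod datafusion_storage {
    pub use crate::object_store_standin::ObjectStore as Storage;
}

use std::sync::Arc;
use datafusion_storage::Storage;

/// An existing backend that only knows about the upstream trait name.
struct StaticStore;

impl object_store_standin::ObjectStore for StaticStore {
    fn get(&self, path: &str) -> Option<Vec<u8>> {
        // Serves exactly one object; everything else is "not found".
        (path == "hello.txt").then(|| b"hello".to_vec())
    }
}

/// DataFusion-side code written against the alias rather than the upstream name.
fn read_via_storage(store: &Arc<dyn Storage>, path: &str) -> Option<Vec<u8>> {
    store.get(path)
}
```

Because `Storage` is only a re-export here, `Arc<dyn Storage>` and `Arc<dyn ObjectStore>` are the same type, so downstream code can migrate to the new name before anything else changes. One possible later step (an assumption, not part of the proposal) is to replace the re-export with a distinct trait, such as `pub trait Storage: ObjectStore {}` plus a blanket `impl<T: ObjectStore + ?Sized> Storage for T {}`, letting DataFusion add provided methods of its own while existing implementations keep compiling.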