Xuanwo opened a new pull request, #15018: URL: https://github.com/apache/datafusion/pull/15018
## Which issue does this PR close?

PoC for https://github.com/apache/datafusion/issues/14854

## Rationale for this change

This PR is basically a PoC for `datafusion-storage`; all APIs are provided for discussion only.

### Design

The DataFusion storage is designed around current DataFusion usage, which for now only needs `read`, `write`, `stat`, and `list`.

- We have a `Storage` trait that implementations must follow, allowing each implementer to optimize within `StorageFileRead`, `StorageFileWrite`, and other related components.
- We introduced a `StorageExt` trait primarily for DataFusion users. It provides helper functions that let users avoid unnecessary memory copies or clones.
- For every operation, we pass down a `StorageContext` that carries context such as metrics, tracing spans, HTTP clients, or a runtime. DataFusion users can implement or inject these as needed.
- For each returned struct, such as `StorageFileReader`, we implement adapters for `futures::AsyncWrite` and `futures::Stream` to optimize our API and avoid unnecessary costs.
- We intentionally hide storage-specific features like multipart uploads, leaving them as implementation details for the `datafusion-storage` implementer.

### More details

`datafusion-storage` can serve as the default storage entry point for our execution, but we also provide traits like the Parquet `AsyncFileReader`, allowing users to integrate their own storage solutions.

## What changes are included in this PR?

Added a new crate called `datafusion-storage`, which serves as the storage abstraction for the entire DataFusion ecosystem.

## Are these changes tested?

Not yet.

## Are there any user-facing changes?

Yes. We will provide more details later as the actual work progresses.
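To make the layering in the Design section easier to discuss, here is a minimal sketch of how a `Storage` trait, a `StorageExt` helper trait, and a `StorageContext` could fit together. It is not the API from this PR: every signature, the `StorageStat` and `MemoryStorage` types, the `ops` counter, and the `read_to_string` helper are assumptions made for illustration. The sketch is also synchronous and uses only the standard library so that it stays self-contained, whereas the proposal itself is built around `futures` adapters.

```rust
use std::collections::BTreeMap;
use std::io;

/// Hypothetical per-operation context. The PR mentions metrics, tracing
/// spans, HTTP clients, or a runtime; this sketch only counts operations.
#[derive(Debug, Default)]
pub struct StorageContext {
    pub ops: u64,
}

/// Hypothetical metadata returned by `stat`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct StorageStat {
    pub path: String,
    pub size: u64,
}

/// The four operations named in the design. Signatures are assumptions.
pub trait Storage {
    fn read(&self, ctx: &mut StorageContext, path: &str) -> io::Result<Vec<u8>>;
    fn write(&mut self, ctx: &mut StorageContext, path: &str, data: &[u8]) -> io::Result<()>;
    fn stat(&self, ctx: &mut StorageContext, path: &str) -> io::Result<StorageStat>;
    fn list(&self, ctx: &mut StorageContext, prefix: &str) -> io::Result<Vec<String>>;
}

/// User-facing helpers, blanket-implemented for every `Storage`.
pub trait StorageExt: Storage {
    /// Illustrative helper: read a file and decode it as UTF-8.
    fn read_to_string(&self, ctx: &mut StorageContext, path: &str) -> io::Result<String> {
        let bytes = self.read(ctx, path)?;
        String::from_utf8(bytes).map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
    }
}

impl<T: Storage + ?Sized> StorageExt for T {}

/// Hypothetical in-memory backend, used only to exercise the traits.
#[derive(Debug, Default)]
pub struct MemoryStorage {
    files: BTreeMap<String, Vec<u8>>,
}

fn not_found(path: &str) -> io::Error {
    io::Error::new(io::ErrorKind::NotFound, format!("no such file: {path}"))
}

impl Storage for MemoryStorage {
    fn read(&self, ctx: &mut StorageContext, path: &str) -> io::Result<Vec<u8>> {
        ctx.ops += 1;
        self.files.get(path).cloned().ok_or_else(|| not_found(path))
    }

    fn write(&mut self, ctx: &mut StorageContext, path: &str, data: &[u8]) -> io::Result<()> {
        ctx.ops += 1;
        self.files.insert(path.to_string(), data.to_vec());
        Ok(())
    }

    fn stat(&self, ctx: &mut StorageContext, path: &str) -> io::Result<StorageStat> {
        ctx.ops += 1;
        let data = self.files.get(path).ok_or_else(|| not_found(path))?;
        Ok(StorageStat { path: path.to_string(), size: data.len() as u64 })
    }

    fn list(&self, ctx: &mut StorageContext, prefix: &str) -> io::Result<Vec<String>> {
        ctx.ops += 1;
        // BTreeMap keys iterate in sorted order, so the listing is sorted too.
        Ok(self.files.keys().filter(|k| k.starts_with(prefix)).cloned().collect())
    }
}
```

In this sketch a caller creates one `StorageContext`, passes it to each call, and reaches the helper through the blanket `StorageExt` impl, so a backend only has to implement the four core methods. Backend-specific behavior such as multipart uploads would stay inside the implementation of `write` and never surface in the trait.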