Xuanwo opened a new pull request, #15018:
URL: https://github.com/apache/datafusion/pull/15018

   ## Which issue does this PR close?
   
   PoC for https://github.com/apache/datafusion/issues/14854
   
   ## Rationale for this change
   
   This PR is basically a PoC for `datafusion-storage`; all APIs are 
provisional and exist only for discussion.
   
   ### Design
   
   The DataFusion storage layer is designed around current DataFusion usage, 
which for now needs only `read`, `write`, `stat`, and `list`.
   
   - We have a `Storage` trait that implementations must follow, allowing each 
implementer to optimize within `StorageFileRead`, `StorageFileWrite`, and other 
related components.
   - We introduced a `StorageExt` trait primarily for DataFusion users, 
providing helper functions that avoid unnecessary memory copies or clones.
   - For every operation, we will pass down a `StorageContext` to carry context 
such as metrics, tracing spans, HTTP clients, or runtime. DataFusion users can 
implement or inject them as needed.
   - For each returned struct, such as `StorageFileReader`, we implement 
adapters for `futures::AsyncWrite` and `futures::Stream` to optimize our API 
and avoid unnecessary costs. 
   - We intentionally hide storage-specific features like multipart uploads, 
leaving them as implementation details for the `datafusion-storage` implementer.
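   
   The shape described in the list above can be sketched roughly as follows. 
This is a minimal illustration, not the PoC's actual API: every signature, the 
`StorageStat` and `MemoryStorage` types, the `bytes_read` field, and the 
`read_to_string` helper are assumptions made for the example. The methods are 
synchronous and use only the standard library for brevity, whereas the real 
crate presumably exposes async readers and writers.

```rust
use std::collections::BTreeMap;
use std::io;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, Mutex};

/// Hypothetical per-operation context. The description mentions metrics,
/// tracing spans, HTTP clients, or a runtime; only a byte counter is shown.
#[derive(Default)]
pub struct StorageContext {
    pub bytes_read: AtomicU64,
}

/// Hypothetical metadata type returned by `stat` and `list`.
#[derive(Debug, Clone, PartialEq)]
pub struct StorageStat {
    pub path: String,
    pub size: u64,
}

/// The four operations named in the design, made synchronous for brevity.
pub trait Storage {
    fn read(&self, ctx: &StorageContext, path: &str) -> io::Result<Arc<[u8]>>;
    fn write(&self, ctx: &StorageContext, path: &str, data: &[u8]) -> io::Result<()>;
    fn stat(&self, ctx: &StorageContext, path: &str) -> io::Result<StorageStat>;
    fn list(&self, ctx: &StorageContext, prefix: &str) -> io::Result<Vec<StorageStat>>;
}

/// Helpers layered on `Storage`, in the spirit of `StorageExt`.
pub trait StorageExt: Storage {
    /// Reads a whole file and decodes it as UTF-8.
    fn read_to_string(&self, ctx: &StorageContext, path: &str) -> io::Result<String> {
        let bytes = self.read(ctx, path)?;
        String::from_utf8(bytes.to_vec())
            .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
    }
}

// Every `Storage` implementation gets the helpers for free.
impl<T: Storage + ?Sized> StorageExt for T {}

/// Toy in-memory backend standing in for a real object store.
#[derive(Default)]
pub struct MemoryStorage {
    files: Mutex<BTreeMap<String, Arc<[u8]>>>,
}

impl Storage for MemoryStorage {
    fn read(&self, ctx: &StorageContext, path: &str) -> io::Result<Arc<[u8]>> {
        let files = self.files.lock().unwrap();
        let data = files
            .get(path)
            .cloned() // clones the `Arc` handle, not the bytes
            .ok_or_else(|| io::Error::new(io::ErrorKind::NotFound, path.to_string()))?;
        ctx.bytes_read.fetch_add(data.len() as u64, Ordering::Relaxed);
        Ok(data)
    }

    fn write(&self, _ctx: &StorageContext, path: &str, data: &[u8]) -> io::Result<()> {
        self.files.lock().unwrap().insert(path.to_string(), Arc::from(data));
        Ok(())
    }

    fn stat(&self, _ctx: &StorageContext, path: &str) -> io::Result<StorageStat> {
        let files = self.files.lock().unwrap();
        let data = files
            .get(path)
            .ok_or_else(|| io::Error::new(io::ErrorKind::NotFound, path.to_string()))?;
        Ok(StorageStat { path: path.to_string(), size: data.len() as u64 })
    }

    fn list(&self, _ctx: &StorageContext, prefix: &str) -> io::Result<Vec<StorageStat>> {
        let files = self.files.lock().unwrap();
        Ok(files
            .iter()
            .filter(|(path, _)| path.starts_with(prefix))
            .map(|(path, data)| StorageStat { path: path.clone(), size: data.len() as u64 })
            .collect())
    }
}
```

   In this sketch a caller would create one `StorageContext`, pass it to every 
call, and read the counter afterwards. Returning `Arc<[u8]>` from `read` is 
one possible way to hand out data without copying it; the PoC may well choose a 
different buffer type.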
   
   ### More details
   
   `datafusion-storage` can serve as the default storage entry point for our 
execution, but we also provide traits like the Parquet `AsyncFileReader`, 
allowing users to integrate their own storage solutions.
   
   ## What changes are included in this PR?
   
   Added a new crate called `datafusion-storage`, which serves as the storage 
abstraction for the entire DataFusion ecosystem.
   
   ## Are these changes tested?
   
   Not yet.
   
   ## Are there any user-facing changes?
   
   Yes, we will provide more details later as we make progress with the actual 
work.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

