timsaucer opened a new pull request, #12920: URL: https://github.com/apache/datafusion/pull/12920
## Which issue does this PR close?

This addresses part of https://github.com/apache/datafusion-python/issues/823 downstream, but may have wider application than just Python.

## Rationale for this change

This PR allows table providers to be registered via a stable FFI. It removes the requirement for Python providers to include all of datafusion-python and re-export it, and it allows providers built against different underlying DataFusion versions to interoperate.

## What changes are included in this PR?

Adds support for `TableProvider` via FFI. To support this, it also includes `ExecutionPlan`, `SessionConfig`, `PlanProperties`, and `TableType`. As this gets used more, I expect we will want to expose other features, but this gives an initial implementation that solves an immediate need.

## Are these changes tested?

Some unit tests are provided. Additionally, I did the following test:

I created a separate crate with the contents of `datafusion/ffi` so that I could test it against different versions of DataFusion by modifying the dependencies in Cargo.toml. I used this crate to build a test implementation of `datafusion-python` against DataFusion 42.0.0. I then adjusted the test crate and built a test implementation of `delta-rs` against DataFusion 41.0.0. Finally, I registered the delta table in Python against the session context. I was able to query the table with push down filters via this FFI interface even though the underlying DataFusion versions were different.

I also ran memory leak checks against the provided unit tests and against the Python run.

## Are there any user-facing changes?

This is not a breaking change. It is a pure addition of a new `datafusion-ffi` library.

## Remaining Issues

- [ ] There is some inconsistency between the usage of `ExportedXYZ` and just using the raw `FFI_XYZ`. We should make the usage consistent across all struct types.
- [ ] Add documentation to explain the reasoning behind creating the data the way we do in the private data and foreign structs.
- [ ] Add documentation to explain more clearly the delineation between the `ExportedXYZ` and `ForeignXYZ`. A worked use case would probably help, since which side is "foreign" and which is "exported" can be hard to follow during some of the function calls.
- [ ] It would be *great* to demonstrate a C++ implementation linked against DataFusion in Rust. This might really open the doors for some implementations that are not feasible to convert to Rust.
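
To make the "exported" versus "foreign" split above a little more concrete, here is a minimal, dependency-free Rust sketch of the general pattern: a `#[repr(C)]` struct of function pointers plus an opaque `private_data` pointer and a `release` callback, loosely in the style of the Arrow C Data Interface. This is not the code in this PR. Every name in it (`FfiRowSource`, `ExportedState`, `ForeignRowSource`, `export_rows`) is hypothetical, and the real `datafusion-ffi` structs may be laid out quite differently.

```rust
use std::ffi::c_void;

/// Hypothetical C-ABI handle. NOT the actual datafusion-ffi layout.
#[repr(C)]
pub struct FfiRowSource {
    /// Returns the number of rows held by the provider.
    pub row_count: unsafe extern "C" fn(this: *const FfiRowSource) -> u64,
    /// Frees `private_data`. Must be called once by the owner of the handle.
    pub release: unsafe extern "C" fn(this: *mut FfiRowSource),
    /// Opaque pointer owned by the library that exported the handle.
    pub private_data: *mut c_void,
}

/// Provider-side state hidden behind `private_data`. Only the exporting
/// library knows this type, so its layout can change between versions.
struct ExportedState {
    rows: Vec<i64>,
}

unsafe extern "C" fn exported_row_count(this: *const FfiRowSource) -> u64 {
    // Recover the exporter's own state from the opaque pointer.
    let state = unsafe { &*((*this).private_data as *const ExportedState) };
    state.rows.len() as u64
}

unsafe extern "C" fn exported_release(this: *mut FfiRowSource) {
    let handle = unsafe { &mut *this };
    if !handle.private_data.is_null() {
        // Reclaim the Box created in `export_rows` and drop it.
        drop(unsafe { Box::from_raw(handle.private_data as *mut ExportedState) });
        // Null the pointer so a second release call is a no-op.
        handle.private_data = std::ptr::null_mut();
    }
}

/// "Exported" side: wrap local data in the C-ABI handle.
pub fn export_rows(rows: Vec<i64>) -> FfiRowSource {
    FfiRowSource {
        row_count: exported_row_count,
        release: exported_release,
        private_data: Box::into_raw(Box::new(ExportedState { rows })) as *mut c_void,
    }
}

/// "Foreign" side: a safe wrapper that only talks through function pointers,
/// so it never needs to know the exporter's internal types.
pub struct ForeignRowSource(FfiRowSource);

impl ForeignRowSource {
    pub fn new(handle: FfiRowSource) -> Self {
        Self(handle)
    }

    pub fn row_count(&self) -> u64 {
        let row_count = self.0.row_count;
        unsafe { row_count(&self.0) }
    }
}

impl Drop for ForeignRowSource {
    fn drop(&mut self) {
        // Hand the memory back to the code that allocated it.
        let release = self.0.release;
        unsafe { release(&mut self.0) }
    }
}
```

In this sketch, a provider library would call `export_rows(...)` and pass the resulting handle across the boundary, while the consuming library would wrap it in `ForeignRowSource::new(handle)` and call `row_count()`. The point of the design is that allocation and deallocation both happen in the exporting library, and only the `#[repr(C)]` struct has to stay stable across versions.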
