timsaucer opened a new issue, #18671: URL: https://github.com/apache/datafusion/issues/18671
### Is your feature request related to a problem or challenge?

We use protobuf to serialize and deserialize frequently in the FFI work. This has been a great advantage in exposing these functions and reduces the amount of code duplication we need to perform.

We currently have a problem in that to call the de/serialize functions we need to pass either a `FunctionRegistry` or a `TaskContext`, depending on whether you are working with logical or physical expressions. Right now the implementation creates a default `SessionContext` before making the de/serialize calls. The problem with this is that if a user has registered a custom function and used that function as an input to any of the FFI calls that take expressions, it will fail in the de/serialize calls, because the default context does not know about that function.

### Describe the solution you'd like

There are a few things I think we should do to improve this work, and I have a functioning branch tested against `datafusion-python` that performs most of them. I will be putting up a series of PRs to address them.

- Add a `TaskContextProvider` trait that we can hold a weak reference to. This is used so that at a point *after* registration we can get the current `TaskContext` during de/serialization.
- Add an FFI version of the Logical and Physical Extension codecs. This one I haven't done yet, but will address soon.
- Implement `FFI_Session`.
- Remove the `datafusion` core crate from the `datafusion-ffi` dependencies. This has a nice side benefit of reducing the library size of some of the providers by half or more.
- Add a method to identify when a Foreign FFI struct is actually in the local library. When this is true, convert to the underlying data structure instead of keeping the FFI wrapper.

### Describe alternatives you've considered

We could pass in a task context directly and pass that around the FFI structs. This has a major problem in that it would only be based on what was registered at the time of creation of that task context. I haven't been able to come up with a better alternative.

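To illustrate why a weakly held provider behaves differently from a `TaskContext` captured up front, here is a minimal sketch. It does not use the DataFusion crates: `TaskContext`, `Session`, and `FfiWrapper` are simplified stand-ins, and the trait method name `task_ctx` is an assumption for illustration, not necessarily the signature used in the draft PR.

```rust
use std::collections::HashSet;
use std::sync::{Arc, RwLock, Weak};

/// Stand-in for DataFusion's `TaskContext`: only the registered UDF names.
#[derive(Debug, Clone, Default)]
pub struct TaskContext {
    pub udf_names: HashSet<String>,
}

/// Hypothetical shape of the proposed trait: yields the *current* context on demand.
pub trait TaskContextProvider: Send + Sync {
    fn task_ctx(&self) -> Arc<TaskContext>;
}

/// Stand-in for a session whose function registry can change over time.
#[derive(Default)]
pub struct Session {
    udfs: RwLock<HashSet<String>>,
}

impl Session {
    pub fn register_udf(&self, name: &str) {
        self.udfs.write().unwrap().insert(name.to_string());
    }
}

impl TaskContextProvider for Session {
    fn task_ctx(&self) -> Arc<TaskContext> {
        // Snapshot the registry at call time, not at wrapper creation time.
        Arc::new(TaskContext {
            udf_names: self.udfs.read().unwrap().clone(),
        })
    }
}

/// Stand-in for an FFI struct that needs a context when de/serializing expressions.
pub struct FfiWrapper {
    // Weak, so the wrapper does not keep the session alive.
    provider: Weak<dyn TaskContextProvider>,
}

impl FfiWrapper {
    pub fn new(provider: &Arc<dyn TaskContextProvider>) -> Self {
        Self {
            provider: Arc::downgrade(provider),
        }
    }

    /// Pretend "deserialization": succeeds only if the UDF is known right now.
    pub fn can_decode(&self, udf_name: &str) -> Result<bool, String> {
        let provider = self
            .provider
            .upgrade()
            .ok_or_else(|| "session was dropped".to_string())?;
        Ok(provider.task_ctx().udf_names.contains(udf_name))
    }
}
```

In this sketch, a wrapper created before `register_udf("my_udf")` is called still sees `my_udf` afterwards, because the context is fetched at call time. A wrapper holding a `TaskContext` built at creation time would not. Once the session is dropped, `upgrade()` fails and the call returns an error instead of silently falling back to a default context.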
### Additional context

This draft PR shows all of these features implemented. I want to do some renaming and I am going to break it into smaller pieces.

https://github.com/apache/datafusion/pull/18568
