westonpace commented on issue #15103: URL: https://github.com/apache/arrow/issues/15103#issuecomment-1406042300
> I am also curious how the development on Apache Arrow and [Apache Arrow Datafusion](https://arrow.apache.org/datafusion/) (or [python bindings](https://github.com/apache/arrow-datafusion-python) work together. Pyarrow, datasets, and the streaming execution engine (Acero) are all based on arrow-C++. Datafusion is based on arrow-rs (rust). So they are two separate stacks. That being said, they should be very interoperable. Both can import and export to the C data interface for intra-process communication. Both can read and write the arrow IPC format for IPC over something like flight for inter-process communication. > Do the projects work closely to re-use/align overlapping functionality? Unfortunately, a universal UDF harness has not yet been created. In general, datafusion, based on rust, is not likely to trust a C/C++ UDFs for safety reasons (at least, that is my understanding, I could be wrong). Going the other way (using datafusion UDFs in pyarrow) should be possible but no one has done it yet. Some very interesting ideas have been proposed for other alternatives (there was a discussion a while back on the mailing list). Perhaps the most interesting to me is the idea of using WASM for UDFs. LLVM has also been proposed though it doesn't fix the safety concerns. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
