westonpace commented on issue #15103:
URL: https://github.com/apache/arrow/issues/15103#issuecomment-1406042300

   > I am also curious how the development on Apache Arrow and [Apache Arrow 
Datafusion](https://arrow.apache.org/datafusion/) (or [python 
bindings](https://github.com/apache/arrow-datafusion-python) work together.
   
   Pyarrow, datasets, and the streaming execution engine (Acero) are all based 
on arrow-C++.  Datafusion is based on arrow-rs (rust).  So they are two 
separate stacks.
   
   That being said, they should be very interoperable.  Both can import and 
export to the C data interface for intra-process communication.  Both can read 
and write the arrow IPC format for IPC over something like flight for 
inter-process communication.
   
   > Do the projects work closely to re-use/align overlapping functionality?
   
   Unfortunately, a universal UDF harness has not yet been created.  In 
general, datafusion, based on rust, is not likely to trust a C/C++ UDFs for 
safety reasons (at least, that is my understanding, I could be wrong).  Going 
the other way (using datafusion UDFs in pyarrow) should be possible but no one 
has done it yet.  Some very interesting ideas have been proposed for other 
alternatives (there was a discussion a while back on the mailing list).  
Perhaps the most interesting to me is the idea of using WASM for UDFs.  LLVM 
has also been proposed though it doesn't fix the safety concerns.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to