Hi,

I filed a PR [1] to open a discussion about incorporating python-datafusion [2] into the Apache Arrow project.

python-datafusion is a Python library [3] built on top of DataFusion that lets people use DataFusion from Python. It uses the Arrow C data interface for zero-copy exchange between DataFusion and pyarrow (only a handful of pointers are passed around). For example, it allows users to read a CSV from Rust, pass the arrays to a C++ kernel, continue the computation in Rust's kernels, and export to Parquet using Rust (or C++ Parquet, or whatever ^_^).

It supports UDFs and UDAFs, in case someone wants to go crazy with pyarrow, pandas, NumPy or TensorFlow. =)

Best,
Jorge

[1] https://github.com/apache/arrow-datafusion/pull/69
[2] https://github.com/jorgecarleitao/datafusion-python
[3] https://pypi.org/project/datafusion/