Hi,

I opened a PR [1] to start a discussion about incorporating python-datafusion
[2] into the Apache Arrow project.

Python-datafusion is a Python library [3] built on top of DataFusion that
enables people to use DataFusion from Python. It leverages the Arrow C data
interface for zero-copy data exchange between DataFusion and pyarrow (only a
few pointers are passed around; the buffers themselves are never copied).
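To make "only a few pointers are passed around" concrete, here is a minimal, stdlib-only Python sketch (using only `ctypes`) of the two structs defined by the Arrow C Data Interface specification. The helpers `export_int64` and `import_int64` are hypothetical, for illustration only; they are not part of pyarrow or python-datafusion. Real producers such as pyarrow or arrow-rs also manage buffer lifetime through `private_data` and the release callback, which this sketch leaves to the caller.

```python
import ctypes

# Struct layouts as defined by the Arrow C Data Interface specification.
class ArrowSchema(ctypes.Structure):
    pass

class ArrowArray(ctypes.Structure):
    pass

ReleaseSchema = ctypes.CFUNCTYPE(None, ctypes.POINTER(ArrowSchema))
ReleaseArray = ctypes.CFUNCTYPE(None, ctypes.POINTER(ArrowArray))

ArrowSchema._fields_ = [
    ("format", ctypes.c_char_p),
    ("name", ctypes.c_char_p),
    ("metadata", ctypes.c_char_p),
    ("flags", ctypes.c_int64),
    ("n_children", ctypes.c_int64),
    ("children", ctypes.POINTER(ctypes.POINTER(ArrowSchema))),
    ("dictionary", ctypes.POINTER(ArrowSchema)),
    ("release", ReleaseSchema),
    ("private_data", ctypes.c_void_p),
]

ArrowArray._fields_ = [
    ("length", ctypes.c_int64),
    ("null_count", ctypes.c_int64),
    ("offset", ctypes.c_int64),
    ("n_buffers", ctypes.c_int64),
    ("n_children", ctypes.c_int64),
    ("buffers", ctypes.POINTER(ctypes.c_void_p)),
    ("children", ctypes.POINTER(ctypes.POINTER(ArrowArray))),
    ("dictionary", ctypes.POINTER(ArrowArray)),
    ("release", ReleaseArray),
    ("private_data", ctypes.c_void_p),
]

# Per the spec, a struct is marked as released by setting its release callback to NULL.
@ReleaseSchema
def release_schema(ptr):
    ptr.contents.release = ReleaseSchema()

@ReleaseArray
def release_array(ptr):
    ptr.contents.release = ReleaseArray()

def export_int64(values):
    """Hypothetical producer: describe an existing int64 buffer without copying it.

    The caller must keep `values` alive while the consumer uses the array.
    """
    # Buffer 0 is the validity bitmap (may be NULL when null_count == 0); buffer 1 holds the data.
    buffers = (ctypes.c_void_p * 2)(None, ctypes.addressof(values))
    schema = ArrowSchema(format=b"l", release=release_schema)  # "l" means int64
    array = ArrowArray(length=len(values), null_count=0, n_buffers=2,
                       buffers=buffers, release=release_array)
    return schema, array

def import_int64(array):
    """Hypothetical consumer: read the producer's memory through the shared pointer."""
    data = ctypes.cast(array.buffers[1], ctypes.POINTER(ctypes.c_int64))
    return [data[array.offset + i] for i in range(array.length)]

values = (ctypes.c_int64 * 3)(1, 2, 3)
schema, array = export_int64(values)
print(schema.format, import_int64(array))  # b'l' [1, 2, 3]
values[0] = 42  # mutate the producer's buffer...
print(import_int64(array))  # [42, 2, 3]: the consumer sees it, nothing was copied
array.release(ctypes.byref(array))
schema.release(ctypes.byref(schema))
print(bool(array.release))  # False: the struct is now marked as released
```

The only things exchanged are two small structs holding addresses; the int64 data stays where the producer allocated it, which is why the exchange costs the same regardless of array size.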

For example, it allows users to read a CSV using Rust, pass the arrays to a
C++ kernel, continue the computation with Rust kernels, and export to
Parquet using Rust (or C++ Parquet, or whatever ^_^). It also supports UDFs
and UDAFs, in case someone wants to go crazy with pyarrow, pandas, NumPy or
TensorFlow. =)

Best,
Jorge

[1] https://github.com/apache/arrow-datafusion/pull/69
[2] https://github.com/jorgecarleitao/datafusion-python
[3] https://pypi.org/project/datafusion/
