Andrew Lamb created ARROW-9940:
----------------------------------
Summary: [Rust][DataFusion] Generic "extension package" mechanism
Key: ARROW-9940
URL: https://issues.apache.org/jira/browse/ARROW-9940
Project: Apache Arrow
Issue Type: New Feature
Reporter: Andrew Lamb
This came from [~jorgecarleitao]'s suggestion on this PR:
https://github.com/apache/arrow/pull/8097/files#r482968858
The high-level idea is to design and implement an upgrade/improvement to the
DataFusion APIs that allows registering composable sets of
UserDefinedLogicalNode, logical planning rules, and physical planning rules for
some functionality.
h2. The use case:
You publish the TopK extension as a (library) crate called datafusion-topk, and
I publish a crate datafusion-s3 with another extension.
A user wants to use both extensions. They install and use them by:
# adding each crate to Cargo.toml
# initializing the default planner with both of them
# planning queries that use them
# executing the resulting plans
I.e. freaking easy!
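To make the four steps above concrete, here is a minimal, self-contained sketch of what such a registration API could look like. Every name in it (ExtensionPackage, Planner, with_package, the toy LogicalPlan enum, TopK, S3) is hypothetical and invented for illustration; none of it is DataFusion's actual API, and a real design would build on UserDefinedLogicalNode and the existing planner traits.

```rust
// Hypothetical sketch only: these types are NOT DataFusion's real API.

/// A toy logical plan, standing in for DataFusion's LogicalPlan.
#[derive(Debug, Clone, PartialEq)]
pub enum LogicalPlan {
    Scan { uri: String },
    Sort { input: Box<LogicalPlan> },
    Limit { n: usize, input: Box<LogicalPlan> },
    /// A node contributed by an extension package, identified by name.
    Extension { name: &'static str, arg: usize, input: Box<LogicalPlan> },
}

/// What an extension crate would export: a bundle of one logical
/// planning rule and one physical planning rule.
pub trait ExtensionPackage {
    fn name(&self) -> &str;
    /// Logical rule: rewrite the root of the plan, or return it unchanged.
    fn rewrite(&self, plan: LogicalPlan) -> LogicalPlan {
        plan
    }
    /// Physical rule: name the operator for a node this package handles.
    fn plan_physical(&self, _node: &LogicalPlan) -> Option<String> {
        None
    }
}

/// Would live in a (hypothetical) datafusion-topk crate.
pub struct TopK;
impl ExtensionPackage for TopK {
    fn name(&self) -> &str {
        "topk"
    }
    fn rewrite(&self, plan: LogicalPlan) -> LogicalPlan {
        match plan {
            // Limit directly over Sort becomes a single TopK extension node.
            LogicalPlan::Limit { n, input } => match *input {
                LogicalPlan::Sort { input } => {
                    LogicalPlan::Extension { name: "topk", arg: n, input }
                }
                other => LogicalPlan::Limit { n, input: Box::new(other) },
            },
            other => other,
        }
    }
    fn plan_physical(&self, node: &LogicalPlan) -> Option<String> {
        match node {
            LogicalPlan::Extension { name, arg, .. } if *name == "topk" => {
                Some(format!("TopKExec(k={})", arg))
            }
            _ => None,
        }
    }
}

/// Would live in a (hypothetical) datafusion-s3 crate.
pub struct S3;
impl ExtensionPackage for S3 {
    fn name(&self) -> &str {
        "s3"
    }
    fn plan_physical(&self, node: &LogicalPlan) -> Option<String> {
        match node {
            LogicalPlan::Scan { uri } if uri.starts_with("s3://") => {
                Some(format!("S3ScanExec({})", uri))
            }
            _ => None,
        }
    }
}

/// A planner that composes any number of registered packages.
#[derive(Default)]
pub struct Planner {
    packages: Vec<Box<dyn ExtensionPackage>>,
}

impl Planner {
    /// Step 2: register a package with the planner.
    pub fn with_package(mut self, pkg: impl ExtensionPackage + 'static) -> Self {
        self.packages.push(Box::new(pkg));
        self
    }
    /// Names of the registered packages, in registration order.
    pub fn package_names(&self) -> Vec<&str> {
        self.packages.iter().map(|p| p.name()).collect()
    }
    /// Step 3a: run each package's logical rule on the plan root.
    pub fn optimize(&self, plan: LogicalPlan) -> LogicalPlan {
        self.packages.iter().fold(plan, |p, pkg| pkg.rewrite(p))
    }
    /// Step 3b: walk root to leaf; the first package to claim a node wins.
    pub fn physical(&self, plan: &LogicalPlan) -> Vec<String> {
        let mut ops = Vec::new();
        let mut node = Some(plan);
        while let Some(n) = node {
            let op = self
                .packages
                .iter()
                .find_map(|pkg| pkg.plan_physical(n))
                .unwrap_or_else(|| "BuiltinExec".to_string());
            ops.push(op);
            node = match n {
                LogicalPlan::Scan { .. } => None,
                LogicalPlan::Sort { input }
                | LogicalPlan::Limit { input, .. }
                | LogicalPlan::Extension { input, .. } => Some(&**input),
            };
        }
        ops
    }
}
```

With this shape, step 2 would be a one-liner such as Planner::default().with_package(TopK).with_package(S3), and neither package needs to know about the other: a Limit over a Sort over an s3:// scan would be planned with the TopK operator on top of the S3 scan operator. Whether rules should be applied only at the root (as here) or recursively, and how conflicts between packages are resolved, are design questions this sketch does not try to answer.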
Broadly speaking, this allows the existence of an ecosystem of
extensions/user-defined plans: people can share hand-crafted plans and plans
can be added as dependencies to the crate and registered to the planner to be
used by other people. 🤯
This also reduces the pressure of placing everything in DataFusion's codebase:
if we offer an API to extend DataFusion in this way, people can just distribute
libraries with the extension/user-defined plan without having to go through the
decision process of whether X is part of DataFusion's core or not (e.g. a scan
of format Y, or a scan over protocol Z).
For me, this use case does require an easy way to achieve step 2 (initialize
the default planner with both of them). But again, this PR is definitely a
major step in this direction!