Andrew Lamb created ARROW-9940:
----------------------------------

             Summary: [Rust][DataFusion] Generic "extension package" mechanism
                 Key: ARROW-9940
                 URL: https://issues.apache.org/jira/browse/ARROW-9940
             Project: Apache Arrow
          Issue Type: New Feature
            Reporter: Andrew Lamb


This came from [~jorgecarleitao]'s suggestion on this PR: 
 https://github.com/apache/arrow/pull/8097/files#r482968858

The high-level idea is to design and implement an improvement to the 
DataFusion APIs that allows registering composable sets of 
UserDefinedLogicalNode implementations, logical planning rules, and physical 
planning rules for a given piece of functionality.

h2. The use case:

You publish the TopK extension as a (library) crate called datafusion-topk, and 
I publish a crate datafusion-s3 with another extension.

A user wants to use both extensions. They install them by:

# adding each crate to Cargo.toml
# initializing the default planner with both of them
# planning them
# executing them
I.e. freaking easy!
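
As a rough illustration of step 2, the sketch below shows one shape such a registration API could take. Every name in it (ExtensionPackage, TopKExtension, S3Extension, Planner, with_extension, planner_with_both) is hypothetical and is not part of DataFusion. Rules are represented by plain string names to keep the sketch self-contained; a real design would presumably carry rule objects and UserDefinedLogicalNode implementations instead.

```rust
// Hypothetical sketch only: none of these types or methods exist in DataFusion.

/// A bundle of planning pieces that a single crate could export.
pub trait ExtensionPackage {
    /// Package name, e.g. "datafusion-topk".
    fn name(&self) -> &str;
    /// Logical planning rules contributed by this package (placeholder names).
    fn logical_rules(&self) -> Vec<String>;
    /// Physical planning rules contributed by this package (placeholder names).
    fn physical_rules(&self) -> Vec<String>;
}

/// Stand-in for what a `datafusion-topk` crate might export.
pub struct TopKExtension;

impl ExtensionPackage for TopKExtension {
    fn name(&self) -> &str {
        "datafusion-topk"
    }
    fn logical_rules(&self) -> Vec<String> {
        vec!["topk_rewrite".to_string()]
    }
    fn physical_rules(&self) -> Vec<String> {
        vec!["topk_exec".to_string()]
    }
}

/// Stand-in for what a `datafusion-s3` crate might export.
pub struct S3Extension;

impl ExtensionPackage for S3Extension {
    fn name(&self) -> &str {
        "datafusion-s3"
    }
    fn logical_rules(&self) -> Vec<String> {
        Vec::new() // assumed: a scan extension adds no logical rewrites
    }
    fn physical_rules(&self) -> Vec<String> {
        vec!["s3_scan_exec".to_string()]
    }
}

/// Stand-in for a planner that accumulates rules from registered packages.
#[derive(Default)]
pub struct Planner {
    packages: Vec<String>,
    logical_rules: Vec<String>,
    physical_rules: Vec<String>,
}

impl Planner {
    /// Registers every rule of `ext`, preserving registration order.
    pub fn with_extension(mut self, ext: &dyn ExtensionPackage) -> Self {
        self.packages.push(ext.name().to_string());
        self.logical_rules.extend(ext.logical_rules());
        self.physical_rules.extend(ext.physical_rules());
        self
    }

    pub fn packages(&self) -> &[String] {
        &self.packages
    }

    pub fn logical_rules(&self) -> &[String] {
        &self.logical_rules
    }

    pub fn physical_rules(&self) -> &[String] {
        &self.physical_rules
    }
}

/// Step 2 from the list above: one call per extension on the default planner.
pub fn planner_with_both() -> Planner {
    Planner::default()
        .with_extension(&TopKExtension)
        .with_extension(&S3Extension)
}
```

The point of the builder-style with_extension call is that each package is registered as a unit, so the user never has to wire up individual nodes or rules by hand.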

Broadly speaking, this allows the existence of an ecosystem of 
extensions/user-defined plans: people can share hand-crafted plans and plans 
can be added as dependencies to the crate and registered to the planner to be 
used by other people. 🤯

This also reduces the pressure of placing everything in DataFusion's codebase: 
if we offer an API to extend DataFusion in this way, people can just distribute 
libraries with the extension/user-defined plan without having to go through the 
decision process of whether X is part of DataFusion's core or not (e.g. a scan 
of format Y, or a scan over protocol Z).

For me, this use case does require an easy way to achieve step 2 (initializing 
the default planner with both of them). But again, this PR is definitely a 
major step in this direction!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
