jorgecarleitao commented on a change in pull request #8097:
URL: https://github.com/apache/arrow/pull/8097#discussion_r482703011
##########
File path: rust/datafusion/src/physical_plan/planner.rs
##########
@@ -331,6 +355,16 @@ impl DefaultPhysicalPlanner {
let schema_ref = Arc::new(schema.as_ref().clone());
Ok(Arc::new(ExplainExec::new(schema_ref, stringified_plans)))
}
+ LogicalPlan::Extension { node } => {
+ let inputs = node
+ .inputs()
+ .into_iter()
+ .map(|input_plan| self.create_physical_plan(input_plan, ctx_state))
+ .collect::<Result<Vec<_>>>()?;
+
+ self.extension_planner
Review comment:
If I understand correctly, this implies that a single `ExtensionPlanner`
needs to be able to handle all custom nodes.
Thus, if two extensions are shared, e.g. as cargo crates, consumers of those
extensions can't simply add both: they would need to rewrite their extension
planner to incorporate each of them.
An alternative:
* make `extension_planner`'s interface return `Result<Option<Arc<dyn ExecutionPlan>>>`, where `None` represents that the planner does not know how to plan that node
* change the logic here to loop over the planners until a non-`None` result is found
* make extensions return `None` for any node(s) they do not support
* rename `self.extension_planner` to `self.extension_planners`
* change `with_extension_planner` to `with_extension_planners(Vec)` (or `add_extension_planner()`)
This would allow extensions to be shared along with their respective planner
(or planners, if they support multiple execution architectures), and users
could just add them to their list of extensions and pass them to
`with_extension_planners`.
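A minimal sketch of the proposed shape. The trait and method names follow the PR (`ExtensionPlanner`, `plan_extension`), but the types here are simplified stand-ins for illustration only; the real signatures would take `&dyn ExtensionPlanNode`, the planned input `ExecutionPlan`s, and the context state:

```rust
use std::sync::Arc;

// Simplified stand-in for Result<Option<Arc<dyn ExecutionPlan>>>;
// a String names the physical node that was produced.
type PlanResult = Result<Option<String>, String>;

trait ExtensionPlanner {
    /// Returns Ok(None) when this planner does not recognize the node,
    /// so the caller can fall through to the next registered planner.
    fn plan_extension(&self, node_name: &str) -> PlanResult;
}

// Hypothetical planner shared by one extension crate.
struct TopKPlanner;
impl ExtensionPlanner for TopKPlanner {
    fn plan_extension(&self, node_name: &str) -> PlanResult {
        Ok(if node_name == "TopK" {
            Some("TopKExec".to_string())
        } else {
            None
        })
    }
}

// A planner from an unrelated extension that handles nothing here.
struct NoopPlanner;
impl ExtensionPlanner for NoopPlanner {
    fn plan_extension(&self, _node_name: &str) -> PlanResult {
        Ok(None)
    }
}

/// Loop over all registered planners until one returns a non-None plan.
fn plan(planners: &[Arc<dyn ExtensionPlanner>], node_name: &str) -> Result<String, String> {
    for planner in planners {
        if let Some(physical) = planner.plan_extension(node_name)? {
            return Ok(physical);
        }
    }
    Err(format!("no registered planner knows how to plan {}", node_name))
}

fn main() {
    let planners: Vec<Arc<dyn ExtensionPlanner>> =
        vec![Arc::new(NoopPlanner), Arc::new(TopKPlanner)];
    // NoopPlanner declines, TopKPlanner claims the node.
    println!("{}", plan(&planners, "TopK").unwrap());
}
```

With this shape, each extension crate can ship its own planner, and consumers only concatenate them into one `Vec`.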
##########
File path: rust/datafusion/tests/user_defined_plan.rs
##########
@@ -0,0 +1,500 @@
+// Licensed to the Apache Software Foundation (ASF) under one
Review comment:
Savage! 💯💯
##########
File path: rust/datafusion/src/physical_plan/planner.rs
##########
@@ -331,6 +355,16 @@ impl DefaultPhysicalPlanner {
let schema_ref = Arc::new(schema.as_ref().clone());
Ok(Arc::new(ExplainExec::new(schema_ref, stringified_plans)))
}
+ LogicalPlan::Extension { node } => {
+ let inputs = node
+ .inputs()
+ .into_iter()
+ .map(|input_plan| self.create_physical_plan(input_plan, ctx_state))
+ .collect::<Result<Vec<_>>>()?;
+
+ self.extension_planner
+ .plan_extension(node.as_ref(), inputs, ctx_state)
Review comment:
I think that we should have a check around here to verify that the
logical and physical nodes have the same output `schema`.
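A hedged sketch of the suggested check, using plain field-name lists as stand-ins for the real arrow `Schema` comparison: after the extension planner returns a physical plan, verify its output schema matches what the logical node declared before accepting it.

```rust
/// Reject a physical plan whose output schema does not match the
/// schema declared by the logical extension node.
/// (Illustrative only: real code would compare arrow Schema instances.)
fn check_schemas(logical_fields: &[&str], physical_fields: &[&str]) -> Result<(), String> {
    if logical_fields == physical_fields {
        Ok(())
    } else {
        Err(format!(
            "extension planner produced a plan with schema {:?}, \
             but the logical node declares schema {:?}",
            physical_fields, logical_fields
        ))
    }
}

fn main() {
    // Matching schemas pass; a dropped column is caught early.
    assert!(check_schemas(&["k", "v"], &["k", "v"]).is_ok());
    assert!(check_schemas(&["k", "v"], &["k"]).is_err());
}
```

Failing fast here turns a subtle downstream execution error into a clear planning-time message.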
##########
File path: rust/datafusion/src/logical_plan/mod.rs
##########
@@ -755,8 +756,72 @@ impl fmt::Debug for Expr {
}
}
-/// The LogicalPlan represents different types of relations (such as Projection,
-/// Filter, etc) and can be created by the SQL query planner and the DataFrame API.
+/// This defines the interface for `LogicalPlan` nodes that can be
+/// used to extend DataFusion with custom relational operators.
+///
+/// See the example in
+/// [user_defined_plan.rs](../../tests/user_defined_plan.rs) for an
+/// example of how to use this extension API.
+pub trait ExtensionPlanNode: Debug {
Review comment:
Another idea: `UserDefinedNode` (UDN), to align with user-defined
function (UDF), a term most people are already familiar with.
##########
File path: rust/datafusion/src/execution/context.rs
##########
@@ -417,7 +445,16 @@ impl ExecutionConfig {
mut self,
physical_planner: Arc<dyn PhysicalPlanner>,
) -> Self {
- self.physical_planner = Some(physical_planner);
+ self.physical_planner = physical_planner;
+ self
+ }
+
+ /// Optional source of additional optimization passes
+ pub fn with_optimizer_rule_source(
Review comment:
Along the same lines as my comment about the planning: one idea is to
make `optimizer_rule_source` a vector of rules, and instantiate
`ExecutionConfig` with a `Vec` of those rules, instead of having an extra
`OptimizerRuleSource` to hold them.
The rules themselves are `Arc<dyn OptimizerRule>`, and users implement them
via the trait.
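A minimal sketch of that alternative. `ExecutionConfig` and `OptimizerRule` are real DataFusion names, but the trait body, the `add_optimizer_rule` builder method, and the `PushDownFilter` rule below are simplified placeholders for illustration:

```rust
use std::sync::Arc;

// Simplified stand-in for DataFusion's OptimizerRule trait.
trait OptimizerRule {
    fn name(&self) -> &str;
}

// Hypothetical user-provided rule.
struct PushDownFilter;
impl OptimizerRule for PushDownFilter {
    fn name(&self) -> &str {
        "push_down_filter"
    }
}

// The config holds the rules directly, with no extra
// OptimizerRuleSource indirection.
struct ExecutionConfig {
    optimizer_rules: Vec<Arc<dyn OptimizerRule>>,
}

impl ExecutionConfig {
    fn new() -> Self {
        Self { optimizer_rules: vec![] }
    }

    /// Builder-style registration of one additional rule.
    fn add_optimizer_rule(mut self, rule: Arc<dyn OptimizerRule>) -> Self {
        self.optimizer_rules.push(rule);
        self
    }
}

fn main() {
    let config = ExecutionConfig::new().add_optimizer_rule(Arc::new(PushDownFilter));
    println!("{} rule(s) registered", config.optimizer_rules.len());
}
```

As with the planners, this lets each extension crate export its rules as plain `Arc<dyn OptimizerRule>` values that users append to the config.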
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]