alamb commented on a change in pull request #8097:
URL: https://github.com/apache/arrow/pull/8097#discussion_r482891141
##########
File path: rust/datafusion/src/logical_plan/mod.rs
##########
@@ -755,8 +756,72 @@ impl fmt::Debug for Expr {
}
}
-/// The LogicalPlan represents different types of relations (such as
Projection,
-/// Filter, etc) and can be created by the SQL query planner and the DataFrame
API.
+/// This defines the interface for `LogicalPlan` nodes that can be
+/// used to extend DataFusion with custom relational operators.
+///
+/// See the example in
+/// [user_defined_plan.rs](../../tests/user_defined_plan.rs) for an
+/// example of how to use this extenison API
+pub trait ExtensionPlanNode: Debug {
Review comment:
I struggled with this naming -- I could go with `UserDefinedNode` though
I wonder if it should be `UserDefinedPlanNode` or something to distinguish it
from nodes in `ExecutionPlan`s. @andygrove any thoughts?
##########
File path: rust/datafusion/src/execution/context.rs
##########
@@ -417,7 +445,16 @@ impl ExecutionConfig {
mut self,
physical_planner: Arc<dyn PhysicalPlanner>,
) -> Self {
- self.physical_planner = Some(physical_planner);
+ self.physical_planner = physical_planner;
+ self
+ }
+
+ /// Optional source of additional optimization passes
+ pub fn with_optimizer_rule_source(
Review comment:
I would like to implement the pattern of passing in `Arc<dyn
OptimizerRule>` directly (in fact I tried to implement it initially). The issue
I found is that `ExecutionContext` is read only (immutable) during planning,
but the signature of `OptimizerRule` requires `&mut self` (aka it can change
the `OptimizerRule` state).
I think Rust is actually saving us from ourselves, because as discussed in
https://issues.apache.org/jira/browse/ARROW-9888 the ExecutionContext can be
shared between threads (and thus allowing one thread to mutate the
`OptimizerRule`s could cause problems)
What do you think about making `optimizer_rules` `Vec<dyn OptimizerRule +
Clone>` and then planning with the result of `clone()`ing ?
Actually, the more I think about this the more I like the `Clone` idea...
##########
File path: rust/datafusion/src/physical_plan/planner.rs
##########
@@ -331,6 +355,16 @@ impl DefaultPhysicalPlanner {
let schema_ref = Arc::new(schema.as_ref().clone());
Ok(Arc::new(ExplainExec::new(schema_ref, stringified_plans)))
}
+ LogicalPlan::Extension { node } => {
+ let inputs = node
+ .inputs()
+ .into_iter()
+ .map(|input_plan| self.create_physical_plan(input_plan,
ctx_state))
+ .collect::<Result<Vec<_>>>()?;
+
+ self.extension_planner
+ .plan_extension(node.as_ref(), inputs, ctx_state)
Review comment:
good idea -- I will do so.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]