alamb commented on a change in pull request #8097:
URL: https://github.com/apache/arrow/pull/8097#discussion_r482891141



##########
File path: rust/datafusion/src/logical_plan/mod.rs
##########
@@ -755,8 +756,72 @@ impl fmt::Debug for Expr {
     }
 }
 
-/// The LogicalPlan represents different types of relations (such as 
Projection,
-/// Filter, etc) and can be created by the SQL query planner and the DataFrame 
API.
+/// This defines the interface for `LogicalPlan` nodes that can be
+/// used to extend DataFusion with custom relational operators.
+///
+/// See the example in
+/// [user_defined_plan.rs](../../tests/user_defined_plan.rs) for an
+/// example of how to use this extenison API
+pub trait ExtensionPlanNode: Debug {

Review comment:
       I struggled with this naming -- I could go with `UserDefinedNode` though 
I wonder if it should be `UserDefinedPlanNode` or something to distinguish it 
from nodes in `ExecutionPlan`s.  @andygrove  any thoughts?

##########
File path: rust/datafusion/src/execution/context.rs
##########
@@ -417,7 +445,16 @@ impl ExecutionConfig {
         mut self,
         physical_planner: Arc<dyn PhysicalPlanner>,
     ) -> Self {
-        self.physical_planner = Some(physical_planner);
+        self.physical_planner = physical_planner;
+        self
+    }
+
+    /// Optional source of additional optimization passes
+    pub fn with_optimizer_rule_source(

Review comment:
       I would like to implement the pattern of passing in  `Arc<dyn 
OptimizerRule>` directly (in fact I tried to implement it initially). The issue 
I found is that `ExecutionContext` is read only (immutable) during planning, 
but the signature of `OptimizerRule` requires `&mut self` (aka it can change 
the `OptimizerRule` state).
   
   I think Rust is actually saving us from ourselves, because as discussed in  
https://issues.apache.org/jira/browse/ARROW-9888 the ExecutionContext can be 
shared between threads (and thus allowing one thread to mutate the 
`OptimizerRule`s could cause problems) 
   
   What do you think about making `optimizer_rules` `Vec<dyn OptimizerRule + 
Clone>` and then planning with the result of `clone()`ing ? 
   
   Actually, the more I think about this the more I like the `Clone` idea...

##########
File path: rust/datafusion/src/physical_plan/planner.rs
##########
@@ -331,6 +355,16 @@ impl DefaultPhysicalPlanner {
                 let schema_ref = Arc::new(schema.as_ref().clone());
                 Ok(Arc::new(ExplainExec::new(schema_ref, stringified_plans)))
             }
+            LogicalPlan::Extension { node } => {
+                let inputs = node
+                    .inputs()
+                    .into_iter()
+                    .map(|input_plan| self.create_physical_plan(input_plan, 
ctx_state))
+                    .collect::<Result<Vec<_>>>()?;
+
+                self.extension_planner
+                    .plan_extension(node.as_ref(), inputs, ctx_state)

Review comment:
       good idea -- I will do so.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to