jorgecarleitao commented on a change in pull request #8097:
URL: https://github.com/apache/arrow/pull/8097#discussion_r482703011



##########
File path: rust/datafusion/src/physical_plan/planner.rs
##########
@@ -331,6 +355,16 @@ impl DefaultPhysicalPlanner {
                 let schema_ref = Arc::new(schema.as_ref().clone());
                 Ok(Arc::new(ExplainExec::new(schema_ref, stringified_plans)))
             }
+            LogicalPlan::Extension { node } => {
+                let inputs = node
+                    .inputs()
+                    .into_iter()
+                    .map(|input_plan| self.create_physical_plan(input_plan, ctx_state))
+                    .collect::<Result<Vec<_>>>()?;
+
+                self.extension_planner

Review comment:
       If I understand correctly, this implies that a single `ExtensionPlanner` needs to be able to handle all custom nodes.
   
   Thus, if two extensions are shared, e.g. as cargo crates, consumers of those extensions can't just add them: they would need to rewrite their extension planner to incorporate each of them.
   
   An alternative (sketched below):
   
   * make `extension_planner`'s interface return `Result<Option<Arc<dyn ExecutionPlan>>>`, where `None` means that the planner does not know how to plan that node
   * change the logic here to loop over the extension planners until a non-`None` result is found
   * make extensions return `None` for any nodes that they do not support
   * make `self.extension_planner` be `self.extension_planners`
   * make `with_extension_planner` be `with_extension_planners(Vec)` (or `add_extension_planner()`)
   
   This would allow extensions to be shared along with their respective planner (or planners, if they support multiple execution architectures), and users could simply add them to their list of extensions and pass them to `with_extension_planners`.
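   
   To make this concrete, here is a rough sketch of what the trait and the planning loop could look like. It reuses the names visible in this PR (`ExtensionPlanner`, `plan_extension`, `ExtensionPlanNode`), but the `Option` return value, the `extension_planners` vector, the `&ExecutionContextState` parameter type, and the error message are only illustrative:

```rust
/// Sketch only: each registered planner may decline a node by returning Ok(None).
pub trait ExtensionPlanner {
    fn plan_extension(
        &self,
        node: &dyn ExtensionPlanNode,
        inputs: Vec<Arc<dyn ExecutionPlan>>,
        ctx_state: &ExecutionContextState,
    ) -> Result<Option<Arc<dyn ExecutionPlan>>>;
}

// Inside DefaultPhysicalPlanner::create_physical_plan:
LogicalPlan::Extension { node } => {
    let inputs = node
        .inputs()
        .into_iter()
        .map(|input| self.create_physical_plan(input, ctx_state))
        .collect::<Result<Vec<_>>>()?;

    // Ask each registered planner in turn; the first one that returns Some wins.
    for planner in &self.extension_planners {
        if let Some(plan) =
            planner.plan_extension(node.as_ref(), inputs.clone(), ctx_state)?
        {
            return Ok(plan);
        }
    }
    Err(ExecutionError::General(format!(
        "No registered extension planner could plan node {:?}",
        node
    )))
}
```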

##########
File path: rust/datafusion/tests/user_defined_plan.rs
##########
@@ -0,0 +1,500 @@
+// Licensed to the Apache Software Foundation (ASF) under one

Review comment:
       Savage! 💯💯

##########
File path: rust/datafusion/src/physical_plan/planner.rs
##########
@@ -331,6 +355,16 @@ impl DefaultPhysicalPlanner {
                 let schema_ref = Arc::new(schema.as_ref().clone());
                 Ok(Arc::new(ExplainExec::new(schema_ref, stringified_plans)))
             }
+            LogicalPlan::Extension { node } => {
+                let inputs = node
+                    .inputs()
+                    .into_iter()
+                    .map(|input_plan| self.create_physical_plan(input_plan, ctx_state))
+                    .collect::<Result<Vec<_>>>()?;
+
+                self.extension_planner
+                    .plan_extension(node.as_ref(), inputs, ctx_state)

Review comment:
       I think that we should have a check around here to verify that the logical node and the physical node it was planned into have the same output `schema`.
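   
   A minimal sketch of such a check, assuming both the logical `ExtensionPlanNode` and the produced `ExecutionPlan` expose a `schema()` returning a `SchemaRef`, and reusing `ExecutionError::General` for the error (adjust to the actual accessors in this PR):

```rust
let plan = self
    .extension_planner
    .plan_extension(node.as_ref(), inputs, ctx_state)?;

// The rest of the plan was built against the logical node's schema, so the
// physical node must produce exactly the same schema.
if plan.schema() != node.schema() {
    return Err(ExecutionError::General(format!(
        "Extension planner for {:?} produced an ExecutionPlan whose schema \
         does not match the logical node's schema",
        node
    )));
}
Ok(plan)
```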

##########
File path: rust/datafusion/src/logical_plan/mod.rs
##########
@@ -755,8 +756,72 @@ impl fmt::Debug for Expr {
     }
 }
 
-/// The LogicalPlan represents different types of relations (such as Projection,
-/// Filter, etc) and can be created by the SQL query planner and the DataFrame API.
+/// This defines the interface for `LogicalPlan` nodes that can be
+/// used to extend DataFusion with custom relational operators.
+///
+/// See the example in
+/// [user_defined_plan.rs](../../tests/user_defined_plan.rs) for an
+/// example of how to use this extension API
+pub trait ExtensionPlanNode: Debug {

Review comment:
       Another idea: `UserDefinedNode` (UDN), to align with user-defined function (UDF), a term most people are already familiar with and know the meaning of.

##########
File path: rust/datafusion/src/execution/context.rs
##########
@@ -417,7 +445,16 @@ impl ExecutionConfig {
         mut self,
         physical_planner: Arc<dyn PhysicalPlanner>,
     ) -> Self {
-        self.physical_planner = Some(physical_planner);
+        self.physical_planner = physical_planner;
+        self
+    }
+
+    /// Optional source of additional optimization passes
+    pub fn with_optimizer_rule_source(

Review comment:
       Along the same lines as my comment about the planning, one idea is to make `optimizer_rule_source` a vector of rules and instantiate `ExecutionConfig` with a `Vec` of those rules, instead of adding an extra `OptimizerRuleSource` to hold them (a rough sketch follows below).
   
   The rules themselves are `Arc<dyn OptimizerRule>`, and users write them by implementing the trait.
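   
   A minimal sketch of that shape; the field and method names are illustrative, not this PR's API, and it assumes the existing `OptimizerRule` trait and `std::sync::Arc`:

```rust
pub struct ExecutionConfig {
    // ... existing fields ...
    /// Additional, user-provided logical optimizer rules (illustrative field name).
    optimizer_rules: Vec<Arc<dyn OptimizerRule>>,
}

impl ExecutionConfig {
    /// Register a set of additional optimizer rules in one call.
    pub fn with_optimizer_rules(mut self, rules: Vec<Arc<dyn OptimizerRule>>) -> Self {
        self.optimizer_rules = rules;
        self
    }

    /// Or add them one at a time, which composes nicely when extension crates
    /// ship their own rules.
    pub fn add_optimizer_rule(mut self, rule: Arc<dyn OptimizerRule>) -> Self {
        self.optimizer_rules.push(rule);
        self
    }
}
```
   This way an extension crate can simply export its rules as `Arc<dyn OptimizerRule>` values, and users append them to the list, just like the extension planners above.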



