[GitHub] [arrow-datafusion] mingmwang commented on issue #2175: [Discuss] Different implementation style between Expr, LogicalPlan and ExecutionPlan

GitBox Thu, 29 Sep 2022 11:20:57 -0700


mingmwang commented on issue #2175:
URL: 
https://github.com/apache/arrow-datafusion/issues/2175#issuecomment-1262648962


   > > I wonder if we could extend PhysicalExpr to require PartialEq
   > 
   > PartialEq is not object-safe, it takes Self, so this isn't possible (I 
don't think)
   > 
   > > But if the expressions are Trait, because the type is unknown, how to 
write code to compare the expressions?
   > 
   > `Expr` is a good example of where Rust's support for structured matching 
is fantastic, and I would definitely not propose switching `BinaryExpr` to be a 
trait object 👍
   > 
   > Is `ExecutionPlan` a similar use-case though? Most places it gets used 
currently don't care what variant is used, and I don't see any that are 
performing nested matching? This could be because it would currently be 
obnoxious, but is it possible that the fact they are represented differently is 
actually a strength, and represents the differing requirements of the different 
representations?
   > 
   > > The expressions are also tree like structures and we have to downcast 
and destructuring and to do the comparing recursively.
   > 
   > This is actually one of the weaknesses of the enum approach, type erasure 
has to be implemented via a visitor which makes walking the tree significantly 
more complicated. If you want to match a specific expression, it works well, if 
you want to do anything more generic it becomes incredibly painful.
   > 
   > You are therefore left with a tradeoff based on the common use-case, for 
BinaryExpr the enumeration is clearly superior, LogicalPlan I'd argue its a bit 
of a toss up, I have found the lack of type-erasure rather obnoxious at times, 
for ExecutionPlan, eh...
   
   I'm not sure why visiting a logical plan is a toss up. Here is a sample of 
how to make visit ExecutionPlan more generic and we
   can get ride of the visit pattern and tree traversing from the rule 
implementations
   
   ```` 
   plan_utils.rs
   
   pub fn map_children<F>(
       plan: Arc<dyn ExecutionPlan>,
       transform: F,
   ) -> Result<Arc<dyn ExecutionPlan>>
   where
       F: Fn(Arc<dyn ExecutionPlan>) -> Result<Arc<dyn ExecutionPlan>>,
   {
       if !plan.children().is_empty() {
           let new_children: Result<Vec<_>> =
               plan.children().into_iter().map(transform).collect();
           with_new_children_if_necessary(plan, new_children?)
       } else {
           Ok(plan)
       }
   }
   
   pub fn transform<F>(
       plan: Arc<dyn ExecutionPlan>,
       op: &F,
   ) -> Result<Arc<dyn ExecutionPlan>>
   where
       F: Fn(Arc<dyn ExecutionPlan>) -> Option<Arc<dyn ExecutionPlan>>,
   {
       transform_down(plan, op)
   }
   
   pub fn transform_down<F>(
       plan: Arc<dyn ExecutionPlan>,
       op: &F,
   ) -> Result<Arc<dyn ExecutionPlan>>
   where
       F: Fn(Arc<dyn ExecutionPlan>) -> Option<Arc<dyn ExecutionPlan>>,
   {
       let plan_cloned = plan.clone();
       let after_op = match op(plan_cloned) {
           Some(value) => value,
           None => plan,
       };
       map_children(after_op.clone(), |plan: Arc<dyn ExecutionPlan>| {
           transform_down(plan, op)
       })
   }
   
   pub fn transform_up<F>(
       plan: Arc<dyn ExecutionPlan>,
       op: &F,
   ) -> Result<Arc<dyn ExecutionPlan>>
   where
       F: Fn(Arc<dyn ExecutionPlan>) -> Option<Arc<dyn ExecutionPlan>>,
   {
       let plan_cloned = plan.clone();
       let after_op_children = map_children(plan_cloned, |plan: Arc<dyn 
ExecutionPlan>| {
           transform_down(plan, op)
       });
       let new_plan = match op(after_op_children?) {
           Some(value) => value,
           None => plan,
       };
       Ok(new_plan)
   }
   
   --------------------------------------------------
   impl PhysicalOptimizerRule for CoalesceBatches2 {
       fn optimize(
           &self,
           plan: Arc<dyn crate::physical_plan::ExecutionPlan>,
           _config: &crate::execution::context::SessionConfig,
       ) -> Result<Arc<dyn crate::physical_plan::ExecutionPlan>> {
           transform(plan, &apply_patterns(self.target_batch_size))
       }
   
       fn name(&self) -> &str {
           "coalesce_batches2"
       }
   }
   
   fn apply_patterns(
       target_batch_size: usize,
   ) -> impl Fn(
       Arc<dyn crate::physical_plan::ExecutionPlan>,
   ) -> Option<Arc<dyn crate::physical_plan::ExecutionPlan>> {
       move |plan| {
           let plan_any = plan.as_any();
           let wrap_in_coalesce = 
plan_any.downcast_ref::<FilterExec>().is_some()
               || plan_any.downcast_ref::<HashJoinExec>().is_some()
               || plan_any.downcast_ref::<RepartitionExec>().is_some();
           if wrap_in_coalesce {
               Some(Arc::new(CoalesceBatchesExec::new(
                   plan.clone(),
                   target_batch_size,
               )))
           } else {
               None
           }
       }
   }
   
   ```` 
   
   The same utils can be implemented for Enums/LogicalPlans I think.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [arrow-datafusion] mingmwang commented on issue #2175: [Discuss] Different implementation style between Expr, LogicalPlan and ExecutionPlan

Reply via email to