alamb commented on code in PR #19792:
URL: https://github.com/apache/datafusion/pull/19792#discussion_r2691638176


##########
datafusion/physical-plan/src/execution_plan.rs:
##########
@@ -1384,6 +1395,52 @@ pub fn check_not_null_constraints(
     Ok(batch)
 }
 
+/// Make plan ready to be re-executed returning its clone with state reset for 
all nodes.
+///
+/// Some plans will change their internal states after execution, making them 
unable to be executed again.
+/// This function uses [`ExecutionPlan::reset_state`] to reset any internal 
state within the plan.
+///
+/// An example is `CrossJoinExec`, which loads the left table into memory and 
stores it in the plan.
+/// However, if the data of the left table is derived from the work table, it 
will become outdated
+/// as the work table changes. When the next iteration executes this plan 
again, we must clear the left table.
+///
+/// # Limitations
+///
+/// While this function enables plan reuse, it does not allow the same plan to 
be executed if it (OR):
+///
+/// * uses dynamic filters,
+/// * represents a recursive query.
+///
+pub fn reset_plan_states(plan: Arc<dyn ExecutionPlan>) -> Result<Arc<dyn 
ExecutionPlan>> {

Review Comment:
   👍 



##########
datafusion/physical-plan/src/aggregates/mod.rs:
##########
@@ -1236,14 +1238,24 @@ impl ExecutionPlan for AggregateExec {
 
     fn with_new_children(
         self: Arc<Self>,
-        children: Vec<Arc<dyn ExecutionPlan>>,
+        mut children: Vec<Arc<dyn ExecutionPlan>>,
     ) -> Result<Arc<dyn ExecutionPlan>> {
+        if has_same_children_properties(&self, &children)? {
+            // Avoid properties re-computation.

Review Comment:
   I think we coulda avoid even more copying / cloning in the common case using 
`Arc::make_mut` 
https://doc.rust-lang.org/std/sync/struct.Arc.html#method.make_mut
   
   Something like
   ```rust
           if has_same_children_properties(&self, &children)? {
               // Avoid properties re-computation.
               let me = Arc::make_mut(&mut self);
               me.input = children.swap_remove(0);
               me.metrics = ExecutionPlanMetricsSet::new();
               return Ok(self);
           }
   ```



##########
datafusion/physical-plan/src/aggregates/mod.rs:
##########
@@ -1236,14 +1238,24 @@ impl ExecutionPlan for AggregateExec {
 
     fn with_new_children(
         self: Arc<Self>,
-        children: Vec<Arc<dyn ExecutionPlan>>,
+        mut children: Vec<Arc<dyn ExecutionPlan>>,
     ) -> Result<Arc<dyn ExecutionPlan>> {
+        if has_same_children_properties(&self, &children)? {
+            // Avoid properties re-computation.

Review Comment:
   Now that I write that, I wonder if we could use make_mut to avoid lots of 
cloning in general 🤔 
   
   We could check if the new children are the same as the existing children and 
if so return self 🤔 



##########
datafusion-examples/examples/custom_data_source/custom_datasource.rs:
##########
@@ -192,7 +192,7 @@ impl TableProvider for CustomDataSource {
 struct CustomExec {
     db: CustomDataSource,
     projected_schema: SchemaRef,
-    cache: PlanProperties,
+    cache: Arc<PlanProperties>,

Review Comment:
   💯  for this change



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to