alamb commented on code in PR #19792:
URL: https://github.com/apache/datafusion/pull/19792#discussion_r2691638176
##########
datafusion/physical-plan/src/execution_plan.rs:
##########
@@ -1384,6 +1395,52 @@ pub fn check_not_null_constraints(
Ok(batch)
}
+/// Make plan ready to be re-executed returning its clone with state reset for
all nodes.
+///
+/// Some plans will change their internal states after execution, making them
unable to be executed again.
+/// This function uses [`ExecutionPlan::reset_state`] to reset any internal
state within the plan.
+///
+/// An example is `CrossJoinExec`, which loads the left table into memory and
stores it in the plan.
+/// However, if the data of the left table is derived from the work table, it
will become outdated
+/// as the work table changes. When the next iteration executes this plan
again, we must clear the left table.
+///
+/// # Limitations
+///
+/// While this function enables plan reuse, it does not allow the same plan to
be executed if it (OR):
+///
+/// * uses dynamic filters,
+/// * represents a recursive query.
+///
+pub fn reset_plan_states(plan: Arc<dyn ExecutionPlan>) -> Result<Arc<dyn
ExecutionPlan>> {
Review Comment:
👍
##########
datafusion/physical-plan/src/aggregates/mod.rs:
##########
@@ -1236,14 +1238,24 @@ impl ExecutionPlan for AggregateExec {
fn with_new_children(
self: Arc<Self>,
- children: Vec<Arc<dyn ExecutionPlan>>,
+ mut children: Vec<Arc<dyn ExecutionPlan>>,
) -> Result<Arc<dyn ExecutionPlan>> {
+ if has_same_children_properties(&self, &children)? {
+ // Avoid properties re-computation.
Review Comment:
I think we coulda avoid even more copying / cloning in the common case using
`Arc::make_mut`
https://doc.rust-lang.org/std/sync/struct.Arc.html#method.make_mut
Something like
```rust
if has_same_children_properties(&self, &children)? {
// Avoid properties re-computation.
let me = Arc::make_mut(&mut self);
me.input = children.swap_remove(0);
me.metrics = ExecutionPlanMetricsSet::new();
return Ok(self);
}
```
##########
datafusion/physical-plan/src/aggregates/mod.rs:
##########
@@ -1236,14 +1238,24 @@ impl ExecutionPlan for AggregateExec {
fn with_new_children(
self: Arc<Self>,
- children: Vec<Arc<dyn ExecutionPlan>>,
+ mut children: Vec<Arc<dyn ExecutionPlan>>,
) -> Result<Arc<dyn ExecutionPlan>> {
+ if has_same_children_properties(&self, &children)? {
+ // Avoid properties re-computation.
Review Comment:
Now that I write that, I wonder if we could use make_mut to avoid lots of
cloning in general 🤔
We could check if the new children are the same as the existing children and
if so return self 🤔
##########
datafusion-examples/examples/custom_data_source/custom_datasource.rs:
##########
@@ -192,7 +192,7 @@ impl TableProvider for CustomDataSource {
struct CustomExec {
db: CustomDataSource,
projected_schema: SchemaRef,
- cache: PlanProperties,
+ cache: Arc<PlanProperties>,
Review Comment:
💯 for this change
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]