peter-toth commented on PR #8891: URL: https://github.com/apache/arrow-datafusion/pull/8891#issuecomment-1933706262
Let me show you a concrete example where `transform()` can be useful in the future. Here is a `transform_up()` in the current code: https://github.com/apache/arrow-datafusion/blob/10e6d1870280296b31c6faff29ddf3f8fe98bbc4/datafusion/core/src/physical_optimizer/coalesce_batches.rs#L55-L81 After the feature in my sidenote (bitmaps in each treenodes) has been addeed this transformation can be rewritten in a more effective form with `transform()`. ```rust plan.transform( &|plan| { if contains_pattern(&plan, TreeNodePattern::FilterExec, TreeNodePattern::HashJoinExec, TreeNodePattern::RepartitionExec) { Ok(Transformed::no(plan)) } else { Ok(Transformed::new(plan, false, TreeNodeRecursion::Skip)) // No transformation, just don't recurse into children as there is no interresting node in the subtree. } }, &|plan| { let plan_any = plan.as_any(); // The goal here is to detect operators that could produce small batches and only // wrap those ones with a CoalesceBatchesExec operator. An alternate approach here // would be to build the coalescing logic directly into the operators // See https://github.com/apache/arrow-datafusion/issues/139 let wrap_in_coalesce = plan_any.downcast_ref::<FilterExec>().is_some() || plan_any.downcast_ref::<HashJoinExec>().is_some() // Don't need to add CoalesceBatchesExec after a round robin RepartitionExec || plan_any .downcast_ref::<RepartitionExec>() .map(|repart_exec| { !matches!( repart_exec.partitioning().clone(), Partitioning::RoundRobinBatch(_) ) }) .unwrap_or(false); if wrap_in_coalesce { Ok(Transformed::yes(Arc::new(CoalesceBatchesExec::new( plan, target_batch_size, )))) } else { Ok(Transformed::no(plan)) } }) ``` See how the original closure became `f_up` and a new `f_down` closure is added. Obviously this could be rewritten with `rewrite()` / `TreeNodeRewriter` but 2 closure `transform()` is much shorter form of that. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
