peter-toth commented on PR #8891:
URL: 
https://github.com/apache/arrow-datafusion/pull/8891#issuecomment-1933706262

   Let me show you a concrete example where `transform()` can be useful in the 
future. Here is a `transform_up()` in the current code:
   
https://github.com/apache/arrow-datafusion/blob/10e6d1870280296b31c6faff29ddf3f8fe98bbc4/datafusion/core/src/physical_optimizer/coalesce_batches.rs#L55-L81
   
   After the feature in my sidenote (bitmaps in each treenodes) has been addeed 
this transformation can be rewritten in a more  effective form with 
`transform()`.
   ```rust
    plan.transform(
       &|plan| {
           if contains_pattern(&plan, TreeNodePattern::FilterExec, 
TreeNodePattern::HashJoinExec, TreeNodePattern::RepartitionExec) {
               Ok(Transformed::no(plan))
           } else {
               Ok(Transformed::new(plan, false, TreeNodeRecursion::Skip)) // No 
transformation, just don't recurse into children as there is no interresting 
node in the subtree.
           }
       }, 
       &|plan| { 
        let plan_any = plan.as_any(); 
        // The goal here is to detect operators that could produce small 
batches and only 
        // wrap those ones with a CoalesceBatchesExec operator. An alternate 
approach here 
        // would be to build the coalescing logic directly into the operators 
        // See https://github.com/apache/arrow-datafusion/issues/139 
        let wrap_in_coalesce = plan_any.downcast_ref::<FilterExec>().is_some() 
            || plan_any.downcast_ref::<HashJoinExec>().is_some() 
            // Don't need to add CoalesceBatchesExec after a round robin 
RepartitionExec 
            || plan_any 
                .downcast_ref::<RepartitionExec>() 
                .map(|repart_exec| { 
                    !matches!( 
                        repart_exec.partitioning().clone(), 
                        Partitioning::RoundRobinBatch(_) 
                    ) 
                }) 
                .unwrap_or(false); 
        if wrap_in_coalesce { 
            Ok(Transformed::yes(Arc::new(CoalesceBatchesExec::new( 
                plan, 
                target_batch_size, 
            )))) 
        } else { 
            Ok(Transformed::no(plan)) 
        } 
    }) 
   ```
   See how the original closure became `f_up` and a new `f_down` closure is 
added. Obviously this could be rewritten with `rewrite()` / `TreeNodeRewriter` 
but 2 closure `transform()` is much shorter form of that.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to