mingmwang commented on issue #2175:
URL:
https://github.com/apache/arrow-datafusion/issues/2175#issuecomment-1262648962
> > I wonder if we could extend PhysicalExpr to require PartialEq
>
> PartialEq is not object-safe, it takes Self, so this isn't possible (I
don't think)
>
> > But if the expressions are Trait, because the type is unknown, how to
write code to compare the expressions?
>
> `Expr` is a good example of where Rust's support for structured matching
is fantastic, and I would definitely not propose switching `BinaryExpr` to be a
trait object 👍
>
> Is `ExecutionPlan` a similar use-case though? Most places it gets used
currently don't care what variant is used, and I don't see any that are
performing nested matching? This could be because it would currently be
obnoxious, but is it possible that the fact they are represented differently is
actually a strength, and represents the differing requirements of the different
representations?
>
> > The expressions are also tree like structures and we have to downcast
and destructuring and to do the comparing recursively.
>
> This is actually one of the weaknesses of the enum approach, type erasure
has to be implemented via a visitor which makes walking the tree significantly
more complicated. If you want to match a specific expression, it works well, if
you want to do anything more generic it becomes incredibly painful.
>
> You are therefore left with a tradeoff based on the common use-case, for
BinaryExpr the enumeration is clearly superior, LogicalPlan I'd argue its a bit
of a toss up, I have found the lack of type-erasure rather obnoxious at times,
for ExecutionPlan, eh...
I'm not sure why visiting a logical plan is a toss up. Here is a sample of
how to make visit ExecutionPlan more generic and we
can get ride of the visit pattern and tree traversing from the rule
implementations
````
plan_utils.rs
pub fn map_children<F>(
plan: Arc<dyn ExecutionPlan>,
transform: F,
) -> Result<Arc<dyn ExecutionPlan>>
where
F: Fn(Arc<dyn ExecutionPlan>) -> Result<Arc<dyn ExecutionPlan>>,
{
if !plan.children().is_empty() {
let new_children: Result<Vec<_>> =
plan.children().into_iter().map(transform).collect();
with_new_children_if_necessary(plan, new_children?)
} else {
Ok(plan)
}
}
pub fn transform<F>(
plan: Arc<dyn ExecutionPlan>,
op: &F,
) -> Result<Arc<dyn ExecutionPlan>>
where
F: Fn(Arc<dyn ExecutionPlan>) -> Option<Arc<dyn ExecutionPlan>>,
{
transform_down(plan, op)
}
pub fn transform_down<F>(
plan: Arc<dyn ExecutionPlan>,
op: &F,
) -> Result<Arc<dyn ExecutionPlan>>
where
F: Fn(Arc<dyn ExecutionPlan>) -> Option<Arc<dyn ExecutionPlan>>,
{
let plan_cloned = plan.clone();
let after_op = match op(plan_cloned) {
Some(value) => value,
None => plan,
};
map_children(after_op.clone(), |plan: Arc<dyn ExecutionPlan>| {
transform_down(plan, op)
})
}
pub fn transform_up<F>(
plan: Arc<dyn ExecutionPlan>,
op: &F,
) -> Result<Arc<dyn ExecutionPlan>>
where
F: Fn(Arc<dyn ExecutionPlan>) -> Option<Arc<dyn ExecutionPlan>>,
{
let plan_cloned = plan.clone();
let after_op_children = map_children(plan_cloned, |plan: Arc<dyn
ExecutionPlan>| {
transform_down(plan, op)
});
let new_plan = match op(after_op_children?) {
Some(value) => value,
None => plan,
};
Ok(new_plan)
}
--------------------------------------------------
impl PhysicalOptimizerRule for CoalesceBatches2 {
fn optimize(
&self,
plan: Arc<dyn crate::physical_plan::ExecutionPlan>,
_config: &crate::execution::context::SessionConfig,
) -> Result<Arc<dyn crate::physical_plan::ExecutionPlan>> {
transform(plan, &apply_patterns(self.target_batch_size))
}
fn name(&self) -> &str {
"coalesce_batches2"
}
}
fn apply_patterns(
target_batch_size: usize,
) -> impl Fn(
Arc<dyn crate::physical_plan::ExecutionPlan>,
) -> Option<Arc<dyn crate::physical_plan::ExecutionPlan>> {
move |plan| {
let plan_any = plan.as_any();
let wrap_in_coalesce =
plan_any.downcast_ref::<FilterExec>().is_some()
|| plan_any.downcast_ref::<HashJoinExec>().is_some()
|| plan_any.downcast_ref::<RepartitionExec>().is_some();
if wrap_in_coalesce {
Some(Arc::new(CoalesceBatchesExec::new(
plan.clone(),
target_batch_size,
)))
} else {
None
}
}
}
````
The same utils can be implemented for Enums/LogicalPlans I think.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]