ClSlaid commented on code in PR #10460: URL: https://github.com/apache/datafusion/pull/10460#discussion_r1597542555
########## datafusion/optimizer/src/replace_distinct_aggregate.rs: ########## @@ -88,60 +94,72 @@ impl OptimizerRule for ReplaceDistinctWithAggregate { input, schema, })) => { + let expr_cnt = on_expr.len(); + // Construct the aggregation expression to be used to fetch the selected expressions. - let aggr_expr = select_expr - .iter() - .map(|e| { - Expr::AggregateFunction(AggregateFunction::new( - AggregateFunctionFunc::FirstValue, - vec![e.clone()], - false, - None, - sort_expr.clone(), - None, - )) - }) - .collect::<Vec<Expr>>(); + let aggr_expr = vec![Expr::AggregateFunction(AggregateFunction::new( + AggregateFunctionFunc::FirstValue, + select_expr, + false, + None, + sort_expr.clone(), + None, + ))]; + + let aggr_expr = normalize_cols(aggr_expr, input.as_ref())?; + let group_expr = normalize_cols(on_expr, input.as_ref())?; // Build the aggregation plan - let plan = LogicalPlanBuilder::from(input.as_ref().clone()) - .aggregate(on_expr.clone(), aggr_expr.to_vec())? - .build()?; + let plan = LogicalPlan::Aggregate(Aggregate::try_new( + input, group_expr, aggr_expr, + )?); + let lpb = LogicalPlanBuilder::from(plan); Review Comment: The context is that, I extract this logic totally from `LogicalPlanBuilder::aggregate`. It might introduce a problem that if `LogicalPlanBuilder::aggregate` changes in the future, the code here stays still and frustrates people to find and rewrite. The reason why I extract the logic, is that the `LogicalPlanBuilder` accepts a `LogicalPlan`. So that when you'd like to build an aggregate plan with `LogicalPlanBuilder`, it will go through: ```rust // the builder accepts LogicalPlan only, so a clone needed let mut builder = LogicalPlanBuilder::from(Arc<LogicalPlan>.as_ref().clone()); // then you get: builder == LogicalPlanBuilder { LogicalPlan }; // making an aggregate plan builder.aggregate(..); // this will call Arc::new() on the LogicalPlan contained. // another clone here LogicalPlanBuilder { AggrPlan { Arc<LogicalPlan>, ... } } ``` Since many of our `LogicalPlan`s could contain an `Arc<LogicalPlan>`, and for such occasion, I wonder if `Into<Arc<LogicalPlan>>` could save clones, while still allows us to use `LogicalPlanBuilder`, instead of extracting logic. ```rust struct LogicalPlanBuilder<LP> { plan: LP, } impl<LP: Into<Arc<LogicalPlan>>> From<LP> for LogicalPlanBuilder<LP> { // ... } impl LogicalPlan<LP: Into<Arc<LogicalPlan>>> { fn aggregate(self, group_exprs, aggr_exprs) -> Self { // ... AggrPlan::try_new( // ... self.plan.into(), // input: Arc<LogicalPlan> // ... ); // ... } } ``` Because `Arc<T>` has `From<T>` implemented, so when we need `Arc::new`, just call `into` on `self.plan`, this should save times of clones. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org