gruuya commented on code in PR #7981:
URL: https://github.com/apache/arrow-datafusion/pull/7981#discussion_r1379324782
##########
datafusion/optimizer/src/common_subexpr_eliminate.rs:
##########
@@ -238,8 +238,17 @@ impl CommonSubexprEliminate {
let rewritten = pop_expr(&mut rewritten)?;
if affected_id.is_empty() {
+ // Alias aggregation epxressions if they have changed
+ // TODO: This should have really been identified above and handled
in the `else` branch
+ let aggr_exprs = new_aggr_expr
+ .iter()
+ .zip(aggr_expr.iter())
+ .map(|(new_expr, old_expr)| {
+ new_expr.clone().alias_if_changed(old_expr.display_name()?)
+ })
+ .collect::<Result<Vec<Expr>>>()?;
// Since group_epxr changes, schema changes also. Use try_new
method.
- Aggregate::try_new(Arc::new(new_input), new_group_expr,
new_aggr_expr)
+ Aggregate::try_new(Arc::new(new_input), new_group_expr, aggr_exprs)
.map(LogicalPlan::Aggregate)
Review Comment:
Ok, I think I understand what is going on:
- in general the root aggregation expressions need not be rewritten/aliased
above since they can only appear once (i.e. they're not common), so this is ok
- however, some of their child expressions may get rewritten; an example is
the query `select distinct on (c1 + 1) c2 from test order by c1 + 1, c3`, where
the initial aggregation of the sole projected field gets rewritten from
`FIRST_VALUE(test.column2) ORDER BY [test.column1 + Int64(1) ASC NULLS LAST,
test.column3 ASC NULLS LAST]` to `FIRST_VALUE(test.column2) ORDER BY
[test.column1 + Int64(1)Int64(1)test.column1 AS test.column1 + Int64(1) ASC
NULLS LAST, test.column3 ASC NULLS LAST]` (i.e. `c1 + 1` is a common expression
of it and the `ON` expression so it gets rewritten)
- consequently the new schema will be different from the original one
I think this explains the behavior I've encountered here, and that this
behavior is not anomalous. The only question is whether there is a better way
to align the new aggregation schema with the old one besides aliasing as above.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]