jayzhan211 commented on code in PR #11371:
URL: https://github.com/apache/datafusion/pull/11371#discussion_r1677243837
##########
datafusion/expr/src/planner.rs:
##########
@@ -161,6 +162,28 @@ pub trait ExprPlanner: Send + Sync {
) -> Result<PlannerResult<Vec<Expr>>> {
Ok(PlannerResult::Original(args))
}
+
+ /// Plans a `RawAggregateUDF` based on the given input expressions.
+ ///
+ /// Returns a `PlannerResult` containing either the planned aggregate
function or the original
+ /// input expressions if planning is not possible.
+ fn plan_aggregate_udf(
+ &self,
+ aggregate_function: RawAggregateUDF,
+ ) -> Result<PlannerResult<RawAggregateUDF>> {
+ Ok(PlannerResult::Original(aggregate_function))
Review Comment:
> I guess in my mind the distinction is
the planner is a mechanical transformation to something that could be run
the optimizer rewrites expressions / plans to keep the exact same semantics
(output results) but faster.
I agree with this. Rewrite in logical optimizer results to the better one.
> So in my mind the transformation of count(1) --> count() is an
optimization as it potentially avoids having to accumulate a known column in
the execution. Maybe there is a better way to think about it
In the case of count, `count(*)`, `count(1)`, and `count()` are equivalent
things, not one is faster than the other.
The optimization that counts the row number is not in the logical optimizer
but in the physical optimizer. Therefore, the transformation from `count()` and
`count(*)` to `count(1)` is not close to the second point IMO. What we are
doing before the physical optimizer is to transform the equivalent expr to a
single one (`count(1)`), so we just need to process that one in the physical
optimizer. Note that we could process those 3 expressions individually in the
physical optimizer, but standardizing to a single expression as early as
possible might reduce the complexity of the downstream query processing. *It is
more like rewriting expressions to the equivalent expression* to me.
> This might be a good distinction to clarify in the documentation 🤔
It is nice to do so, we should clarify the role between these two.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]