jayzhan211 commented on code in PR #11371:
URL: https://github.com/apache/datafusion/pull/11371#discussion_r1677243837


##########
datafusion/expr/src/planner.rs:
##########
@@ -161,6 +162,28 @@ pub trait ExprPlanner: Send + Sync {
     ) -> Result<PlannerResult<Vec<Expr>>> {
         Ok(PlannerResult::Original(args))
     }
+
+    /// Plans a `RawAggregateUDF` based on the given input expressions.
+    ///
+    /// Returns a `PlannerResult` containing either the planned aggregate 
function or the original
+    /// input expressions if planning is not possible.
+    fn plan_aggregate_udf(
+        &self,
+        aggregate_function: RawAggregateUDF,
+    ) -> Result<PlannerResult<RawAggregateUDF>> {
+        Ok(PlannerResult::Original(aggregate_function))

Review Comment:
   > I guess in my mind the distinction is
   the planner is a mechanical transformation to something that could be run
   the optimizer rewrites expressions / plans to keep the exact same semantics 
(output results) but faster.
   
   I agree with this. Rewrite in logical optimizer results to the better one.
   
   > So in my mind the transformation of count(1) --> count() is an 
optimization as it potentially avoids having to accumulate a known column in 
the execution. Maybe there is a better way to think about it
   
   In the case of count, `count(*)`, `count(1)`, and `count()` are equivalent 
things, not one is faster than the other.
   
   The optimization that counts the row number is not in the logical optimizer 
but in the physical optimizer. Therefore, the transformation from `count()` and 
`count(*)` to `count(1)` is not close to the second point IMO. What we are 
doing before the physical optimizer is to transform the equivalent expr to a 
single one (`count(1)`), so we just need to process that one in the physical 
optimizer. Note that we could process those 3 expressions individually in the 
physical optimizer, but standardizing to a single expression as early as 
possible might reduce the complexity of the downstream query processing. *It is 
more like rewriting expressions to the equivalent expression* to me.
   
   > This might be a good distinction to clarify in the documentation 🤔
   
   It is nice to do so, we should clarify the role between these two.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to