Re: [PR] feat: add raw aggregate udf planner [datafusion]

via GitHub Sun, 14 Jul 2024 08:25:08 -0700


alamb commented on code in PR #11371:
URL: https://github.com/apache/datafusion/pull/11371#discussion_r1677155973



##########
datafusion/expr/src/planner.rs:
##########
@@ -161,6 +162,28 @@ pub trait ExprPlanner: Send + Sync {
     ) -> Result<PlannerResult<Vec<Expr>>> {
         Ok(PlannerResult::Original(args))
     }
+
+    /// Plans a `RawAggregateUDF` based on the given input expressions.
+    ///
+    /// Returns a `PlannerResult` containing either the planned aggregate 
function or the original
+    /// input expressions if planning is not possible.
+    fn plan_aggregate_udf(
+        &self,
+        aggregate_function: RawAggregateUDF,
+    ) -> Result<PlannerResult<RawAggregateUDF>> {
+        Ok(PlannerResult::Original(aggregate_function))

Review Comment:
   >
   > I think we need a clear role between expression rewrite in planner vs 
rewrite in optimizer. ExprPlanner is the first spot we get the expressions from 
SQLExpr and build up datafusion::Expr. For syntax rewrite like dict `{a: b}`, 
compound id `a.b` or equivalent args rewrite `count(*)` are all good candidate 
to determine the expected datafusion::Expr in ExprPlanner. Others should push 
down to optimizer since they may need const evaluation or multiple passes.
   > 
   > If the reason to rewrite expression in optimizer is because that benefit 
both sql and dataframe API, maybe redesign dataframe API is what we need 🤔 ?
   
   I guess in my mind the distinction is 
   1.  the planner is a mechanical transformation to something that could be 
run 
   2. the optimizer rewrites expressions / plans to keep the exact same 
semantics (output results) but faster.
   
   So in my mind the transformation of `count(1)` --> `count()` is an 
optimization as it potentially avoids having to accumulate a known column in 
the execution. Maybe there is a better way to think about it
   
   This might be a good distinction to clarify in the documentation 🤔 
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] feat: add raw aggregate udf planner [datafusion]

Reply via email to