Re: [PR] feat: add raw aggregate udf planner [datafusion]

via GitHub Fri, 12 Jul 2024 18:00:47 -0700


jayzhan211 commented on code in PR #11371:
URL: https://github.com/apache/datafusion/pull/11371#discussion_r1676601717



##########
datafusion/expr/src/planner.rs:
##########
@@ -161,6 +162,28 @@ pub trait ExprPlanner: Send + Sync {
     ) -> Result<PlannerResult<Vec<Expr>>> {
         Ok(PlannerResult::Original(args))
     }
+
+    /// Plans a `RawAggregateUDF` based on the given input expressions.
+    ///
+    /// Returns a `PlannerResult` containing either the planned aggregate 
function or the original
+    /// input expressions if planning is not possible.
+    fn plan_aggregate_udf(
+        &self,
+        aggregate_function: RawAggregateUDF,
+    ) -> Result<PlannerResult<RawAggregateUDF>> {
+        Ok(PlannerResult::Original(aggregate_function))

Review Comment:
   The rewrite could be simple enough that we don't need to push down to 
optimizer.
   
   ```rust
   pub fn count_star() -> Expr {
       
Expr::AggregateFunction(datafusion_expr::expr::AggregateFunction::new_udf(
           count_udaf(),
           vec![Expr::Literal(COUNT_STAR_EXPANSION)],
           false,
           None,
           None,
           None,
       ))
   }
   
   pub fn count(expr: Expr) -> Expr {
       // For backward compatility, could use count_star instead
       if let Expr::Wildcard { qualifier: _ } = expr {
           count_star()
       } else {
           
Expr::AggregateFunction(datafusion_expr::expr::AggregateFunction::new_udf(
               count_udaf(),
               vec![expr],
               false,
               None,
               None,
               None,
           ))
       }
   }
   ```
   
   I think we need a clear role between expression rewrite in planner vs 
rewrite in optimizer. 
   ExprPlanner is the first spot we get the expressions from SQLExpr and build 
up datafusion::Expr. For syntax rewrite like dict `{a: b}`, compound id `a.b` 
or equivalent args rewrite `count(*)` are all good candidate to determine the 
expected datafusion::Expr in ExprPlanner. Others should push down to optimizer
   
   If the reason to rewrite expression in optimizer is because that benefit 
both sql and dataframe API, maybe redesign dataframe API is what we need 🤔 ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] feat: add raw aggregate udf planner [datafusion]

Reply via email to