Re: [PR] docs: refine `AggregateUDFImpl::is_ordered_set_aggregate` documentation [datafusion]

via GitHub Fri, 17 Oct 2025 01:19:30 -0700


Jefffrey commented on PR #17805:
URL: https://github.com/apache/datafusion/pull/17805#issuecomment-3413977907


   What I propose:
   
   - We should keep `WITHIN GROUP` optional, since we had users using 
`approx_percentile_cont` without it (so follow DuckDB here). For other ordered 
set aggregate functions we follow suit (already did so for `percentile_cont` 
here #17988)
     - If `WITHIN GROUP` not specified then it is implementation dependent what 
the default order is, though we should say it is ascending (we can't control 
this yet via code, only comments)
   - Make `WITHIN GROUP` strictly apply only to aggregate functions which 
return `true` for `AggregateUDFImpl::is_ordered_set_aggregate()` -> #18109
   - `AggregateUDFImpl::is_ordered_set_aggregate()` itself doesn't apply any 
guarantees to input order, like postgres; we won't insert a sort or anything, 
it'll be up to the aggregate function itself to take the "hint" from `WITHIN 
GROUP` (not really a hint but you get the point) and do its ordering internally 
accordingly
   - Refactor DataFrame APIs a bit to make it easier to use, for example:
   
   ```rust
   // Currently
   pub fn approx_percentile_cont(
       order_by: Sort,
       percentile: Expr,
       centroids: Option<Expr>,
   ) -> Expr {
       todo!()
   }
   
   // Proposed: much more explicit instead of needing to extract `expression` 
from `Sort`
   pub fn approx_percentile_cont(
       expression: Expr,
       percentile: Expr,
       ascending: bool,
       centroids: Option<Expr>,
   ) -> Expr {
       todo!()
   }
   ```
   
   Thoughts @alamb ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] docs: refine `AggregateUDFImpl::is_ordered_set_aggregate` documentation [datafusion]

Reply via email to