Jefffrey commented on PR #17805:
URL: https://github.com/apache/datafusion/pull/17805#issuecomment-3413977907
What I propose:
- We should keep `WITHIN GROUP` optional, since we had users using
`approx_percentile_cont` without it (so follow DuckDB here). For other ordered
set aggregate functions we follow suit (already did so for `percentile_cont`
here #17988)
- If `WITHIN GROUP` not specified then it is implementation dependent what
the default order is, though we should say it is ascending (we can't control
this yet via code, only comments)
- Make `WITHIN GROUP` strictly apply only to aggregate functions which
return `true` for `AggregateUDFImpl::is_ordered_set_aggregate()` -> #18109
- `AggregateUDFImpl::is_ordered_set_aggregate()` itself doesn't apply any
guarantees to input order, like postgres; we won't insert a sort or anything,
it'll be up to the aggregate function itself to take the "hint" from `WITHIN
GROUP` (not really a hint but you get the point) and do its ordering internally
accordingly
- Refactor DataFrame APIs a bit to make it easier to use, for example:
```rust
// Currently
pub fn approx_percentile_cont(
order_by: Sort,
percentile: Expr,
centroids: Option<Expr>,
) -> Expr {
todo!()
}
// Proposed: much more explicit instead of needing to extract `expression`
from `Sort`
pub fn approx_percentile_cont(
expression: Expr,
percentile: Expr,
ascending: bool,
centroids: Option<Expr>,
) -> Expr {
todo!()
}
```
Thoughts @alamb ?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]