Jefffrey commented on code in PR #18607: URL: https://github.com/apache/datafusion/pull/18607#discussion_r2522785581
########## docs/source/library-user-guide/upgrading.md: ########## @@ -360,6 +360,27 @@ The accompanying `AggregateUDF::is_ordered_set_aggregate` has also been renamed No functionality has been changed with regards to this method; it still refers only to permitting use of `WITHIN GROUP` SQL syntax for the aggregate function. +### Planner now requires explicit opt-in for WITHIN GROUP syntax Review Comment: Need to move this to v52 section now ########## docs/source/user-guide/sql/aggregate_functions.md: ########## @@ -48,6 +48,28 @@ FROM employees; Note: When no rows pass the filter, `COUNT` returns `0` while `SUM`/`AVG`/`MIN`/`MAX` return `NULL`. +## WITHIN GROUP / Ordered-set aggregates + +Some aggregate functions support the SQL `WITHIN GROUP (ORDER BY ...)` clause. This clause is used by +ordered-set aggregate functions (for example, percentile and rank-like aggregations) to specify the ordering +of inputs that the aggregate relies on. In DataFusion, only aggregate functions that explicitly opt into +ordered-set semantics via their implementation will accept `WITHIN GROUP`; attempting to use `WITHIN GROUP` +with a regular aggregate (for example `SUM(x) WITHIN GROUP (ORDER BY x)`) will fail during planning with an +error. This matches Postgres semantics and ensures ordered-set behavior is opt-in for user-defined aggregates. + +Example (ordered-set aggregate): + +```sql +percentile_cont(0.5) WITHIN GROUP (ORDER BY value) +``` + +Example (invalid usage — planner will error): + +```sql +-- This will fail: SUM is not an ordered-set aggregate +SELECT SUM(x) WITHIN GROUP (ORDER BY x) FROM t; +``` Review Comment: The problem here is it is not an explicit list, and refers to rank as an ordered-set aggregation when it isn't in DataFusion. My concern here is that it'll inform the users that ordered-set aggregate functions _exist_, but isn't exactly clear which ones we have. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
