Jefffrey commented on code in PR #18607:
URL: https://github.com/apache/datafusion/pull/18607#discussion_r2522785581


##########
docs/source/library-user-guide/upgrading.md:
##########
@@ -360,6 +360,27 @@ The accompanying `AggregateUDF::is_ordered_set_aggregate` 
has also been renamed
 No functionality has been changed with regards to this method; it still refers 
only to permitting use of `WITHIN GROUP`
 SQL syntax for the aggregate function.
 
+### Planner now requires explicit opt-in for WITHIN GROUP syntax

Review Comment:
   Need to move this to v52 section now



##########
docs/source/user-guide/sql/aggregate_functions.md:
##########
@@ -48,6 +48,28 @@ FROM employees;
 
 Note: When no rows pass the filter, `COUNT` returns `0` while 
`SUM`/`AVG`/`MIN`/`MAX` return `NULL`.
 
+## WITHIN GROUP / Ordered-set aggregates
+
+Some aggregate functions support the SQL `WITHIN GROUP (ORDER BY ...)` clause. 
This clause is used by
+ordered-set aggregate functions (for example, percentile and rank-like 
aggregations) to specify the ordering
+of inputs that the aggregate relies on. In DataFusion, only aggregate 
functions that explicitly opt into
+ordered-set semantics via their implementation will accept `WITHIN GROUP`; 
attempting to use `WITHIN GROUP`
+with a regular aggregate (for example `SUM(x) WITHIN GROUP (ORDER BY x)`) will 
fail during planning with an
+error. This matches Postgres semantics and ensures ordered-set behavior is 
opt-in for user-defined aggregates.
+
+Example (ordered-set aggregate):
+
+```sql
+percentile_cont(0.5) WITHIN GROUP (ORDER BY value)
+```
+
+Example (invalid usage — planner will error):
+
+```sql
+-- This will fail: SUM is not an ordered-set aggregate
+SELECT SUM(x) WITHIN GROUP (ORDER BY x) FROM t;
+```

Review Comment:
   The problem here is it is not an explicit list, and refers to rank as an 
ordered-set aggregation when it isn't in DataFusion. My concern here is that 
it'll inform the users that ordered-set aggregate functions _exist_, but isn't 
exactly clear which ones we have.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to