alamb commented on PR #17805:
URL: https://github.com/apache/datafusion/pull/17805#issuecomment-3348109255

   > This was then amended in https://github.com/apache/datafusion/pull/16999 
to make it optional, at least via the SQL API; it is still mandatory on the 
DataFrame API:
   
   In my mind this is mostly for backwards-compatibility reasons -- 
https://github.com/apache/datafusion/pull/13511 basically broke a bunch of our 
existing user queries, so I wanted to revert the unnecessarily strict 
interpretation.
   
   > As I understand it, https://github.com/apache/datafusion/pull/13511 made 
WITHIN GROUP mandatory for ordered set aggregate functions, of which we support 
only two so far:
   
   Indeed -- and both of these functions have the property that their argument 
is often the same as the `WITHIN GROUP` ordering expression -- for example, 
computing `approx_median(x)` implicitly means 
`approx_median(x) WITHIN GROUP (ORDER BY x)`
   
   Though allowing different arguments means you can write expressions like 
`approx_median(first_name) WITHIN GROUP (ORDER BY salary)` and save yourself a 
subquery.
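
   To make the two cases concrete, here is a minimal Python sketch of the 
idea. It is not how DataFusion implements it: `median_by`, the `employees` 
rows, and the choice of the lower-median row are all assumptions made up for 
illustration. Omitting the ordering falls back to ordering by the argument 
itself, while passing a separate ordering picks the value from the row that 
sits in the middle of that ordering.

```python
from typing import Any, Callable, Optional, Sequence


def median_by(
    rows: Sequence[Any],
    value: Callable[[Any], Any],
    order_by: Optional[Callable[[Any], Any]] = None,
) -> Any:
    """Return value(row) for the lower-median row of `rows`.

    Hypothetical helper: rows are sorted by `order_by`; when `order_by` is
    omitted, they are sorted by `value` itself (the implicit ordering case).
    """
    if not rows:
        return None
    key = order_by if order_by is not None else value
    ordered = sorted(rows, key=key)
    middle = ordered[(len(ordered) - 1) // 2]
    return value(middle)


# Made-up sample data for illustration only.
employees = [
    {"first_name": "Ada", "salary": 120},
    {"first_name": "Bob", "salary": 90},
    {"first_name": "Cy", "salary": 150},
]

# No explicit ordering: the argument doubles as the sort key.
median_salary = median_by(employees, value=lambda r: r["salary"])

# Separate ordering: take first_name from the row with the median salary,
# which would otherwise need a subquery.
median_earner = median_by(
    employees,
    value=lambda r: r["first_name"],
    order_by=lambda r: r["salary"],
)
```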
   
   > A question I have is if we should loosen the DataFrame API to allow 
omitting the sort, as #16999 did for the SQL API?
   > 
   > cc @alamb
   
   I suggest we hold off unless someone explicitly asks about it, though I am 
not opposed to it either
   
   

