alamb commented on PR #17805: URL: https://github.com/apache/datafusion/pull/17805#issuecomment-3348109255
> This was then amended in https://github.com/apache/datafusion/pull/16999 to make it optional, at least via the SQL API; it is still mandatory on the DataFrame API:

In my mind this is mostly for backwards compatibility reasons -- https://github.com/apache/datafusion/pull/13511 basically broke a bunch of our existing user queries, so I wanted to revert the unnecessarily strict interpretation

> As I understand it, https://github.com/apache/datafusion/pull/13511 made WITHIN GROUP mandatory for ordered set aggregate functions, of which we support only two so far:

Indeed -- and both of these functions have the property that many times their argument will be the same as the `ORDER BY WITHIN GROUP` -- for example, computing `approx_median(x)` implicitly means `approx_median(x ORDER BY x WITHIN GROUP)`

Though allowing different arguments means you can write expressions like `approx_median(first_name ORDER BY salary WITHIN GROUP)` and save yourself a subquery

> A question I have is if we should loosen the DataFrame API to allow omitting the sort, as #16999 did for the SQL API?
>
> cc @alamb

I suggest we hold off unless someone explicitly asks about it, though I am not opposed to it either
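To make the distinction between the implicit and explicit ordering concrete, here is a minimal sketch in plain Python. It is not DataFusion code and does not reflect how DataFusion evaluates these aggregates: `median_by` and the `employees` rows are hypothetical names invented for illustration, and picking the lower-middle row of an exact sort is an assumption standing in for the approximate algorithm.

```python
from typing import Any, Callable, Optional, Sequence


def median_by(
    rows: Sequence[Any],
    value: Callable[[Any], Any],
    order_by: Optional[Callable[[Any], Any]] = None,
) -> Any:
    """Return value(row) for the row in the lower-middle position of the ordering.

    Hypothetical helper: when order_by is omitted, the value expression doubles
    as the sort key, mirroring the "argument is the same as the ordering" case.
    """
    if not rows:
        return None
    key = order_by if order_by is not None else value
    ordered = sorted(rows, key=key)
    # Assumption: take the lower-middle row rather than interpolating.
    return value(ordered[(len(ordered) - 1) // 2])


# Hypothetical sample data.
employees = [
    {"first_name": "Ana", "salary": 50},
    {"first_name": "Bo", "salary": 70},
    {"first_name": "Cy", "salary": 90},
]

# Ordering omitted: the argument itself is the sort key.
implicit = median_by(employees, value=lambda r: r["salary"])

# Ordering given separately: sort by salary, but return first_name.
explicit = median_by(
    employees,
    value=lambda r: r["first_name"],
    order_by=lambda r: r["salary"],
)
```

In the second call the sort key and the returned expression differ, which is the case that would otherwise need a subquery to find the median salary first and then look up the matching name.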
