alamb opened a new issue, #15927: URL: https://github.com/apache/datafusion/issues/15927
### Describe the bug https://github.com/apache/datafusion/tree/main/benchmarks/queries/clickbench#extended-queries > The "extended" queries are not part of the official ClickBench benchmark. Instead they are used to test other DataFusion features that are not covered by the standard benchmark. Recently I tried to run Q5 for benchmarking and I got an error: > Error during planning: WITHIN GROUP clause is required when calling ordered set aggregate function(approx_percentile_cont) ### To Reproduce Run this query in datafusion-cli ```sql SELECT "ClientIP", "WatchID", COUNT(*) c, MIN("ResponseStartTiming") tmin, APPROX_PERCENTILE_CONT("ResponseStartTiming", 0.95) tp95, MAX("ResponseStartTiming") tmax FROM 'hits.parquet' WHERE "JavaEnable" = 0 -- filters to 32M of 100M rows GROUP BY "ClientIP", "WatchID" HAVING c > 1 ORDER BY tp95 DESC LIMIT 10; ``` For example: ```sql (venv) andrewlamb@Andrews-MacBook-Pro-2:~/Downloads$ ~/Software/datafusion/target/debug/datafusion-cli DataFusion CLI v47.0.0 > SELECT "ClientIP", "WatchID", COUNT(*) c, MIN("ResponseStartTiming") tmin, APPROX_PERCENTILE_CONT("ResponseStartTiming", 0.95) tp95, MAX("ResponseStartTiming") tmax FROM 'hits.parquet' WHERE "JavaEnable" = 0 -- filters to 32M of 100M rows GROUP BY "ClientIP", "WatchID" HAVING c > 1 ORDER BY tp95 DESC LIMIT 10; Error during planning: WITHIN GROUP clause is required when calling ordered set aggregate function(approx_percentile_cont) ``` ### Expected behavior DF 47 runs the query ``` (venv) andrewlamb@Andrews-MacBook-Pro-2:~/Downloads$ ~/Software/datafusion-cli/datafusion-cli-47.0.0 DataFusion CLI v47.0.0 > SELECT "ClientIP", "WatchID", COUNT(*) c, MIN("ResponseStartTiming") tmin, APPROX_PERCENTILE_CONT("ResponseStartTiming", 0.95) tp95, MAX("ResponseStartTiming") tmax FROM 'hits.parquet' WHERE "JavaEnable" = 0 -- filters to 32M of 100M rows GROUP BY "ClientIP", "WatchID" HAVING c > 1 ORDER BY tp95 DESC LIMIT 10; +-------------+---------------------+---+------+------+------+ | ClientIP | WatchID | c | tmin | tp95 | tmax | +-------------+---------------------+---+------+------+------+ | 1611957945 | 6655575552203051303 | 2 | 0 | 0 | 0 | | -1402644643 | 8566928176839891583 | 2 | 0 | 0 | 0 | +-------------+---------------------+---+------+------+------+ 2 row(s) fetched. Elapsed 5.360 seconds. ``` ### Additional context However, it seems like postgres also rejects such queries without WIHINGROUP: See this dbfiddle: https://www.db-fiddle.com/f/5dwiFr16TvBF8zF6f2TmSz/0  So I believe the right fix is to update the benchmark query -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org