alamb opened a new issue, #16955:
URL: https://github.com/apache/datafusion/issues/16955

   ### Is your feature request related to a problem or challenge?
   
   We recently rolled out DataFusion 48.0.1 to production and had several 
customers report that queries that previously ran without issue
   
   
   
   We traced it back to this PR  from @Garamda  which updated DataFusion SQL to 
use the standard WITHIN GROUP syntax:
   -  https://github.com/apache/datafusion/pull/13511
   
   For example, given this data
   
   ```sql
   > create table test(c1 int, c2 float) as values (1, 100), (2, 200), (5, 
500), (4, 400), (3, 300), (2, 200);
   0 row(s) fetched.
   
   > select * from test;
   +----+-------+
   | c1 | c2    |
   +----+-------+
   | 1  | 100.0 |
   | 2  | 200.0 |
   | 5  | 500.0 |
   | 4  | 400.0 |
   | 3  | 300.0 |
   | 2  | 200.0 |
   +----+-------+
   6 row(s) fetched.
   ```
   
   ## DataFusion 47.0.0
   ```sql
   > SELECT approx_percentile_cont_with_weight(c1, c2, 0.95) FROM test;
   +-------------------------------------------------------------------+
   | approx_percentile_cont_with_weight(test.c1,test.c2,Float64(0.95)) |
   +-------------------------------------------------------------------+
   | 5.0                                                               |
   +-------------------------------------------------------------------+
   1 row(s) fetched.
   Elapsed 0.001 seconds.
   ```
   
   ## Same query errors DataFusion 48.0.0
   
   ```sql
   > SELECT approx_percentile_cont_with_weight(c1, c2, 0.95) FROM test;
   Error during planning: WITHIN GROUP clause is required when calling ordered 
set aggregate function(approx_percentile_cont_with_weight)
   ```
   
   Instead you must use a different syntax
   ```sql
   > SELECT approx_percentile_cont_with_weight(c2, 0.95) WITHIN GROUP (ORDER BY 
c1) FROM test;
   
+-------------------------------------------------------------------------------------------------+
   | approx_percentile_cont_with_weight(test.c2,Float64(0.95)) WITHIN GROUP 
[test.c1 ASC NULLS LAST] |
   
+-------------------------------------------------------------------------------------------------+
   | 5.0                                                                        
                     |
   
+-------------------------------------------------------------------------------------------------+
   1 row(s) fetched.
   Elapsed 0.003 seconds.
   ```
   
   
   The functions affected are:
   - approx_percentile_cont
     - 47.0.0 : `approx_percentile_cont(expression, percentile, centroids)`
     - 48.0.0 : `approx_percentile_cont(percentile, centroids) WITHIN GROUP 
(ORDER BY expression)`
   - approx_percentile_cont_with_weight
     - 47.0.0 : `approx_percentile_cont_with_weight(expression, weight, 
percentile)`
     - 48.0.0 : `approx_percentile_cont_with_weight(weight, percentile) WITHIN 
GROUP (ORDER BY expression)`
   
   
   # Other users have been affected too
   Looking at other issues linked to that PR, this change seems to have caused 
downstream breakage for others too:
   - Pydantic Logfire: https://github.com/pydantic/logfire/issues/1044
   - Clickbench extended queries: 
https://github.com/apache/datafusion/issues/15927
   
   
   
   ### Describe the solution you'd like
   
   I would like DataFusion to support both the new and old syntax for 
`approx_percentile_cont` and `approx_percentile_cont_with_weight`
   
   For example, given this table:
   ```shell
   > create table test(c1 int, c2 float) as values (1, 100), (2, 200), (5, 
500), (4, 400), (3, 300), (2, 200);
   
   
   For example, that means in addition to supporting the  new syntax from 
DataFusion 48.0.0:
   ```sql
   > SELECT approx_percentile_cont_with_weight(c1, c2, 0.95) FROM test;
   +-------------------------------------------------------------------+
   | approx_percentile_cont_with_weight(test.c1,test.c2,Float64(0.95)) |
   +-------------------------------------------------------------------+
   | 5.0                                                               |
   +-------------------------------------------------------------------+
   1 row(s) fetched.
   Elapsed 0.001 seconds.
   ```
   
   A user should also be able to write this (as they were able in DataFusion 47)
   ```sql
   > SELECT approx_percentile_cont_with_weight(c1, c2, 0.95) FROM test;
   +-------------------------------------------------------------------+
   | approx_percentile_cont_with_weight(test.c1,test.c2,Float64(0.95)) |
   +-------------------------------------------------------------------+
   | 5.0                                                               |
   +-------------------------------------------------------------------+
   1 row(s) fetched.
   ```
   
   And get the same answer
   
   
   ### Describe alternatives you've considered
   
   We could only make this change on our own fork
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to