alamb opened a new issue, #8984: URL: https://github.com/apache/arrow-datafusion/issues/8984
### Is your feature request related to a problem or challenge?

Some built-in aggregates (such as `FIRST_VALUE`, `LAST_VALUE` and `ARRAY_AGG`) support an optional `ORDER BY` argument that defines the order in which they see their input. For example:

```sql
❯ create table foo(x int, y int) as values (1, 100),(2, 100),(0, 200);
0 rows in set. Query took 0.003 seconds.

-- note the `ORDER BY x` in the argument to `FIRST_VALUE`
❯ select FIRST_VALUE(x ORDER BY x) from foo GROUP BY y;
+--------------------+
| FIRST_VALUE(foo.x) |
+--------------------+
| 1                  |
| 0                  |
+--------------------+
2 rows in set. Query took 0.008 seconds.
```

This is not supported today in user-defined aggregates.

### Describe the solution you'd like

I would like to be able to create a user-defined aggregate that can specify its input order. This would roughly require:

1. Extending the [`AggregateUDFImpl` trait](https://github.com/apache/arrow-datafusion/blob/edec4189242ab07ac65967490537d77e776aad5c/datafusion/expr/src/udaf.rs#L242) to communicate the ordering somehow.
2. Updating the implementation of https://docs.rs/datafusion/latest/datafusion/physical_plan/trait.AggregateExpr.html#method.order_bys
3. Writing an end-to-end test in https://github.com/apache/arrow-datafusion/blob/main/datafusion/core/tests/user_defined/user_defined_aggregates.rs showing it all working.

Here are some other places that likely need to be changed:

https://github.com/apache/arrow-datafusion/blob/b5db7187763bc4511aaffdd6d89b2f0908f17938/datafusion/core/src/physical_planner.rs#L242-L252

https://github.com/apache/arrow-datafusion/blob/b5db7187763bc4511aaffdd6d89b2f0908f17938/datafusion/core/src/physical_planner.rs#L1663-L1690

Looking at how `OrderSensitiveArrayAgg` is implemented may help:

https://github.com/apache/arrow-datafusion/blob/5d70c32a9a4accf21e9f27ff5ed62666cbbcbe54/datafusion/physical-expr/src/aggregate/array_agg_ordered.rs#L45

### Describe alternatives you've considered

_No response_

### Additional context

_No response_
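
To make the requested semantics concrete, here is a minimal sketch in plain Rust with no DataFusion dependency. It models an aggregate that declares the input order it needs and a driver that honors that order per group. The names `OrderedAggregate`, `SortOrder`, `input_order`, and `aggregate_by_group` are hypothetical and invented for illustration; they are not part of `AggregateUDFImpl` or any existing DataFusion API, and a real design might look quite different.

```rust
use std::collections::BTreeMap;

/// Hypothetical ordering requirement an aggregate can declare.
#[derive(Clone, Copy, Debug, PartialEq)]
enum SortOrder {
    Ascending,
    Descending,
}

/// Hypothetical stand-in for a user-defined aggregate.
/// This is NOT DataFusion's `AggregateUDFImpl` trait.
trait OrderedAggregate {
    /// Order the aggregate wants for each group's input; `None` means any order.
    fn input_order(&self) -> Option<SortOrder>;
    /// Fold one group's values (already in the requested order) into a result.
    fn evaluate(&self, values: &[i64]) -> Option<i64>;
}

/// Toy `FIRST_VALUE(x ORDER BY x)`.
struct FirstValue;

impl OrderedAggregate for FirstValue {
    fn input_order(&self) -> Option<SortOrder> {
        Some(SortOrder::Ascending)
    }

    fn evaluate(&self, values: &[i64]) -> Option<i64> {
        values.first().copied()
    }
}

/// Groups `(x, y)` rows by `y`, sorts each group's `x` values as the
/// aggregate requests, then evaluates the aggregate once per group.
fn aggregate_by_group(
    agg: &dyn OrderedAggregate,
    rows: &[(i64, i64)],
) -> Vec<(i64, Option<i64>)> {
    let mut groups: BTreeMap<i64, Vec<i64>> = BTreeMap::new();
    for &(x, y) in rows {
        groups.entry(y).or_default().push(x);
    }
    groups
        .into_iter()
        .map(|(key, mut values)| {
            match agg.input_order() {
                Some(SortOrder::Ascending) => values.sort(),
                Some(SortOrder::Descending) => values.sort_by(|a, b| b.cmp(a)),
                None => {}
            }
            (key, agg.evaluate(&values))
        })
        .collect()
}

fn main() {
    // Same rows as table `foo` in the SQL example: (x, y).
    let rows = [(1, 100), (2, 100), (0, 200)];
    for (y, first) in aggregate_by_group(&FirstValue, &rows) {
        println!("y = {y}: first_value = {first:?}");
    }
}
```

The point of the sketch is only that the ordering requirement lives on the aggregate itself, so the planner (here, `aggregate_by_group`) can read it and arrange the input before calling the aggregate.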
