alamb opened a new issue, #8984:
URL: https://github.com/apache/arrow-datafusion/issues/8984

   ### Is your feature request related to a problem or challenge?
   
   Some built-in aggregates (such as `FIRST_VALUE`, `LAST_VALUE`, and 
`ARRAY_AGG`) support an optional `ORDER BY` argument that defines the order in 
which they see their input. For example:
   
   ```sql
   ❯ create table foo(x int, y int) as values (1, 100),(2, 100),(0, 200);
   0 rows in set. Query took 0.003 seconds.
   
   -- note the `ORDER BY x` in the argument to `FIRST_VALUE`
   ❯ select FIRST_VALUE(x ORDER BY x) from foo GROUP BY y;
   +--------------------+
   | FIRST_VALUE(foo.x) |
   +--------------------+
   | 1                  |
   | 0                  |
   +--------------------+
   2 rows in set. Query took 0.008 seconds.
   ```
   
   This is not supported today in user-defined aggregates.
   
   ### Describe the solution you'd like
   
   I would like to be able to create a user-defined aggregate that can 
specify its input order.
   
   This would roughly require:
   1. Extending the [`AggregateUDFImpl` 
trait](https://github.com/apache/arrow-datafusion/blob/edec4189242ab07ac65967490537d77e776aad5c/datafusion/expr/src/udaf.rs#L242)
 to communicate the ordering somehow.
   2. Updating the implementation of 
https://docs.rs/datafusion/latest/datafusion/physical_plan/trait.AggregateExpr.html#method.order_bys
   3. Writing an end-to-end test in 
https://github.com/apache/arrow-datafusion/blob/main/datafusion/core/tests/user_defined/user_defined_aggregates.rs
 showing it all working.
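
To make the idea concrete, here is a minimal, self-contained sketch of an aggregate that declares an ordering requirement which the caller then honors before feeding it input. It does not use DataFusion at all: the `OrderedAggregate` trait, its `requires_ordered_input` method, and the `aggregate_by_group` helper are hypothetical names invented for illustration, not part of `AggregateUDFImpl` or any existing API. It only models the semantics of `FIRST_VALUE(x ORDER BY x) ... GROUP BY y` on `(x, y)` integer rows.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a user-defined aggregate.
/// This is NOT DataFusion's `AggregateUDFImpl`; it only illustrates the idea.
trait OrderedAggregate {
    /// Hypothetical hook: true if each group's input must arrive sorted
    /// (ascending) before `update` is called.
    fn requires_ordered_input(&self) -> bool;
    /// Consume one input value.
    fn update(&mut self, value: i32);
    /// Produce the final result for the group.
    fn evaluate(&self) -> Option<i32>;
}

/// Keeps the first value it sees, so its result depends on input order.
#[derive(Default)]
struct FirstValue {
    first: Option<i32>,
}

impl OrderedAggregate for FirstValue {
    fn requires_ordered_input(&self) -> bool {
        true
    }

    fn update(&mut self, value: i32) {
        if self.first.is_none() {
            self.first = Some(value);
        }
    }

    fn evaluate(&self) -> Option<i32> {
        self.first
    }
}

/// Groups `(x, y)` rows by `y`, sorts each group's `x` values when the
/// aggregate asks for ordered input, then runs the aggregate per group.
fn aggregate_by_group<A: OrderedAggregate + Default>(
    rows: &[(i32, i32)],
) -> BTreeMap<i32, Option<i32>> {
    let mut groups: BTreeMap<i32, Vec<i32>> = BTreeMap::new();
    for &(x, y) in rows {
        groups.entry(y).or_default().push(x);
    }

    groups
        .into_iter()
        .map(|(y, mut xs)| {
            let mut acc = A::default();
            if acc.requires_ordered_input() {
                xs.sort();
            }
            for x in xs {
                acc.update(x);
            }
            (y, acc.evaluate())
        })
        .collect()
}
```

In this sketch the "planner" role is played by `aggregate_by_group`, which reads the requirement from the aggregate and sorts on its behalf. A real implementation would presumably carry sort expressions rather than a boolean, which is the part the trait extension would need to settle.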
   
   Here are some other places that likely need to be changed:
   
https://github.com/apache/arrow-datafusion/blob/b5db7187763bc4511aaffdd6d89b2f0908f17938/datafusion/core/src/physical_planner.rs#L242-L252
   
   
https://github.com/apache/arrow-datafusion/blob/b5db7187763bc4511aaffdd6d89b2f0908f17938/datafusion/core/src/physical_planner.rs#L1663-L1690
   
   Looking at how `OrderSensitiveArrayAgg` is implemented may also help: 
https://github.com/apache/arrow-datafusion/blob/5d70c32a9a4accf21e9f27ff5ed62666cbbcbe54/datafusion/physical-expr/src/aggregate/array_agg_ordered.rs#L45
   
   ### Describe alternatives you've considered
   
   _No response_
   
   ### Additional context
   
   _No response_

