alamb opened a new issue #600: URL: https://github.com/apache/arrow-datafusion/issues/600
# Usecase

I want to implement a user defined aggregate function that produces more than one column (logical values).

Specifically, I am trying to implement the InfluxDB 'selector' functions `first`, `last`, `min`, and `max` as DataFusion aggregate functions. I can't use the built in aggregate functions in DataFusion because selector functions aren't exactly like normal aggregate functions: they return both the actual aggregate value and a timestamp. In addition, `first` and `last` pick a row in the value column based on the value in the timestamp column.

After some investigation, I realized I can't elegantly use the built in user defined aggregate framework in DataFusion either.

As an example of what is going on here, let's take

```
value | time
------+-----
3     | 1000
2     | 2000
1     | 3000
```

The result of `last(value)` should be two columns, `1 | 3000`; however, modeling this as a DataFusion aggregate does not seem to be possible at this time. Each aggregate function can return a single columnar value, but we need to return 2 (the `.value` and `.time` fields).

See additional detail and context on https://github.com/influxdata/influxdb_iox/issues/448#issuecomment-744601824

**Describe the solution you'd like**

Ideally, I was thinking that the UDF could produce a Struct (with named fields `value` and `time`), but the evaluate function ([code](https://github.com/apache/arrow/blob/master/rust/datafusion/src/physical_plan/mod.rs#L238)) returns a `ScalarValue`, and at the moment they [don't have support for Structs](https://github.com/apache/arrow/blob/master/rust/datafusion/src/scalar.rs#L44).

I suspect that we would also need to add support in DataFusion for selecting fields from structs.

**Additional context**

Ported from original JIRA: https://issues.apache.org/jira/browse/ARROW-10945

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: [email protected]
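
To make the selector semantics in the issue concrete, here is a minimal sketch of `last` in plain Rust, deliberately independent of DataFusion. The names `Selected` and `LastSelector`, and the `update` / `merge` / `evaluate` methods, are hypothetical and chosen only for illustration; they are not DataFusion's API. The point is that the accumulator's final result is naturally a two-field struct (`value`, `time`) rather than a single scalar.

```rust
/// Hypothetical result type: the two logical columns a selector returns.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Selected {
    value: i64,
    time: i64,
}

/// Hypothetical accumulator for `last`: remembers the row with the
/// greatest timestamp seen so far.
#[derive(Debug, Default)]
struct LastSelector {
    current: Option<Selected>,
}

impl LastSelector {
    /// Fold one input row into the state. On equal timestamps the
    /// row seen first is kept (an arbitrary choice for this sketch).
    fn update(&mut self, value: i64, time: i64) {
        let newer = match self.current {
            Some(seen) => time > seen.time,
            None => true,
        };
        if newer {
            self.current = Some(Selected { value, time });
        }
    }

    /// Combine with the partial state of another accumulator.
    fn merge(&mut self, other: &LastSelector) {
        if let Some(s) = other.current {
            self.update(s.value, s.time);
        }
    }

    /// Final result: both fields together, or `None` for empty input.
    fn evaluate(&self) -> Option<Selected> {
        self.current
    }
}

fn main() {
    // The example table from the issue, as (value, time) rows.
    let rows = [(3, 1000), (2, 2000), (1, 3000)];
    let mut acc = LastSelector::default();
    for &(value, time) in rows.iter() {
        acc.update(value, time);
    }
    // prints: Some(Selected { value: 1, time: 3000 })
    println!("{:?}", acc.evaluate());
}
```

In this sketch `evaluate` returns a struct, which is exactly the shape that a single-valued aggregate result cannot carry without struct support.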
