[
https://issues.apache.org/jira/browse/ARROW-10945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrew Lamb closed ARROW-10945.
-------------------------------
Resolution: Duplicate
Moved to https://github.com/apache/arrow-datafusion/issues/600
> [Rust] [DataFusion] Allow User Defined Aggregates to return multiple values /
> structs
> -------------------------------------------------------------------------------------
>
> Key: ARROW-10945
> URL: https://issues.apache.org/jira/browse/ARROW-10945
> Project: Apache Arrow
> Issue Type: New Feature
> Reporter: Andrew Lamb
> Priority: Major
>
> Use case:
> I want to implement a user defined aggregate function that produces more than
> one column (logical value).
> Specifically I am trying to implement the InfluxDB 'selector' functions
> `first`, `last`, `min`, and `max` as DataFusion aggregate functions.
> I can't use the built in aggregate functions in DataFusion as selector
> functions aren't exactly like normal aggregate functions -- they return both
> the actual aggregate value as well as a timestamp. In addition, `first` and
> `last` pick a row in the value column based on the value in the timestamp
> column.
> After some investigation, I realize I can't elegantly use the built in user
> defined aggregate framework in DataFusion either. As an example of what is
> going on here, let's take
> ```
> value | time
> ------+------
> 3 | 1000
> 2 | 2000
> 1 | 3000
> ```
> The result of `last(value)` should be two columns `1 | 3000` -- however,
> modeling this as a DataFusion aggregate does not seem to be possible at this
> time. Each aggregate function can return a single columnar value but we need
> to return 2 (the `.value` and `.time` fields).
> Ideally I was thinking that the UDF could produce a Struct (with named fields
> `value` and `time`), but the `evaluate` function
> ([code](https://github.com/apache/arrow/blob/master/rust/datafusion/src/physical_plan/mod.rs#L238))
> returns a `ScalarValue`, which at the moment [doesn't have support for
> Structs](https://github.com/apache/arrow/blob/master/rust/datafusion/src/scalar.rs#L44).
> I suspect that we would also need to add support in DataFusion for selecting
> fields from structs.
> See additional detail and context on
> https://github.com/influxdata/influxdb_iox/issues/448#issuecomment-744601824
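
To make the `last(value)` example in the quoted description concrete, here is a minimal sketch in plain Rust of a selector that tracks both fields. It does not use the DataFusion API: `Selected`, `LastSelector`, `update_batch` and `evaluate` are hypothetical names chosen for illustration, the `i64` column types are an assumption, and letting the later row win when timestamps tie is also an assumption.

```rust
/// Hypothetical two-field result of a selector function: the selected
/// value together with the timestamp of the row it came from.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Selected {
    value: i64,
    time: i64,
}

/// Hypothetical accumulator for `last`: remembers the row with the
/// greatest timestamp seen so far.
#[derive(Debug, Default)]
struct LastSelector {
    current: Option<Selected>,
}

impl LastSelector {
    /// Fold one batch of rows, given as parallel value/time slices,
    /// into the accumulator state.
    fn update_batch(&mut self, values: &[i64], times: &[i64]) {
        for (&value, &time) in values.iter().zip(times.iter()) {
            // Assumption: on equal timestamps the later row replaces the earlier one.
            let replace = match self.current {
                None => true,
                Some(cur) => time >= cur.time,
            };
            if replace {
                self.current = Some(Selected { value, time });
            }
        }
    }

    /// Final result: both fields at once, or `None` if no rows were seen.
    fn evaluate(&self) -> Option<Selected> {
        self.current
    }
}
```

Feeding the three rows from the table above (values `[3, 2, 1]`, times `[1000, 2000, 3000]`) through `update_batch` leaves `evaluate()` returning a `Selected` with `value` 1 and `time` 3000. That two-field result is what a single `ScalarValue` cannot carry today.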