[ 
https://issues.apache.org/jira/browse/ARROW-10945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Lamb closed ARROW-10945.
-------------------------------
    Resolution: Duplicate

Moved to https://github.com/apache/arrow-datafusion/issues/600

> [Rust] [DataFusion] Allow User Defined Aggregates to return multiple values / 
> structs
> -------------------------------------------------------------------------------------
>
>                 Key: ARROW-10945
>                 URL: https://issues.apache.org/jira/browse/ARROW-10945
>             Project: Apache Arrow
>          Issue Type: New Feature
>            Reporter: Andrew Lamb
>            Priority: Major
>
> Use case:
> I want to implement a user defined aggregate function that produces more than 
> one column (multiple logical values).
> Specifically I am trying to implement the InfluxDB 'selector' functions 
> `first`, `last`, `min`, and `max` as DataFusion aggregate functions.
> I can't use the built in aggregate functions in DataFusion as selector 
> functions aren't exactly like normal aggregate functions -- they return both 
> the actual aggregate value as well as a timestamp. In addition, `first` and 
> `last` pick a row in the value column based on the value in the timestamp 
> column.
> After some investigation, I realize I can't elegantly use the built in user 
> defined aggregate framework in DataFusion either. As an example of what is 
> going on here, let's take
> ```
> value | time
> ------+------
>   3   | 1000
>   2   | 2000
>   1   | 3000
> ```
> The result of `last(value)` should be two columns `1 | 3000` -- however, 
> modeling this as a DataFusion aggregate does not seem to be possible at this 
> time.  Each aggregate function can return a single columnar value but we need 
> to return 2 (the `.value` and `.time` fields).
> Ideally I was thinking that the UDF could produce a Struct (with named fields 
> `value` and `time`), but the `evaluate` function 
> ([code](https://github.com/apache/arrow/blob/master/rust/datafusion/src/physical_plan/mod.rs#L238)) 
> returns a `ScalarValue`, and at the moment `ScalarValue`s [don't have support for 
> Structs](https://github.com/apache/arrow/blob/master/rust/datafusion/src/scalar.rs#L44).
> I suspect that we would also need to add support in DataFusion for selecting 
> fields from structs.
> See additional detail and context on 
> https://github.com/influxdata/influxdb_iox/issues/448#issuecomment-744601824
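
The `last` selector semantics described in the issue can be sketched in plain Rust, independent of DataFusion. This is only an illustration under stated assumptions: `Selected` and `LastSelector` are hypothetical names, not DataFusion types, and the `update`/`evaluate` methods merely mimic the general shape of an accumulator rather than any real trait. Values and timestamps are assumed to be `i64`.

```rust
/// Hypothetical result type: the two logical values a selector returns.
/// This plays the role of the Struct with `value` and `time` fields.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Selected {
    pub value: i64,
    pub time: i64,
}

/// Hypothetical accumulator for `last`: it remembers the row with the
/// largest timestamp seen so far. Not a DataFusion type.
#[derive(Debug, Default)]
pub struct LastSelector {
    state: Option<Selected>,
}

impl LastSelector {
    /// Feed one input row (value column, timestamp column).
    pub fn update(&mut self, value: i64, time: i64) {
        let is_newer = match self.state {
            // On equal timestamps, the row seen later wins (an assumption).
            Some(current) => time >= current.time,
            None => true,
        };
        if is_newer {
            self.state = Some(Selected { value, time });
        }
    }

    /// Produce the final result: both fields at once, or `None` if no
    /// rows were seen.
    pub fn evaluate(&self) -> Option<Selected> {
        self.state
    }
}
```

Feeding the three example rows `(3, 1000)`, `(2, 2000)`, `(1, 3000)` through `update` and then calling `evaluate` yields a `Selected` with `value` 1 and `time` 3000, which is the two-field result that a single-column aggregate return value cannot carry.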



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
