Andrew Lamb created ARROW-10945:
-----------------------------------

             Summary: [Rust] [DataFusion] Allow User Defined Aggregates to 
return multiple values / structs
                 Key: ARROW-10945
                 URL: https://issues.apache.org/jira/browse/ARROW-10945
             Project: Apache Arrow
          Issue Type: New Feature
            Reporter: Andrew Lamb



Use case:
I want to implement a user-defined aggregate function that produces more than 
one column (that is, more than one logical value).

Specifically I am trying to implement the InfluxDB 'selector' functions 
`first`, `last`, `min`, and `max` as DataFusion aggregate functions.

I can't use the built in aggregate functions in DataFusion as selector 
functions aren't exactly like normal aggregate functions -- they return both 
the actual aggregate value as well as a timestamp. In addition, `first` and 
`last` pick a row in the value column based on the value in the timestamp 
column.

After some investigation, I realize I can't elegantly use the built in user 
defined aggregate framework in DataFusion either. As an example of what is 
going on here, let's take

```
value | time
------+------
  3   | 1000
  2   | 2000
  1   | 3000
```

The result of `last(value)` should be two columns, `1 | 3000`. However, 
modeling this as a DataFusion aggregate does not seem to be possible at this 
time: each aggregate function can return a single columnar value, but we need 
to return two (the `.value` and `.time` fields).
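
To make the desired semantics concrete, here is a minimal sketch of two selectors in plain Rust, independent of DataFusion. The `Row` and `Selected` types and the function names are hypothetical and exist only for illustration; how ties and NULLs should be handled is not specified above, so the sketch simply takes whatever the standard iterator methods return.

```rust
/// One input row: a value and its timestamp (hypothetical layout).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Row {
    pub value: i64,
    pub time: i64,
}

/// What a selector returns: the chosen value plus the timestamp of its row.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Selected {
    pub value: i64,
    pub time: i64,
}

/// `last`: pick the row with the greatest timestamp and return its value.
/// Returns `None` for empty input.
pub fn last_selector(rows: &[Row]) -> Option<Selected> {
    rows.iter()
        .max_by_key(|r| r.time)
        .map(|r| Selected { value: r.value, time: r.time })
}

/// `max`: pick the row with the greatest value and return its timestamp too.
/// Returns `None` for empty input.
pub fn max_selector(rows: &[Row]) -> Option<Selected> {
    rows.iter()
        .max_by_key(|r| r.value)
        .map(|r| Selected { value: r.value, time: r.time })
}
```

On the three rows above, `last_selector` yields value `1` at time `3000`, while `max_selector` yields value `3` at time `1000`. Both results carry two logical columns, which is exactly what a single-valued aggregate cannot express.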

Ideally, I was thinking that the UDF could produce a Struct (with named fields 
`value` and `time`), but the evaluate function 
([code](https://github.com/apache/arrow/blob/master/rust/datafusion/src/physical_plan/mod.rs#L238)) 
returns a `ScalarValue`, and at the moment `ScalarValue` does [not have support 
for Structs](https://github.com/apache/arrow/blob/master/rust/datafusion/src/scalar.rs#L44).

I suspect that we would also need to add support in DataFusion for selecting 
fields from structs.
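
As a rough illustration of the two missing pieces (a struct-valued scalar and field selection), here is a sketch using a hypothetical `Scalar` enum. This is not DataFusion's `ScalarValue`; the `Struct` variant, the `field` accessor, and the `selector_result` helper are all assumptions made for the example.

```rust
/// Hypothetical scalar type with a struct variant (not DataFusion's `ScalarValue`).
#[derive(Debug, Clone, PartialEq)]
pub enum Scalar {
    Int64(i64),
    /// Named fields, kept in declaration order.
    Struct(Vec<(String, Scalar)>),
}

impl Scalar {
    /// Select a field by name; returns `None` for non-structs or unknown names.
    pub fn field(&self, name: &str) -> Option<&Scalar> {
        match self {
            Scalar::Struct(fields) => fields
                .iter()
                .find(|(n, _)| n.as_str() == name)
                .map(|(_, v)| v),
            _ => None,
        }
    }
}

/// Package a selector result as a single struct-valued scalar.
pub fn selector_result(value: i64, time: i64) -> Scalar {
    Scalar::Struct(vec![
        ("value".to_string(), Scalar::Int64(value)),
        ("time".to_string(), Scalar::Int64(time)),
    ])
}
```

With something along these lines, an aggregate could still return one scalar per group, and a query could project `.value` and `.time` out of it afterwards.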

See additional detail and context on 
https://github.com/influxdata/influxdb_iox/issues/448#issuecomment-744601824




--
This message was sent by Atlassian Jira
(v8.3.4#803005)
