alamb opened a new issue #600:
URL: https://github.com/apache/arrow-datafusion/issues/600


   # Use case
   I want to implement a user-defined aggregate function that produces more 
than one column (that is, more than one logical value).
   
   Specifically I am trying to implement the InfluxDB 'selector' functions 
`first`, `last`, `min`, and `max` as DataFusion aggregate functions.
   
   I can't use the built in aggregate functions in DataFusion as selector 
functions aren't exactly like normal aggregate functions – they return both the 
actual aggregate value as well as a timestamp. In addition, `first` and `last` 
pick a row in the value column based on the value in the timestamp column.
   
   After some investigation, I realized I can't elegantly use DataFusion's 
built-in user-defined aggregate framework either. As an example, take the 
following input:
   
   ```
   value | time
   -----+-----
   3 | 1000
   2 | 2000
   1 | 3000
   ```
   
   The result of `last(value)` should be two columns, `1 | 3000`. However, 
modeling this as a DataFusion aggregate does not seem to be possible at this 
time: each aggregate function can return only a single columnar value, but we 
need to return two (the `.value` and `.time` fields).
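
To make the intended semantics concrete, here is a minimal sketch in plain Rust, independent of DataFusion. The names `Row`, `Selected`, `select_first`, `select_last`, and `select_min` are hypothetical and exist only for illustration; the point is that each selector returns a value together with the timestamp of the row it picked.

```rust
/// One input row: a value and the timestamp at which it was recorded.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Row {
    pub value: i64,
    pub time: i64,
}

/// Hypothetical selector result: the chosen value plus its timestamp.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Selected {
    pub value: i64,
    pub time: i64,
}

fn to_selected(row: &Row) -> Selected {
    Selected { value: row.value, time: row.time }
}

/// `first`: pick the row with the smallest timestamp.
pub fn select_first(rows: &[Row]) -> Option<Selected> {
    rows.iter().min_by_key(|r| r.time).map(to_selected)
}

/// `last`: pick the row with the greatest timestamp.
pub fn select_last(rows: &[Row]) -> Option<Selected> {
    rows.iter().max_by_key(|r| r.time).map(to_selected)
}

/// `min`: pick the row with the smallest value, keeping that row's timestamp.
pub fn select_min(rows: &[Row]) -> Option<Selected> {
    rows.iter().min_by_key(|r| r.value).map(to_selected)
}
```

Applied to the three rows in the table above, `select_last` yields `value = 1, time = 3000`, which is the two-column result that a single-valued aggregate cannot express.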
   
   
   See additional detail and context at 
https://github.com/influxdata/influxdb_iox/issues/448#issuecomment-744601824
   
   **Describe the solution you'd like**
   Ideally, the UDF could produce a Struct (with named fields `value` and 
`time`), but the evaluate function 
([code](https://github.com/apache/arrow/blob/master/rust/datafusion/src/physical_plan/mod.rs#L238)) 
returns a `ScalarValue`, which at the moment [does not support 
Structs](https://github.com/apache/arrow/blob/master/rust/datafusion/src/scalar.rs#L44).
   
   I suspect that we would also need to add support in DataFusion for selecting 
fields from structs.
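
As a rough sketch of what this proposal implies, the following plain Rust models a scalar type with a struct variant plus a field lookup. The `Scalar` enum, its `field` method, and `evaluate_last` are hypothetical stand-ins written for this illustration; they are not DataFusion's actual `ScalarValue` or any existing DataFusion API.

```rust
/// Simplified stand-in for a scalar value type.
/// This is NOT DataFusion's `ScalarValue`; it only illustrates the idea.
#[derive(Debug, Clone, PartialEq)]
pub enum Scalar {
    Int64(Option<i64>),
    /// Hypothetical struct variant: ordered (field name, value) pairs.
    Struct(Vec<(String, Scalar)>),
}

impl Scalar {
    /// Look up a named field of a struct scalar, analogous to selecting
    /// `.value` or `.time` from the result of `last(value)`.
    /// Returns `None` for non-struct scalars or unknown field names.
    pub fn field(&self, name: &str) -> Option<&Scalar> {
        match self {
            Scalar::Struct(fields) => fields
                .iter()
                .find(|(n, _)| n.as_str() == name)
                .map(|(_, v)| v),
            _ => None,
        }
    }
}

/// Hypothetical final step of a `last` aggregate: pack the selected value
/// and its timestamp into one struct-valued scalar.
pub fn evaluate_last(value: i64, time: i64) -> Scalar {
    Scalar::Struct(vec![
        ("value".to_string(), Scalar::Int64(Some(value))),
        ("time".to_string(), Scalar::Int64(Some(time))),
    ])
}
```

With something like this, the aggregate still returns a single scalar, and the two logical columns are recovered by field selection, for example `evaluate_last(1, 3000).field("time")`.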
   
   **Additional context**
   Ported from original JIRA: https://issues.apache.org/jira/browse/ARROW-10945
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

