timsaucer opened a new issue, #15882:
URL: https://github.com/apache/datafusion/issues/15882

   ### Is your feature request related to a problem or challenge?
   
   Suppose I have a DataFrame in which one column contains arrays. I would like 
to be able to apply any scalar expression to each element of those arrays and 
get an array back. For example, I would like to be able to apply an `abs()` 
function and produce output such as this:
   
   ```
   DataFrame()
   +--------------+-------------+
   | a            | abs(a)      |
   +--------------+-------------+
   | [-10, 5, 13] | [10, 5, 13] |
   | [2]          | [2]         |
   | [-3, 1]      | [3, 1]      |
   +--------------+-------------+
   ```
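
To pin down the semantics, here is a rough sketch in plain Python rather than any existing DataFusion API. `transform_list` is a hypothetical name, and passing a null list through unchanged is an assumption, not something decided here.

```python
from typing import Callable, Optional, Sequence


def transform_list(
    values: Optional[Sequence[int]], fn: Callable[[int], int]
) -> Optional[list]:
    """Apply a scalar function to every element of one array value.

    Assumption: a null (None) array stays null instead of raising.
    """
    if values is None:
        return None
    return [fn(v) for v in values]


# One Python list per row of column `a`.
column_a = [[-10, 5, 13], [2], [-3, 1]]

# The output keeps the row count and the per-row array lengths.
abs_a = [transform_list(row, abs) for row in column_a]
print(abs_a)  # [[10, 5, 13], [2], [3, 1]]
```

The key property is that the result has the same number of rows, and each output array has the same length as its input array.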
   
   Additionally, it would be amazing to be able to apply any aggregate function 
to the elements of each array, producing one value per row.
   
   ```
   DataFrame()
   +--------------+--------+
   | a            | sum(a) |
   +--------------+--------+
   | [-10, 5, 13] | 8      |
   | [2]          | 2      |
   | [-3, 1]      | -2     |
   +--------------+--------+
   ```
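
The per-array aggregate can be sketched the same way. This is plain Python, not a DataFusion API; `aggregate_list` is a hypothetical name, and returning null for a null array is an assumption.

```python
from typing import Callable, Optional, Sequence


def aggregate_list(
    values: Optional[Sequence[int]], agg: Callable[[Sequence[int]], int]
) -> Optional[int]:
    """Reduce one array value to a single scalar.

    Assumption: a null (None) array aggregates to null.
    """
    if values is None:
        return None
    return agg(values)


# One Python list per row of column `a`.
column_a = [[-10, 5, 13], [2], [-3, 1]]

# One scalar per row; no rows are added or dropped.
sum_a = [aggregate_list(row, sum) for row in column_a]
print(sum_a)  # [8, 2, -2]
```

Unlike a regular aggregate, this never groups across rows: the array itself is the group.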
   
   ### Describe the solution you'd like
   
   This is similar to the Spark `transform` operation. It is very powerful for 
highly structured data. I don't know the best form these functions would 
take, but it would be even more powerful if we could do element-by-element 
operations across more than one column in the DataFrame. There are many use 
cases where you will have array columns whose values have the same length.
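
As an illustration of the multi-column case, here is a plain Python sketch. `zip_transform` and the `prices`/`quantities` columns are made up for the example, and raising on mismatched lengths is just one possible policy.

```python
from operator import mul
from typing import Callable, Optional, Sequence


def zip_transform(
    left: Optional[Sequence[float]],
    right: Optional[Sequence[float]],
    fn: Callable[[float, float], float],
) -> Optional[list]:
    """Combine two equal-length arrays element by element.

    Assumptions: a null input gives a null output, and a length
    mismatch is an error rather than being padded or truncated.
    """
    if left is None or right is None:
        return None
    if len(left) != len(right):
        raise ValueError("arrays in the same row must have the same length")
    return [fn(l, r) for l, r in zip(left, right)]


# Two hypothetical array columns with matching lengths in every row.
prices = [[1.5, 2.0], [4.0]]
quantities = [[2, 3], [5]]

totals = [zip_transform(p, q, mul) for p, q in zip(prices, quantities)]
print(totals)  # [[3.0, 6.0], [20.0]]
```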
   
   ### Describe alternatives you've considered
   
   The status quo is to either write a UDF for each case or to do an unnest 
followed by a group by. The unnest and group by can be an expensive operation.
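
To show where the cost comes from, here is the unnest and group by workaround modelled in plain Python. It only mimics the shape of the plan and is not DataFusion code; the row id used as the grouping key is an assumption about how the rows would be matched back up.

```python
from collections import defaultdict

# One Python list per row of column `a`.
column_a = [[-10, 5, 13], [2], [-3, 1]]

# Step 1 (unnest): materialize one row per array element,
# tagged with the id of the row it came from.
unnested = [
    (row_id, value)
    for row_id, values in enumerate(column_a)
    for value in values
]
print(len(unnested))  # 6 intermediate rows for 3 input rows

# Step 2 (group by): hash-aggregate the exploded rows back
# down to one value per original row.
sums = defaultdict(int)
for row_id, value in unnested:
    sums[row_id] += value

result = [sums[row_id] for row_id in range(len(column_a))]
print(result)  # [8, 2, -2]
```

The intermediate table grows with the total number of array elements, and the group by has to rebuild a per-row grouping that the arrays already encoded.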
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

