amoeba opened a new issue, #39311: URL: https://github.com/apache/arrow/issues/39311
### Describe the enhancement requested

I don't think the following is possible with any combination of the current set of ExecNodes, and I'm curious (1) whether there's interest in such functionality and (2) how it could be done. Something very close can currently be constructed using an AggregateNode with a UDF, but even with that I haven't found a way to express this type of computation generally.

Assume you have a table like this, with a variable number of attributes per combination of date and company:

| date | company | attributeA | ... | attributeN |
|------|---------|------------|-----|------------|
| 1 | A | 1 | ... | ... |
| 2 | B | 2 | ... | ... |
| 3 | C | 3 | ... | ... |
| 1 | A | 4 | ... | ... |
| 2 | B | 5 | ... | ... |
| 3 | C | 6 | ... | ... |
| 1 | A | 7 | ... | ... |
| 2 | B | 8 | ... | ... |
| 3 | C | 9 | ... | ... |

Suppose you want to create a plan that groups by date and filters the company or companies according to some arbitrary computation on the values in any column or columns in attributeA...attributeN, and you want everything returned as lists. For example, imagine attributeA above is employee head count and you want to find the companies in the top third by head count for each day. You'd want this table as a result:

| date | company | attributeA | ... | attributeN |
|------|---------|------------|-------|------------|
| 1 | [A] | [3] | [...] | [...] |
| 2 | [B] | [3] | [...] | [...] |
| 3 | [C] | [3] | [...] | [...] |

A key part of the result is that company and all attribute columns would be list columns whose arrays (1) are all of the same length and (2) can have any length from 0 to G, where G is the size of each date grouping.

### Component(s)

C++
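
To make the requested semantics concrete, here is a minimal sketch in plain Python (standard library only). It is not Arrow or Acero code, and the names `group_filter_collect` and `top_third_by` are hypothetical helpers invented for illustration. It assumes "top third" means the `floor(G / 3)` rows with the largest value, which is only one possible reading of the cutoff.

```python
from collections import defaultdict


def group_filter_collect(rows, key, keep):
    """Group rows by `key`, keep a per-group subset chosen by `keep`,
    and emit one row per group with every other column as a list."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)

    out = []
    for group_key, members in groups.items():
        kept = keep(members)  # may return anywhere from 0 to len(members) rows
        result = {key: group_key}
        for column in members[0]:
            if column != key:
                # Every list column is built from the same kept rows,
                # so all lists in one output row have the same length.
                result[column] = [r[column] for r in kept]
        out.append(result)
    return out


def top_third_by(column):
    """Hypothetical filter: keep the floor(G / 3) rows with the largest value."""
    def keep(members):
        n = len(members) // 3  # assumption: round the cutoff down
        ranked = sorted(members, key=lambda r: r[column], reverse=True)
        return ranked[:n]
    return keep


# Sample rows matching the date / company / attributeA columns above.
rows = [
    {"date": 1, "company": "A", "attributeA": 1},
    {"date": 2, "company": "B", "attributeA": 2},
    {"date": 3, "company": "C", "attributeA": 3},
    {"date": 1, "company": "A", "attributeA": 4},
    {"date": 2, "company": "B", "attributeA": 5},
    {"date": 3, "company": "C", "attributeA": 6},
    {"date": 1, "company": "A", "attributeA": 7},
    {"date": 2, "company": "B", "attributeA": 8},
    {"date": 3, "company": "C", "attributeA": 9},
]

result = group_filter_collect(rows, "date", top_third_by("attributeA"))
for row in result:
    print(row)
```

The point of the sketch is the output shape: one row per date, with `company` and every attribute column holding lists of equal length, and those lists empty whenever the filter keeps nothing (for instance, a group with fewer than three rows under this cutoff). The filter is an arbitrary function of the whole group, which is the part that is hard to express with a per-column aggregate.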
