amoeba opened a new issue, #39311:
URL: https://github.com/apache/arrow/issues/39311

   ### Describe the enhancement requested
   
   I don't think the following is possible with any combination of the current 
set of ExecNodes, and I'm curious (1) whether there's interest in such 
functionality and (2) how it could be done. Something very close can currently 
be constructed using an AggregateNode with a UDF, but even with that I haven't 
found a way to express this type of computation generally.
   
   Assume you have a table like this, with a variable number of attributes per 
combination of date and company:
   
   | date | company | attributeA | ... | attributeN  |
   |------|---------|------------|-----|-------------|
   | 1    | A       | 1          | ... | ...         |
   | 2    | B       | 2          | ... | ...         |
   | 3    | C       | 3          | ... | ...         |
   | 1    | A       | 4          | ... | ...         |
   | 2    | B       | 5          | ... | ...         |
   | 3    | C       | 6          | ... | ...         |
   | 1    | A       | 7          | ... | ...         |
   | 2    | B       | 8          | ... | ...         |
   | 3    | C       | 9          | ... | ...         |
   
   Suppose you want to create a plan that groups by date and filters the 
company or companies in each group according to some arbitrary computation on 
values in any column or columns in attributeA...attributeN, with every 
remaining column returned as a list. For example, imagine attributeA above is 
employee head count and you want to find the companies in the top third in 
terms of head count for each day.
   
   You'd want this table as a result:
   
   | date | company | attributeA | ...   | attributeN  |
   |------|---------|------------|-------|-------------|
   | 1    | [A]     | [3]        | [...] | [...]       |
   | 2    | [B]     | [3]        | [...] | [...]       |
   | 3    | [C]     | [3]        | [...] | [...]       |
   
   A key part of the result is that company and all attribute columns would be 
list columns whose lists (1) all have the same length within a given row and 
(2) can have any length from 0 to G, where G is the number of rows in that 
date group.
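
   To make the requested semantics concrete, here is a minimal pure-Python 
sketch of the group-then-filter-then-collect behavior described above. It is 
not an Acero ExecNode and uses no Arrow APIs. The names 
`group_filter_to_lists` and `top_third`, the `attributeB` column, and the 
sample rows are all hypothetical, and "top third" is interpreted here as 
keeping the ceil(G/3) largest values per group, which is only one possible 
reading.

```python
import math
from collections import defaultdict


def group_filter_to_lists(rows, key, keep):
    """Group `rows` by `key`, filter each group with `keep`, collect lists.

    `keep(group)` receives the list of rows in one group and returns a
    boolean mask of the same length. Every non-key column in the output
    becomes a list, and all lists in one output row share the same length
    (anywhere from 0 to the size of the group).
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)

    out = []
    for group_key, group in groups.items():  # groups keep first-seen order
        mask = keep(group)
        kept = [row for row, selected in zip(group, mask) if selected]
        out_row = {key: group_key}
        for column in group[0]:
            if column != key:
                out_row[column] = [row[column] for row in kept]
        out.append(out_row)
    return out


def top_third(column):
    """Hypothetical selector: keep the ceil(G/3) largest values of `column`."""
    def keep(group):
        n_keep = math.ceil(len(group) / 3)
        order = sorted(range(len(group)),
                       key=lambda i: group[i][column], reverse=True)
        chosen = set(order[:n_keep])
        return [i in chosen for i in range(len(group))]
    return keep


# Illustrative data only (not taken from the tables in this issue).
rows = [
    {"date": 1, "company": "A", "attributeA": 10, "attributeB": 7},
    {"date": 1, "company": "B", "attributeA": 30, "attributeB": 8},
    {"date": 1, "company": "C", "attributeA": 20, "attributeB": 9},
    {"date": 2, "company": "A", "attributeA": 5, "attributeB": 4},
    {"date": 2, "company": "B", "attributeA": 1, "attributeB": 5},
    {"date": 2, "company": "C", "attributeA": 3, "attributeB": 6},
]

result = group_filter_to_lists(rows, "date", top_third("attributeA"))
for row in result:
    print(row)
```

   The selector sees the whole group at once, which is the part that seems 
hard to express with the existing nodes: the filter decision for one row 
depends on the other rows in the same group, and the surviving rows are then 
emitted as equal-length lists per column.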
   
   
   
   ### Component(s)
   
   C++

