alamb commented on issue #17446:
URL: https://github.com/apache/datafusion/issues/17446#issuecomment-3356220342

   > Slicing rows of a struct array can be expensive. I wonder how Polars stores the data of this column during execution, or whether they consider the column a struct type at all
   
   What I would suggest we do initially in DataFusion is:
   1. During the aggregation phase, retain references to all input `ArrayRef`s and the corresponding `group_indexes`
   2. During the output phase, form a new underlying values array by copying the relevant input rows in order (using the [`interleave`](https://docs.rs/arrow/latest/arrow/compute/kernels/interleave/index.html) kernel)
   
   That approach requires no slicing and works for any array type.
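
   A minimal, std-only sketch of the two steps above, in case it helps make
   the idea concrete. Everything here is hypothetical (`RetainedState`,
   `update`, `interleave_indices`, `gather`); plain `Vec<String>` batches stand
   in for `ArrayRef`s, and `gather` stands in for the copy that the arrow
   `interleave` kernel would do given the same `(batch, row)` pairs. It is not
   the DataFusion `GroupsAccumulator` API.

   ```rust
   /// Hypothetical accumulator state: every input batch is kept as-is,
   /// next to the group index assigned to each of its rows.
   #[derive(Default)]
   struct RetainedState {
       batches: Vec<Vec<String>>,      // stand-in for Vec<ArrayRef>
       group_indexes: Vec<Vec<usize>>, // one entry per row of each batch
   }

   impl RetainedState {
       /// Aggregation phase: keep the batch and its group indexes.
       /// No per-row work happens here.
       fn update(&mut self, values: Vec<String>, groups: Vec<usize>) {
           assert_eq!(values.len(), groups.len());
           self.batches.push(values);
           self.group_indexes.push(groups);
       }

       /// Output phase: (batch, row) pairs ordered by group index.
       fn interleave_indices(&self) -> Vec<(usize, usize)> {
           let mut keyed: Vec<(usize, (usize, usize))> = Vec::new();
           for (batch, groups) in self.group_indexes.iter().enumerate() {
               for (row, &group) in groups.iter().enumerate() {
                   keyed.push((group, (batch, row)));
               }
           }
           // The sort is stable, so input order is kept within each group.
           keyed.sort_by_key(|&(group, _)| group);
           keyed.into_iter().map(|(_, pair)| pair).collect()
       }

       /// Std-only stand-in for the row copy an interleave kernel performs.
       fn gather(&self, indices: &[(usize, usize)]) -> Vec<String> {
           indices
               .iter()
               .map(|&(batch, row)| self.batches[batch][row].clone())
               .collect()
       }
   }
   ```

   Sorting the pairs by group index is only one way to get a contiguous
   layout; a counting pass per group would avoid the sort, at the cost of a
   little more bookkeeping.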
   
   This design will require 2x the input size at peak (for the input and the in-progress output), but I think it is probably significantly better than using the Accumulators (which also hold each input array entirely, and additionally make a bunch of individual allocations per row to store the ScalarValues).
   
   
   Here is an ASCII art idea:
   
   GroupsAccumulator State
   ```
      ┌────────┐     ┌─────┐                             
      │   A    │     │  0  │                             
      ├────────┤     ├─────┤                             
      │   B    │     │  1  │                             
      ├────────┤     ├─────┤                             
      │   C    │     │  0  │                             
      ├────────┤     ├─────┤                             
      │   D    │     │  2  │                             
      ├────────┤     ├─────┤               State         
      │   E    │     │  3  │                             
      └────────┘     └─────┘       Accumulator keeps all 
                                    input `ArrayRef` and 
                                     the corresponding   
      ┌────────┐     ┌─────┐       group_indexes for each
      │   V    │     │  0  │                row          
      ├────────┤     ├─────┤                             
      │   W    │     │  0  │                             
      ├────────┤     ├─────┤                             
      │   X    │     │  4  │                             
      ├────────┤     ├─────┤                             
      │   Y    │     │  2  │                             
      ├────────┤     ├─────┤                             
      │   Z    │     │  4  │                             
      └────────┘     └─────┘                             
                                                         
                                                         
                                                         
        ArrayRefs     group                              
                     indices                             
   ```
   
   
   GroupsAccumulator output (emit):
   
   ```
                                                                 
                                              Output ListArray   
                                                                 
      ┌────────┐     ┌─────┐                                     
      │   A    │     │  0  │───                                  
      ├────────┤     ├─────┤   │      ┌ ─ ─ ▶ [A, C, V, W]       
      │   C    │     │  0  │                                     
      ├────────┤     ├─────┤   │─ ─ ─ ┘                          
      │   V    │     │  0  │                                     
      ├────────┤     ├─────┤   │                                 
      │   W    │     │  0  │───      ┌ ─ ─ ─ ▶ [B]               
      ├────────┤     ├─────┤                                     
      │   B    │     │  1  │ ─ ─ ─ ─ ┘                           
      ├────────┤     ├─────┤                                     
      │   D    │     │  2  │───      ┌ ─ ─ ▶  [D, Y]             
      ├────────┤     ├─────┤   │─ ─ ─                            
      │   Y    │     │  2  │───                                  
      ├────────┤     ├─────┤         ┌ ─ ─ ─ ▶ [E]
      │   E    │     │  3  │ ─ ─ ─ ─ ┘
      ├────────┤     ├─────┤
      │   X    │     │  4  │───      ┌ ─ ─ ▶  [X, Z]
      ├────────┤     ├─────┤   │─ ─ ─
      │   Z    │     │  4  │───
      └────────┘     └─────┘                                     
                                   Output: use the               
    Output Values    group       interleave kernel to            
        array       indexes       reorder the input              
                                 arrays so all values            
                                 from the same group             
                                 index are contiguous            
      New ArrayRef    group          and form the                
                     indices        corresponding                
                                   ListArray output              
                                                                 
                                                                 
   ```
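
   Once the values are contiguous per group, the list output also needs
   offsets saying where each group starts and ends. A small sketch of that
   step, assuming `i32` offsets as in a regular (non-large) `ListArray`;
   `list_offsets` is a hypothetical helper, not an existing arrow or
   DataFusion function:

   ```rust
   /// Compute list offsets from the retained group indexes: group `g` owns
   /// the range `offsets[g]..offsets[g + 1]` of the reordered values array.
   fn list_offsets(group_indexes: &[usize], num_groups: usize) -> Vec<i32> {
       // Count how many rows landed in each group.
       let mut counts = vec![0i32; num_groups];
       for &group in group_indexes {
           counts[group] += 1;
       }
       // Running sum of the counts gives the offsets.
       let mut offsets = Vec::with_capacity(num_groups + 1);
       let mut total = 0i32;
       offsets.push(total);
       for count in counts {
           total += count;
           offsets.push(total);
       }
       offsets
   }
   ```

   With the group indexes from the State diagram (`0 1 0 2 3` for the first
   batch and `0 0 4 2 4` for the second) and five groups, this gives offsets
   `[0, 4, 5, 7, 8, 10]`.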


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

