alamb commented on issue #1708:
URL: https://github.com/apache/arrow-datafusion/issues/1708#issuecomment-1028906123


   💯 with what @Dandandan and @houqp said. Thank you for writing this up, @yjshen ❤️
   
   > I am wondering if for certain operations, e.g. hash aggregate, I feel fixed
   size input the data is stored better in a columnar format (mutable array,
   with offsets),
   
   I agree with @Dandandan that this would be super helpful for HashAggregate, as the group keys and aggregates could be computed "in place" (so producing the output would be free).
   
   Sorting is indeed different because the sort key is different from what appears in the output. For example, `SELECT a, b, c ... ORDER BY a+b` needs to compare on `a+b` but still produce tuples of `(a, b, c)`.
   
   With grouping, in contrast, the grouping values themselves are produced in the output. For example, `SELECT a+b, sum(c) ... GROUP BY a+b` produces tuples of `(a+b, sum)`.
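
To make the difference concrete, here is a minimal sketch in plain Rust (not DataFusion code). The `Row` alias and both function names are hypothetical and exist only for illustration. The sort uses `a + b` purely as a comparison key and returns the original rows, while the grouping emits the key itself next to the aggregate, updating each accumulator in place.

```rust
use std::collections::BTreeMap;

/// Hypothetical row type standing in for a tuple `(a, b, c)`.
type Row = (i64, i64, i64);

/// `SELECT a, b, c ... ORDER BY a+b`: the key `a + b` is only used for
/// comparison; the output rows are still `(a, b, c)`.
fn order_by_a_plus_b(mut rows: Vec<Row>) -> Vec<Row> {
    rows.sort_by_key(|&(a, b, _)| a + b);
    rows
}

/// `SELECT a+b, sum(c) ... GROUP BY a+b`: the key `a + b` is part of the
/// output, so the map of (key, accumulator) pairs already is the result.
fn group_by_a_plus_b(rows: &[Row]) -> Vec<(i64, i64)> {
    let mut groups: BTreeMap<i64, i64> = BTreeMap::new();
    for &(a, b, c) in rows {
        // Update the accumulator for this key in place.
        *groups.entry(a + b).or_insert(0) += c;
    }
    groups.into_iter().collect()
}
```

A `BTreeMap` is used here only to keep the output order deterministic; a hash table would play the same role in a hash aggregate.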
   
   
   P.S. For what it is worth, I think DuckDB has a short-string optimization, so the key may look something more like:
   
   
   ```text
   Table A (bool a, char b, int c, string d) row_value (true, 'W', 59, "XYZ")

          ┌────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┐
          │ 0F │ 1  │ W  │ 00 │ 00 │ 00 │ 3B │ 03 │ 00 │ 00 │ 00 │ 00 │ X  │ Y  │ Z  │
          └────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┘

                                                  8

   Table A (bool a, char b, int c, string d) row_value (true, 'W', 59, "XYZXYZXYZ")

          ┌────┬────┬────┬────┬────┬────┬────┬─────────────────────────────────────────────┐
          │ 0F │ 1  │ W  │ 00 │ 00 │ 00 │ 3B │                     PTR                     │
          └────┴────┴────┴────┴────┴────┴────┴─────────────────────────────────────────────┘
                                                                    │
                                                  8                 └───┐
                                                                        ▼

                                                                   "XYZXYZXYZ"
   ```
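
The inline-or-pointer idea in the diagram can be sketched roughly as follows. This is an illustration in plain Rust, not DuckDB's or DataFusion's actual layout: `StringSlot`, `encode`, `decode`, and the `INLINE_CAP` threshold are all hypothetical names and values, and the "pointer" is modeled as an offset into a side buffer rather than a raw address.

```rust
/// Hypothetical inline capacity in bytes; the real threshold is an assumption.
const INLINE_CAP: usize = 7;

/// A fixed-width string slot inside a row.
#[derive(Debug, PartialEq)]
enum StringSlot {
    /// Short string: length and bytes live directly in the row.
    Inline { len: u8, bytes: [u8; INLINE_CAP] },
    /// Long string: the row only stores where the bytes live in a side heap.
    Spilled { offset: usize, len: usize },
}

/// Store `s` inline when it fits, otherwise append it to `heap`.
fn encode(s: &str, heap: &mut Vec<u8>) -> StringSlot {
    let raw = s.as_bytes();
    if raw.len() <= INLINE_CAP {
        let mut bytes = [0u8; INLINE_CAP];
        bytes[..raw.len()].copy_from_slice(raw);
        StringSlot::Inline { len: raw.len() as u8, bytes }
    } else {
        let offset = heap.len();
        heap.extend_from_slice(raw);
        StringSlot::Spilled { offset, len: raw.len() }
    }
}

/// Read the string back from either the slot itself or the side heap.
fn decode<'a>(slot: &'a StringSlot, heap: &'a [u8]) -> &'a str {
    let raw = match slot {
        StringSlot::Inline { len, bytes } => &bytes[..*len as usize],
        StringSlot::Spilled { offset, len } => &heap[*offset..*offset + *len],
    };
    std::str::from_utf8(raw).expect("slot was built from valid UTF-8")
}
```

With these assumptions, `"XYZ"` stays inside the row while `"XYZXYZXYZ"` is written to the side buffer, mirroring the two cases drawn above. The slot has the same size in both cases, which keeps the row width fixed.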
   



