alamb commented on issue #6906:
URL: https://github.com/apache/datafusion/issues/6906#issuecomment-2355604402

   ## Background
   (I will make a PR shortly to add this to the actual DataFusion docs)
   
   
   [`GroupsAccumulator`](https://docs.rs/datafusion/latest/datafusion/logical_expr/trait.GroupsAccumulator.html) logically does this:
   
   ```
         ┌─────┐                            
         │  0  │───────────▶   "A"          
         ├─────┤                            
         │  1  │───────────▶   "Z"          
         └─────┘                            
           ...                 ...          
         ┌─────┐                            
         │ N-2 │               "A"          
         ├─────┤                            
         │ N-1 │───────────▶   "Q"          
         └─────┘                            
                                            
                                            
       Logical group      Current Min/Max   
          number          value for that    
                          group             
                                            
                                            
                                            
   GroupsAccumulator to store N aggregate
   values: logically keeps a mapping from
   each group index to the current value
   ```
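To make the "group index to current value" mapping concrete, here is a minimal Rust sketch of a MIN-over-strings state. It is not DataFusion's `GroupsAccumulator` trait: the `MinStringGroups` type and its `update` / `evaluate` methods are hypothetical. It assumes groups are numbered densely from 0 (as in the diagram) and that each batch supplies one group index per input row. For simplicity it still keeps one owned `String` per group, so it models the mapping only, not an efficient memory layout.

```rust
/// Hypothetical, simplified MIN-over-strings state (not DataFusion's API).
/// `mins[i]` holds the current minimum for logical group `i`,
/// or `None` if that group has not yet seen a non-null value.
struct MinStringGroups {
    mins: Vec<Option<String>>,
}

impl MinStringGroups {
    fn new() -> Self {
        Self { mins: Vec::new() }
    }

    /// `values[j]` belongs to group `group_indices[j]`; groups are assumed
    /// to be numbered densely from 0 up to `total_num_groups - 1`.
    fn update(&mut self, values: &[Option<&str>], group_indices: &[usize], total_num_groups: usize) {
        // Grow the mapping when new groups appear in this batch
        if self.mins.len() < total_num_groups {
            self.mins.resize(total_num_groups, None);
        }
        for (value, &group) in values.iter().zip(group_indices) {
            // Nulls never change the minimum
            let Some(v) = *value else { continue };
            let replace = match self.mins[group].as_deref() {
                Some(current) => v < current,
                None => true,
            };
            if replace {
                self.mins[group] = Some(v.to_string());
            }
        }
    }

    /// Produce one value per group, in group-index order.
    fn evaluate(self) -> Vec<Option<String>> {
        self.mins
    }
}

fn main() {
    let mut acc = MinStringGroups::new();
    // Batch 1: rows for groups 1, 0, 1 (the null is ignored)
    acc.update(&[Some("Z"), Some("B"), None], &[1, 0, 1], 2);
    // Batch 2: group 0 gets a smaller value and group 2 appears
    acc.update(&[Some("A"), Some("Q")], &[0, 2], 3);
    // prints [Some("A"), Some("Z"), Some("Q")]
    println!("{:?}", acc.evaluate());
}
```

The state is a single vector indexed by group number, so looking up or updating a group's current value is a direct index operation rather than a hash lookup.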
   
   Today, String / Binary min/max values are implemented using [`GroupsAccumulatorAdapter`](https://docs.rs/datafusion/latest/datafusion/physical_expr/struct.GroupsAccumulatorAdapter.html), which results in the following layout:
   
   ```
                                                              Individual
                                                              String
                                                              (separate
                                                              allocation)

      ┌─────┐            ┌──────────────────────────┐
      │  0  │───────────▶│  ScalarValue::Utf8("A")  ├──────────▶   "A"
      ├─────┤            ├──────────────────────────┤
      │  1  │───────────▶│  ScalarValue::Utf8("Z")  │──────────▶   "Z"
      └─────┘            └──────────────────────────┘
        ...                 ...                                    ...
      ┌─────┐            ┌──────────────────────────┐
      │ N-2 │            │  ScalarValue::Utf8("A")  │──────────▶   "A"
      ├─────┤            ├──────────────────────────┤
      │ N-1 │───────────▶│  ScalarValue::Utf8("Q")  │──────────▶   "Q"
      └─────┘            └──────────────────────────┘


    Logical group         Current Min/Max value for that group stored
       number             as a ScalarValue which points to an
                          individually allocated String



      How GroupsAccumulatorAdapter works today:
      stores each current min/max as a
      ScalarValue

   ```
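The per-group layout in this diagram can be approximated by the Rust sketch below. It assumes a hypothetical single-variant `Scalar` enum standing in for `ScalarValue::Utf8` (the real `ScalarValue` has many more variants), and the `AdapterStyleMin` type is likewise illustrative: it mirrors the diagram, not `GroupsAccumulatorAdapter`'s actual code. The point is that every group owns its own `Scalar`, and each non-empty string inside it lives in its own heap allocation that is replaced whenever the group's minimum changes.

```rust
/// Hypothetical stand-in for `ScalarValue::Utf8`; only the shape matters here.
#[derive(Debug, Clone, PartialEq)]
enum Scalar {
    Utf8(Option<String>),
}

/// Illustrative adapter-style MIN state: one `Scalar` per logical group,
/// each owning its own heap-allocated `String`.
struct AdapterStyleMin {
    states: Vec<Scalar>,
}

impl AdapterStyleMin {
    fn new() -> Self {
        Self { states: Vec::new() }
    }

    /// `values[j]` belongs to group `group_indices[j]`; groups are assumed
    /// to be numbered densely from 0 up to `total_num_groups - 1`.
    fn update(&mut self, values: &[Option<&str>], group_indices: &[usize], total_num_groups: usize) {
        if self.states.len() < total_num_groups {
            self.states.resize(total_num_groups, Scalar::Utf8(None));
        }
        for (value, &group) in values.iter().zip(group_indices) {
            // Nulls never change the minimum
            let Some(v) = *value else { continue };
            let Scalar::Utf8(current) = &mut self.states[group];
            let replace = match current.as_deref() {
                Some(c) => v < c,
                None => true,
            };
            if replace {
                // Replaces any previous String with a freshly created one
                *current = Some(v.to_string());
            }
        }
    }
}

fn main() {
    let mut acc = AdapterStyleMin::new();
    acc.update(&[Some("Z"), Some("B"), None], &[1, 0, 1], 2);
    acc.update(&[Some("A"), Some("Q")], &[0, 2], 3);
    // prints [Utf8(Some("A")), Utf8(Some("Z")), Utf8(Some("Q"))]
    println!("{:?}", acc.states);
}
```

With N groups this state holds N wrapper values, each pointing at a separate `String`, which is the per-group allocation overhead the diagram highlights.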


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

