alamb commented on PR #4524:
URL: https://github.com/apache/arrow-rs/pull/4524#issuecomment-1638945907

   > For single column case, to embed a variable length column value, like 
String, to the RawTable may not be good.
   
   Just to be clear, what I was imagining for the group storage is not to 
change the contents of the `RawTable` (it will continue to contain 
group_indexes).
   
   But instead of storing group_values using the arrow `Row` format, like this:
   
   ```
    stores "group       stores group values, 
       indexes"          in arrow_row format 
                                             
    ┌─────────────┐      ┌────────────┐      
    │   ┌─────┐   │      │ ┌────────┐ │      
    │   │  5  │   │ ┌────┼▶│  "A"   │ │      
    │   ├─────┤   │ │    │ ├────────┤ │      
    │   │  9  │   │ │    │ │  "Z"   │ │      
    │   └─────┘   │ │    │ └────────┘ │      
    │     ...     │ │    │            │      
    │   ┌─────┐   │ │    │    ...     │      
    │   │  1  │───┼─┘    │            │      
    │   ├─────┤   │      │            │      
    │   │ 13  │───┼─┐    │ ┌────────┐ │      
    │   └─────┘   │ └────┼▶│  "Q"   │ │      
    └─────────────┘      │ └────────┘ │      
                         │            │      
                         └────────────┘      
                                             
                                             
                                             
          map            group_values
     (Hash Table)
   ```
   
   We would instead store the group values in a native type such as `Vec<T>`,
   like this:
   
   ```
    stores "group               stored in a      
       indexes"                native Vec<T>     
                                                 
    ┌─────────────┐            ┌──────────┐      
    │   ┌─────┐   │            │ ┌──────┐ │      
    │   │  5  │   │    ┌───────┼▶│  1   │ │      
    │   ├─────┤   │    │       │ ├──────┤ │      
    │   │  9  │   │    │       │ │  3   │ │      
    │   └─────┘   │    │       │ └──────┘ │      
    │     ...     │    │       │          │      
    │   ┌─────┐   │    │       │    ...   │      
    │   │  1  │───┼────┘       │          │      
    │   ├─────┤   │            │          │      
    │   │ 13  │───┼────┐       │ ┌──────┐ │      
    │   └─────┘   │    └───────┼▶│  5   │ │      
    └─────────────┘            │ └──────┘ │      
                               │          │      
                               └──────────┘      
                                                 
                                                 
          map                  group_values
     (Hash Table)
   ```
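
To make the layout in the second diagram concrete, here is a minimal sketch using only the Rust standard library. Everything in it is an assumption made for illustration: `GroupValuesSketch`, `intern`, `grow`, and `hash_of` are hypothetical names, and the hand-rolled open-addressing table merely stands in for `RawTable`. It is not the code in the PR. The point it shows is that the table slots hold only group indexes, while the values themselves live in a plain `Vec<T>`.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hash a value with the standard library's default hasher.
fn hash_of<T: Hash>(value: &T) -> usize {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish() as usize
}

/// Hypothetical sketch of the proposed storage (not the PR's actual type).
struct GroupValuesSketch<T> {
    /// Stand-in for `RawTable`: each occupied slot holds only a group index.
    map: Vec<Option<usize>>,
    /// Group values stored natively; position == group index.
    group_values: Vec<T>,
}

impl<T: Hash + Eq> GroupValuesSketch<T> {
    fn new() -> Self {
        // Capacity is kept a power of two so `& mask` can replace `%`.
        Self { map: vec![None; 8], group_values: Vec::new() }
    }

    /// Return the group index for `value`, creating a new group if needed.
    fn intern(&mut self, value: T) -> usize {
        // Keep the table at most half full so probing always finds a free slot.
        if (self.group_values.len() + 1) * 2 > self.map.len() {
            self.grow();
        }
        let mask = self.map.len() - 1;
        let mut slot = hash_of(&value) & mask;
        loop {
            match self.map[slot] {
                // Existing group: equality is checked against the native Vec.
                Some(idx) if self.group_values[idx] == value => return idx,
                // Slot taken by a different group: linear probing.
                Some(_) => slot = (slot + 1) & mask,
                // Free slot: append the value and record only its index.
                None => {
                    let idx = self.group_values.len();
                    self.group_values.push(value);
                    self.map[slot] = Some(idx);
                    return idx;
                }
            }
        }
    }

    /// Double the table and re-insert every group index.
    fn grow(&mut self) {
        let mut map: Vec<Option<usize>> = vec![None; self.map.len() * 2];
        let mask = map.len() - 1;
        for (idx, value) in self.group_values.iter().enumerate() {
            let mut slot = hash_of(value) & mask;
            while map[slot].is_some() {
                slot = (slot + 1) & mask;
            }
            map[slot] = Some(idx);
        }
        self.map = map;
    }
}
```

With this sketch, interning the values `1, 3, 1, 5, 3` in that order yields the group indexes `0, 1, 0, 2, 1`, and `group_values` ends up as `[1, 3, 5]`. Because the values sit in a native `Vec<T>`, they could in principle be handed over as a primitive array at emit time without a conversion from the `Row` format.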
   
   I agree the null value would need some special handling, but since this
   would only be for single columns (where there can be at most one null
   group), I think we could figure out some way to handle it.
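
One possible way to handle the null, sketched here purely as an assumption and not as what the PR does: since a single column can produce at most one null group, its index can be remembered in a separate `Option<usize>`, with a placeholder value occupying that position in the `Vec<T>`. The names `NullableGroups`, `null_group`, and `emit` are hypothetical, and a std `HashMap<T, usize>` is used only as a compact stand-in for the index-only hash table.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Hypothetical sketch of null handling for a single grouping column.
struct NullableGroups<T> {
    /// Stand-in for the index-only hash table (assumption for brevity).
    lookup: HashMap<T, usize>,
    /// Native storage; the null group's position holds a placeholder.
    group_values: Vec<T>,
    /// Index of the single null group, if one has been seen.
    null_group: Option<usize>,
}

impl<T: Hash + Eq + Copy + Default> NullableGroups<T> {
    fn new() -> Self {
        Self { lookup: HashMap::new(), group_values: Vec::new(), null_group: None }
    }

    /// Return the group index for a possibly-null input value.
    fn intern(&mut self, value: Option<T>) -> usize {
        match value {
            Some(v) => {
                if let Some(&idx) = self.lookup.get(&v) {
                    return idx;
                }
                let idx = self.group_values.len();
                self.group_values.push(v);
                self.lookup.insert(v, idx);
                idx
            }
            None => {
                if let Some(idx) = self.null_group {
                    return idx;
                }
                // The placeholder is never added to the lookup table, so it
                // cannot be confused with a real value equal to `T::default()`.
                let idx = self.group_values.len();
                self.group_values.push(T::default());
                self.null_group = Some(idx);
                idx
            }
        }
    }

    /// Reconstruct the group values with the null restored.
    fn emit(&self) -> Vec<Option<T>> {
        self.group_values
            .iter()
            .enumerate()
            .map(|(i, v)| if Some(i) == self.null_group { None } else { Some(*v) })
            .collect()
    }
}
```

In this sketch the null costs one extra `Option<usize>` and a single branch per input value, and a real value that happens to equal the placeholder (for example `0` for integers) still gets its own group.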
   
   

