[I] Improve performance of high cardinality grouping by reusing hash values [datafusion]

via GitHub Fri, 26 Jul 2024 13:16:06 -0700


alamb opened a new issue, #11680:
URL: https://github.com/apache/datafusion/issues/11680


   ### Is your feature request related to a problem or challenge?
   
   As described on https://github.com/apache/datafusion/issues/11679, we can do 
better for high cardinality aggregates
   
   One thing that consumes significant time in such queries is hashing, and I 
think we can reduce that significantly.
   
   Specifically, for the multi-phase repartition plan, the number of hashed 
rows is something like 
   
   ```
   (input cardinality)  + 2 * (intermediate group cardinality) * (number of 
partitions)
   ```
   
   For low cardinality aggregates (e.g when the intermediate group cardinality 
is 1000) the second term is small (a few thousand extra hashes isn't a big deal)
   
   However, for high cardinality aggregates (eg. when the intermediate 
cardinality is like 1,000,000 and there are 16 partitions) the second term is 
substantial
   
   In pictures, this looks like
   
   
   ```
                  ▲                          ▲
                  │                          │
                  │                          │
                  │                          │
                  │                          │
                  │                          │
      ┌───────────────────────┐  ┌───────────────────────┐       4. The  
AggregateMode::Final
      │GroupBy                │  │GroupBy                │       GroupBy 
computes hash(group keys)
      │(AggregateMode::Final) │  │(AggregateMode::Final) │       *AGAIN* to 
find the correct hash
      │                       │  │                       │       bucket
      └───────────────────────┘  └───────────────────────┘
                  ▲                          ▲
                  │                          │
                  └─────────────┬────────────┘
                                │
                                │
                                │
                   ┌─────────────────────────┐                   3. The output 
of the first phase
                   │       Repartition       │                   is 
repartitioned by computing
                   │         HASH(x)         │                   hash(group 
keys) -- this is the
                   └─────────────────────────┘                   same hash as 
computed in step 2.
                                ▲
                                │
                ┌───────────────┴─────────────┐
                │                             │
                │                             │
   ┌─────────────────────────┐  ┌──────────────────────────┐     2. Each 
AggregateMode::Partial
   │        GroubyBy         │  │         GroubyBy         │     GroupBy hashes 
the group keys to
   │(AggregateMode::Partial) │  │ (AggregateMode::Partial) │     find the 
correct hash bucket.
   └─────────────────────────┘  └──────────────────────────┘
                ▲                             ▲
                │                            ┌┘
                │                            │
           .─────────.                  .─────────.
        ,─'           '─.            ,─'           '─.
       ;      Input      :          ;      Input      :          1. Input is 
read
       :   Partition 0   ;          :   Partition 1   ;
        ╲               ╱            ╲               ╱
         '─.         ,─'              '─.         ,─'
            `───────'                    `───────'
   ```
   
   
   
   This effect can be seen in profiling for click bench XXX:
   
   (TODO screen shot showing large amounts of work done in hashing)
   
   
   
   
   
   ### Describe the solution you'd like
   
   The basic idea is rather than rather than recompute the hash values in 
`RepartitionExec` and `AggregateMode::Final` we would reuse the values from 
`AggregateMode::Partial` (which has already computed a hash value for each 
input group)
   
   Something like this
   ```text
                            ▲                          ▲                        
                               
                            │                          │                        
                               
                            │                          │                        
                               
                            │                          │                        
                               
                            │                          │                        
                               
                            │                          │                        
                               
                ┌───────────────────────┐  ┌───────────────────────┐       4. 
The  AggregateMode::Final        
                │GroupBy                │  │GroupBy                │       
GroupBy also gets the hash values   
                │(AggregateMode::Final) │  │(AggregateMode::Final) │       and 
does not recompute them         
                │                       │  │                       │            
                               
                └───────────────────────┘  └───────────────────────┘            
                               
                  ▲         ▲                          ▲                        
                               
                  │         │                          │                        
                               
                            └─────────────┬────────────┘                        
                               
   Pass hash      │                       │                                     
                               
   values up the                          │                                     
                               
   plan rather    │                       │                                     
                               
   than                      ┌─────────────────────────┐                   3. 
In addition to the partial       
   recomputing    │          │       Repartition       │                   
aggregates and group values, *ALSO* 
   them                      │    PRECOMPUTED_HASH     │                   pass 
the hash values to the         
                  │          └─────────────────────────┘                   
RepartitionExec which also passed   
                                          ▲                                them 
on to the AggregateMode::Final 
                  │                       │                                     
                               
                          ┌───────────────┴─────────────┐                       
                               
                  │       │                             │                       
                               
                          │                             │                       
                               
             ┌─────────────────────────┐  ┌──────────────────────────┐     2. 
Each AggregateMode::Partial      
             │        GroubyBy         │  │         GroubyBy         │     
GroupBy hashes the group keys to    
             │(AggregateMode::Partial) │  │ (AggregateMode::Partial) │     find 
the correct hash bucket.       
             └─────────────────────────┘  └──────────────────────────┘          
                               
                          ▲                             ▲                       
                               
                          │                            ┌┘                       
                               
                          │                            │                        
                               
                     .─────────.                  .─────────.                   
                               
                  ,─'           '─.            ,─'           '─.                
                               
                 ;      Input      :          ;      Input      :          1. 
Input is read                    
                 :   Partition 0   ;          :   Partition 1   ;               
                               
                  ╲               ╱            ╲               ╱                
                               
                   '─.         ,─'              '─.         ,─'                 
                               
                      `───────'                    `───────'                    
                               
   ```
   
   ### Describe alternatives you've considered
   
   We maybe could pass the data as an explicit new column somehow, or maybe as 
a field in a struct array 🤔 
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[I] Improve performance of high cardinality grouping by reusing hash values [datafusion]

Reply via email to