richardstartin opened a new pull request #8195:
URL: https://github.com/apache/pinot/pull/8195


   This change is motivated by slow queries at one of our customers that group 
by a raw column. 30GB was seen to be allocated in 
`NoDictionaryMultiColumnGroupKeyGenerator.generateKeyForBlock`, which is also 
where most of the method samples were taken:
   <img width="1590" alt="Screenshot 2022-02-11 at 18 33 25" src="https://user-images.githubusercontent.com/16439049/153649438-1d8a054a-5bc1-4313-ac08-6faf6b5a41e1.png">
   <img width="1602" alt="Screenshot 2022-02-11 at 18 35 21" src="https://user-images.githubusercontent.com/16439049/153649797-f73bd1aa-233f-4e57-8007-ff38e417e14b.png">
   
   This PR starts by generalising one of our pre-existing benchmarks which does 
a good job of exercising the entire query execution. It is parameterised so 
different queries can be added easily, and the generated data is parameterised 
too so that columns with different cardinalities can be created.
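
To illustrate how a scenario parameter can control column cardinality, here is a minimal sketch. It assumes that a scenario label such as `EXP(0.001)` means values drawn from an exponential distribution with that rate; the class and method names are hypothetical and are not taken from the benchmark in this PR.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

public class ExpColumnSketch {

  /** Generates an int column by inverse-CDF sampling of Exp(lambda), truncated to int. */
  static int[] generate(int numRows, double lambda, long seed) {
    Random random = new Random(seed);
    int[] column = new int[numRows];
    for (int i = 0; i < numRows; i++) {
      // 1 - nextDouble() lies in (0, 1], so the logarithm is always finite
      column[i] = (int) (-Math.log(1.0 - random.nextDouble()) / lambda);
    }
    return column;
  }

  /** Counts the distinct values in a column. */
  static int cardinality(int[] column) {
    Set<Integer> distinct = new HashSet<>();
    for (int value : column) {
      distinct.add(value);
    }
    return distinct.size();
  }

  public static void main(String[] args) {
    // Assumption: a smaller rate spreads values over a wider range, giving more distinct values
    for (double lambda : new double[] {0.001, 0.5, 0.999}) {
      int[] column = generate(100_000, lambda, 42L);
      System.out.println("EXP(" + lambda + ") cardinality: " + cardinality(column));
    }
  }
}
```

Under that assumption, a small rate yields a high-cardinality column and a rate near 1 yields only a handful of distinct values.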
   
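   The actual improvement is made in the second commit. It transposes the 
group key generation, since the `BlockValSet`s will be cached by 
`DataBlockCache` anyway, then accumulates each key into a flyweight, so a new 
key only needs to be allocated to memoize a group key on its first occurrence. 
In the results below (baseline first, then this change), this cuts average 
time by about 35% for `EXP(0.001)` and nearly halves it for `EXP(0.5)` and 
`EXP(0.999)`, and reduces `gc.alloc.rate.norm` by a factor of about 3.5 for 
`EXP(0.001)` and more than 100 for the other two scenarios: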
   
   ```
   Benchmark                                                (_numRows)  (_query)                                                                       (_scenario)  Mode  Cnt          Score           Error  Units
   BenchmarkQueries.query                                      1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL   EXP(0.001)  avgt    5        200.573 ±        36.577  ms/op
   BenchmarkQueries.query:·gc.alloc.rate                       1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL   EXP(0.001)  avgt    5        454.459 ±       985.590  MB/sec
   BenchmarkQueries.query:·gc.alloc.rate.norm                  1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL   EXP(0.001)  avgt    5  139180218.880 ± 299249329.376  B/op
   BenchmarkQueries.query:·gc.churn.G1_Eden_Space              1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL   EXP(0.001)  avgt    5        510.589 ±       414.979  MB/sec
   BenchmarkQueries.query:·gc.churn.G1_Eden_Space.norm         1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL   EXP(0.001)  avgt    5  156957846.187 ± 109973654.666  B/op
   BenchmarkQueries.query:·gc.churn.G1_Old_Gen                 1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL   EXP(0.001)  avgt    5         12.236 ±        42.494  MB/sec
   BenchmarkQueries.query:·gc.churn.G1_Old_Gen.norm            1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL   EXP(0.001)  avgt    5    3732222.293 ±  12807297.981  B/op
   BenchmarkQueries.query:·gc.churn.G1_Survivor_Space          1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL   EXP(0.001)  avgt    5          4.412 ±        19.484  MB/sec
   BenchmarkQueries.query:·gc.churn.G1_Survivor_Space.norm     1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL   EXP(0.001)  avgt    5    1398101.333 ±   6240670.451  B/op
   BenchmarkQueries.query:·gc.count                            1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL   EXP(0.001)  avgt    5          8.000                  counts
   BenchmarkQueries.query:·gc.time                             1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL   EXP(0.001)  avgt    5        407.000                  ms
   BenchmarkQueries.query                                      1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL     EXP(0.5)  avgt    5         98.663 ±         7.845  ms/op
   BenchmarkQueries.query:·gc.alloc.rate                       1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL     EXP(0.5)  avgt    5        696.114 ±      1498.561  MB/sec
   BenchmarkQueries.query:·gc.alloc.rate.norm                  1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL     EXP(0.5)  avgt    5  106449429.440 ± 228808237.226  B/op
   BenchmarkQueries.query:·gc.churn.G1_Eden_Space              1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL     EXP(0.5)  avgt    5       1029.174 ±      2217.348  MB/sec
   BenchmarkQueries.query:·gc.churn.G1_Eden_Space.norm         1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL     EXP(0.5)  avgt    5  157963208.145 ± 341415795.326  B/op
   BenchmarkQueries.query:·gc.churn.G1_Old_Gen                 1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL     EXP(0.5)  avgt    5          0.109 ±         0.714  MB/sec
   BenchmarkQueries.query:·gc.churn.G1_Old_Gen.norm            1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL     EXP(0.5)  avgt    5      16481.891 ±    107743.719  B/op
   BenchmarkQueries.query:·gc.count                            1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL     EXP(0.5)  avgt    5          4.000                  counts
   BenchmarkQueries.query:·gc.time                             1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL     EXP(0.5)  avgt    5         10.000                  ms
   BenchmarkQueries.query                                      1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL   EXP(0.999)  avgt    5         90.816 ±         8.115  ms/op
   BenchmarkQueries.query:·gc.alloc.rate                       1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL   EXP(0.999)  avgt    5        752.671 ±      1621.248  MB/sec
   BenchmarkQueries.query:·gc.alloc.rate.norm                  1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL   EXP(0.999)  avgt    5  106393285.309 ± 228688298.313  B/op
   BenchmarkQueries.query:·gc.churn.G1_Eden_Space              1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL   EXP(0.999)  avgt    5       1022.984 ±      2203.794  MB/sec
   BenchmarkQueries.query:·gc.churn.G1_Eden_Space.norm         1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL   EXP(0.999)  avgt    5  143076606.448 ± 309117349.271  B/op
   BenchmarkQueries.query:·gc.churn.G1_Old_Gen                 1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL   EXP(0.999)  avgt    5          0.150 ±         0.832  MB/sec
   BenchmarkQueries.query:·gc.churn.G1_Old_Gen.norm            1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL   EXP(0.999)  avgt    5      21384.158 ±    119911.555  B/op
   BenchmarkQueries.query:·gc.churn.G1_Survivor_Space          1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL   EXP(0.999)  avgt    5          0.126 ±         1.082  MB/sec
   BenchmarkQueries.query:·gc.churn.G1_Survivor_Space.norm     1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL   EXP(0.999)  avgt    5      17476.267 ±    150475.927  B/op
   BenchmarkQueries.query:·gc.count                            1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL   EXP(0.999)  avgt    5          4.000                  counts
   BenchmarkQueries.query:·gc.time                             1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL   EXP(0.999)  avgt    5         52.000                  ms
   ```
   
   ```
   Benchmark                                                (_numRows)  (_query)                                                                       (_scenario)  Mode  Cnt          Score           Error  Units
   BenchmarkQueries.query                                      1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL   EXP(0.001)  avgt    5        130.071 ±         5.744  ms/op
   BenchmarkQueries.query:·gc.alloc.rate                       1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL   EXP(0.001)  avgt    5        197.775 ±       424.170  MB/sec
   BenchmarkQueries.query:·gc.alloc.rate.norm                  1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL   EXP(0.001)  avgt    5   39989639.600 ±  85754314.959  B/op
   BenchmarkQueries.query:·gc.churn.G1_Eden_Space              1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL   EXP(0.001)  avgt    5        210.675 ±       161.001  MB/sec
   BenchmarkQueries.query:·gc.churn.G1_Eden_Space.norm         1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL   EXP(0.001)  avgt    5   42677043.200 ±  33405655.687  B/op
   BenchmarkQueries.query:·gc.churn.G1_Old_Gen                 1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL   EXP(0.001)  avgt    5          0.835 ±         6.634  MB/sec
   BenchmarkQueries.query:·gc.churn.G1_Old_Gen.norm            1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL   EXP(0.001)  avgt    5     167488.000 ±   1331249.740  B/op
   BenchmarkQueries.query:·gc.count                            1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL   EXP(0.001)  avgt    5         11.000                  counts
   BenchmarkQueries.query:·gc.time                             1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL   EXP(0.001)  avgt    5        268.000                  ms
   BenchmarkQueries.query                                      1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL     EXP(0.5)  avgt    5         54.864 ±         4.432  ms/op
   BenchmarkQueries.query:·gc.alloc.rate                       1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL     EXP(0.5)  avgt    5         10.390 ±        18.671  MB/sec
   BenchmarkQueries.query:·gc.alloc.rate.norm                  1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL     EXP(0.5)  avgt    5     883504.473 ±   1574108.609  B/op
   BenchmarkQueries.query:·gc.count                            1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL     EXP(0.5)  avgt    5            ≈ 0                  counts
   BenchmarkQueries.query                                      1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL   EXP(0.999)  avgt    5         47.429 ±         2.367  ms/op
   BenchmarkQueries.query:·gc.alloc.rate                       1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL   EXP(0.999)  avgt    5         10.961 ±        19.191  MB/sec
   BenchmarkQueries.query:·gc.alloc.rate.norm                  1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL   EXP(0.999)  avgt    5     811307.408 ±   1420111.422  B/op
   BenchmarkQueries.query:·gc.count                            1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL   EXP(0.999)  avgt    5            ≈ 0                  counts
   ```
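
The flyweight idea described in this PR can be sketched roughly as follows. This is an illustration only, not the code in the PR: `IntArrayKey`, `generateGroupIds` and the use of `java.util.HashMap` are hypothetical stand-ins, and the columns are assumed to be already materialised as `int[]` arrays.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class FlyweightGroupKeySketch {

  /** Mutable multi-column key; equality and hash are defined over the array contents. */
  static final class IntArrayKey {
    final int[] values;

    IntArrayKey(int[] values) {
      this.values = values;
    }

    IntArrayKey copy() {
      return new IntArrayKey(values.clone());
    }

    @Override
    public boolean equals(Object other) {
      return other instanceof IntArrayKey && Arrays.equals(values, ((IntArrayKey) other).values);
    }

    @Override
    public int hashCode() {
      return Arrays.hashCode(values);
    }
  }

  /** Assigns a group id to each row, reusing one flyweight key for every lookup. */
  static int[] generateGroupIds(int[][] columns, int numRows, Map<IntArrayKey, Integer> groupIds) {
    int[] result = new int[numRows];
    // Single reusable key: overwritten for every row, never stored in the map itself
    IntArrayKey flyweight = new IntArrayKey(new int[columns.length]);
    for (int row = 0; row < numRows; row++) {
      for (int col = 0; col < columns.length; col++) {
        flyweight.values[col] = columns[col][row];
      }
      Integer id = groupIds.get(flyweight);
      if (id == null) {
        // First occurrence of this key: only now is a new key object allocated
        id = groupIds.size();
        groupIds.put(flyweight.copy(), id);
      }
      result[row] = id;
    }
    return result;
  }

  public static void main(String[] args) {
    int[][] columns = {{1, 1, 2, 1}, {7, 8, 7, 7}};
    Map<IntArrayKey, Integer> groupIds = new HashMap<>();
    int[] ids = generateGroupIds(columns, 4, groupIds);
    System.out.println(Arrays.toString(ids)); // prints [0, 1, 2, 0]
  }
}
```

Because the map only ever holds copies, mutating the flyweight between lookups is safe. A primitive-valued map would additionally avoid boxing the group ids, which this sketch does not attempt.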
   
    

