gianm opened a new pull request, #15661:
URL: https://github.com/apache/druid/pull/15661

   Two speedups for FrameChannelMerger (which does k-way merging in MSQ):
   
   1) Replace the priority queue with a tournament tree, which does fewer 
comparisons.
   
   2) Compare keys using 8-byte strides, rather than 1 byte at a time.
   
   Benchmarks below. In these benchmarks:
   
   - keys can be `random` (20 random characters) or `sequential` (incrementing 
numbers stored as zero-padded strings, like `'00000000000000000001'`, 
`'00000000000000000002'`, etc).
   - rows can be distributed `round_robin` across channels (first key to first 
channel, second key to second channel, etc) or `clustered` (first N keys to 
first channel, next N keys to second channel, etc).
   
   Findings:
   
   - The tournament tree and priority queue are about the same speed when there 
are two channels, but for 16 channels, the tournament tree is ~30% faster. 
Presumably this is due to doing fewer comparisons.
   - 8-byte strides give a ~5% speedup over tournament trees alone when there 
are two channels, but for 16 channels, they give a bigger ~15% speedup. 
Presumably this is due to 16 channels requiring more comparisons, so there's 
more opportunities for the longer stride to make a difference.
   - Overall, combining tournament trees and 8-byte strides gives a ~5% speedup 
for two channels, and ~40% speedup for 16 channels. 
   
   ```
   master (priority queue, 1 byte stride)
   
   Benchmark                                  (channelDistributionString)  
(keyGeneratorString)  (keyLength)  (numChannels)  (numRows)  (rowLength)  Mode  
Cnt     Score    Error  Units
   FrameChannelMergerBenchmark.mergeChannels                  round_robin       
         random           20              2    5000000          100  avgt    5  
 662.988 ± 39.567  ms/op
   FrameChannelMergerBenchmark.mergeChannels                  round_robin       
         random           20             16    5000000          100  avgt    5  
1513.897 ± 45.425  ms/op
   FrameChannelMergerBenchmark.mergeChannels                  round_robin       
     sequential           20              2    5000000          100  avgt    5  
 764.784 ± 29.506  ms/op
   FrameChannelMergerBenchmark.mergeChannels                  round_robin       
     sequential           20             16    5000000          100  avgt    5  
2209.931 ± 31.880  ms/op
   FrameChannelMergerBenchmark.mergeChannels                    clustered       
         random           20              2    5000000          100  avgt    5  
 677.862 ± 28.828  ms/op
   FrameChannelMergerBenchmark.mergeChannels                    clustered       
         random           20             16    5000000          100  avgt    5  
1530.103 ± 80.132  ms/op
   FrameChannelMergerBenchmark.mergeChannels                    clustered       
     sequential           20              2    5000000          100  avgt    5  
 664.240 ± 13.360  ms/op
   FrameChannelMergerBenchmark.mergeChannels                    clustered       
     sequential           20             16    5000000          100  avgt    5  
1424.895 ± 24.168  ms/op
   
   tournament tree, 1 byte stride
   
   Benchmark                                  (channelDistributionString)  
(keyGeneratorString)  (keyLength)  (numChannels)  (numRows)  (rowLength)  Mode  
Cnt     Score    Error  Units
   FrameChannelMergerBenchmark.mergeChannels                  round_robin       
         random           20              2    5000000          100  avgt    5  
 652.683 ± 21.292  ms/op
   FrameChannelMergerBenchmark.mergeChannels                  round_robin       
         random           20             16    5000000          100  avgt    5  
1123.421 ± 33.857  ms/op
   FrameChannelMergerBenchmark.mergeChannels                  round_robin       
     sequential           20              2    5000000          100  avgt    5  
 766.496 ± 23.514  ms/op
   FrameChannelMergerBenchmark.mergeChannels                  round_robin       
     sequential           20             16    5000000          100  avgt    5  
1428.470 ± 38.184  ms/op
   FrameChannelMergerBenchmark.mergeChannels                    clustered       
         random           20              2    5000000          100  avgt    5  
 682.236 ±  9.466  ms/op
   FrameChannelMergerBenchmark.mergeChannels                    clustered       
         random           20             16    5000000          100  avgt    5  
1126.613 ± 25.387  ms/op
   FrameChannelMergerBenchmark.mergeChannels                    clustered       
     sequential           20              2    5000000          100  avgt    5  
 659.716 ± 17.220  ms/op
   FrameChannelMergerBenchmark.mergeChannels                    clustered       
     sequential           20             16    5000000          100  avgt    5  
 897.766 ± 54.850  ms/op
   
   tournament tree, 8-byte stride
   
   Benchmark                                  (channelDistributionString)  
(keyGeneratorString)  (keyLength)  (numChannels)  (numRows)  (rowLength)  Mode  
Cnt     Score    Error  Units
   FrameChannelMergerBenchmark.mergeChannels                  round_robin       
         random           20              2    5000000          100  avgt    5  
 602.409 ±  0.754  ms/op
   FrameChannelMergerBenchmark.mergeChannels                  round_robin       
         random           20             16    5000000          100  avgt    5  
 943.311 ±  4.090  ms/op
   FrameChannelMergerBenchmark.mergeChannels                  round_robin       
     sequential           20              2    5000000          100  avgt    5  
 742.581 ±  3.323  ms/op
   FrameChannelMergerBenchmark.mergeChannels                  round_robin       
     sequential           20             16    5000000          100  avgt    5  
1295.109 ±  3.081  ms/op
   FrameChannelMergerBenchmark.mergeChannels                    clustered       
         random           20              2    5000000          100  avgt    5  
 599.162 ±  1.119  ms/op
   FrameChannelMergerBenchmark.mergeChannels                    clustered       
         random           20             16    5000000          100  avgt    5  
 957.285 ±  1.253  ms/op
   FrameChannelMergerBenchmark.mergeChannels                    clustered       
     sequential           20              2    5000000          100  avgt    5  
 648.640 ±  6.003  ms/op
   FrameChannelMergerBenchmark.mergeChannels                    clustered       
     sequential           20             16    5000000          100  avgt    5  
 761.045 ± 25.297  ms/op
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to