gianm opened a new pull request, #15661:
URL: https://github.com/apache/druid/pull/15661
Two speedups for FrameChannelMerger (which does k-way merging in MSQ):
1) Replace the priority queue with a tournament tree, which does fewer
comparisons.
2) Compare keys using 8-byte strides, rather than 1 byte at a time.
Benchmarks below. In these benchmarks:
- keys can be `random` (20 random characters) or `sequential` (incrementing
numbers stored as zero-padded strings, like `'00000000000000000001'`,
`'00000000000000000002'`, etc).
- rows can be distributed `round_robin` across channels (first key to first
channel, second key to second channel, etc) or `clustered` (first N keys to
first channel, next N keys to second channel, etc).
Findings:
- The tournament tree and priority queue are about the same speed when there
are two channels, but for 16 channels, the tournament tree is ~30% faster.
Presumably this is due to doing fewer comparisons.
- 8-byte strides give a ~5% speedup over tournament trees alone when there
are two channels, but for 16 channels, they give a bigger ~15% speedup.
Presumably this is due to 16 channels requiring more comparisons, so there's
more opportunities for the longer stride to make a difference.
- Overall, combining tournament trees and 8-byte strides gives a ~5% speedup
for two channels, and ~40% speedup for 16 channels.
```
master (priority queue, 1 byte stride)
Benchmark (channelDistributionString)
(keyGeneratorString) (keyLength) (numChannels) (numRows) (rowLength) Mode
Cnt Score Error Units
FrameChannelMergerBenchmark.mergeChannels round_robin
random 20 2 5000000 100 avgt 5
662.988 ± 39.567 ms/op
FrameChannelMergerBenchmark.mergeChannels round_robin
random 20 16 5000000 100 avgt 5
1513.897 ± 45.425 ms/op
FrameChannelMergerBenchmark.mergeChannels round_robin
sequential 20 2 5000000 100 avgt 5
764.784 ± 29.506 ms/op
FrameChannelMergerBenchmark.mergeChannels round_robin
sequential 20 16 5000000 100 avgt 5
2209.931 ± 31.880 ms/op
FrameChannelMergerBenchmark.mergeChannels clustered
random 20 2 5000000 100 avgt 5
677.862 ± 28.828 ms/op
FrameChannelMergerBenchmark.mergeChannels clustered
random 20 16 5000000 100 avgt 5
1530.103 ± 80.132 ms/op
FrameChannelMergerBenchmark.mergeChannels clustered
sequential 20 2 5000000 100 avgt 5
664.240 ± 13.360 ms/op
FrameChannelMergerBenchmark.mergeChannels clustered
sequential 20 16 5000000 100 avgt 5
1424.895 ± 24.168 ms/op
tournament tree, 1 byte stride
Benchmark (channelDistributionString)
(keyGeneratorString) (keyLength) (numChannels) (numRows) (rowLength) Mode
Cnt Score Error Units
FrameChannelMergerBenchmark.mergeChannels round_robin
random 20 2 5000000 100 avgt 5
652.683 ± 21.292 ms/op
FrameChannelMergerBenchmark.mergeChannels round_robin
random 20 16 5000000 100 avgt 5
1123.421 ± 33.857 ms/op
FrameChannelMergerBenchmark.mergeChannels round_robin
sequential 20 2 5000000 100 avgt 5
766.496 ± 23.514 ms/op
FrameChannelMergerBenchmark.mergeChannels round_robin
sequential 20 16 5000000 100 avgt 5
1428.470 ± 38.184 ms/op
FrameChannelMergerBenchmark.mergeChannels clustered
random 20 2 5000000 100 avgt 5
682.236 ± 9.466 ms/op
FrameChannelMergerBenchmark.mergeChannels clustered
random 20 16 5000000 100 avgt 5
1126.613 ± 25.387 ms/op
FrameChannelMergerBenchmark.mergeChannels clustered
sequential 20 2 5000000 100 avgt 5
659.716 ± 17.220 ms/op
FrameChannelMergerBenchmark.mergeChannels clustered
sequential 20 16 5000000 100 avgt 5
897.766 ± 54.850 ms/op
tournament tree, 8-byte stride
Benchmark (channelDistributionString)
(keyGeneratorString) (keyLength) (numChannels) (numRows) (rowLength) Mode
Cnt Score Error Units
FrameChannelMergerBenchmark.mergeChannels round_robin
random 20 2 5000000 100 avgt 5
602.409 ± 0.754 ms/op
FrameChannelMergerBenchmark.mergeChannels round_robin
random 20 16 5000000 100 avgt 5
943.311 ± 4.090 ms/op
FrameChannelMergerBenchmark.mergeChannels round_robin
sequential 20 2 5000000 100 avgt 5
742.581 ± 3.323 ms/op
FrameChannelMergerBenchmark.mergeChannels round_robin
sequential 20 16 5000000 100 avgt 5
1295.109 ± 3.081 ms/op
FrameChannelMergerBenchmark.mergeChannels clustered
random 20 2 5000000 100 avgt 5
599.162 ± 1.119 ms/op
FrameChannelMergerBenchmark.mergeChannels clustered
random 20 16 5000000 100 avgt 5
957.285 ± 1.253 ms/op
FrameChannelMergerBenchmark.mergeChannels clustered
sequential 20 2 5000000 100 avgt 5
648.640 ± 6.003 ms/op
FrameChannelMergerBenchmark.mergeChannels clustered
sequential 20 16 5000000 100 avgt 5
761.045 ± 25.297 ms/op
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]