clintropolis opened a new pull request #11004:
URL: https://github.com/apache/druid/pull/11004
### Description
This PR specializes the `LongDeserializer` implementations with `getDelta` and
`getTable` methods. These push bit unpacking, delta encoding adjustment, and
table lookups for vectors of values as far down as possible, so that the
decoders work more efficiently with vectorized query engines. The primary
focus is contiguous reads.
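
To make the pushdown concrete, the following is a minimal sketch of what `getDelta` and `getTable` could look like for a byte-aligned, 8-bit width. The class name `ByteWidthReaderSketch`, the method signatures, and the parameter names are assumptions for illustration and are not copied from the Druid source:

```java
import java.nio.ByteBuffer;

// Hypothetical, simplified stand-in for an 8-bit deserializer (not the Druid class).
final class ByteWidthReaderSketch {
  private final ByteBuffer buffer;
  private final int offset;

  ByteWidthReaderSketch(ByteBuffer buffer, int offset) {
    this.buffer = buffer;
    this.offset = offset;
  }

  // Single-value read: returns the raw unsigned 8-bit value at the given index.
  long get(int index) {
    return buffer.get(offset + index) & 0xFF;
  }

  // Vector read with the delta adjustment applied inside the decode loop,
  // so the caller does not need a second pass to add the base value.
  void getDelta(long[] out, int outPosition, int startIndex, int length, long base) {
    for (int i = 0; i < length; i++) {
      out[outPosition + i] = base + (buffer.get(offset + startIndex + i) & 0xFF);
    }
  }

  // Vector read with the table lookup applied inside the decode loop,
  // treating each stored value as an index into the lookup table.
  void getTable(long[] out, int outPosition, int startIndex, int length, long[] table) {
    for (int i = 0; i < length; i++) {
      out[outPosition + i] = table[buffer.get(offset + startIndex + i) & 0xFF];
    }
  }
}
```

In this sketch the caller hands over the output array and either the delta base or the lookup table, and gets fully decoded values back from a single loop over the buffer.
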
The implementation unrolls value reads to line up with the `ByteBuffer` get
methods, eliminating overlapping reads where possible and reading blocks of
8 values at a time for bit-packing widths that are not `ByteBuffer` aligned
(1, 2, 4, 12, 20, 24, 40, 48, 56). The `ByteBuffer` aligned widths
(8, 16, 32, 64) actually performed worse when the same unrolling was applied,
so they instead use a traditional for loop with the aligned get methods.
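
As a rough illustration of the unrolling, the sketch below reads 4-bit values from a big-endian `ByteBuffer`, first with one `get` call per value and then with one `getInt` call per block of 8 values. The class `NibbleUnrollSketch`, its method names, and the high-nibble-first layout are assumptions made for this example, and the real deserializers may differ:

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of unrolled 4-bit decoding (not the Druid implementation).
final class NibbleUnrollSketch {
  // Per-value read: every byte is fetched twice, once for each nibble it holds.
  static void readPerValue(ByteBuffer buf, long[] out, int length, long base) {
    for (int i = 0; i < length; i++) {
      int shift = ((i + 1) & 1) << 2; // even index -> high nibble, odd index -> low nibble
      out[i] = base + ((buf.get(i >> 1) >> shift) & 0xF);
    }
  }

  // Unrolled read: a single getInt covers exactly 8 values, so no byte is read twice.
  static void readUnrolled(ByteBuffer buf, long[] out, int length, long base) {
    int i = 0;
    int pos = 0;
    for (; i + 8 <= length; i += 8, pos += Integer.BYTES) {
      final int unpack = buf.getInt(pos); // default ByteBuffer order is big-endian
      out[i] = base + ((unpack >>> 28) & 0xF);
      out[i + 1] = base + ((unpack >>> 24) & 0xF);
      out[i + 2] = base + ((unpack >>> 20) & 0xF);
      out[i + 3] = base + ((unpack >>> 16) & 0xF);
      out[i + 4] = base + ((unpack >>> 12) & 0xF);
      out[i + 5] = base + ((unpack >>> 8) & 0xF);
      out[i + 6] = base + ((unpack >>> 4) & 0xF);
      out[i + 7] = base + (unpack & 0xF);
    }
    // Remaining values (fewer than 8) fall back to the per-value path.
    for (; i < length; i++) {
      int shift = ((i + 1) & 1) << 2;
      out[i] = base + ((buf.get(i >> 1) >> shift) & 0xF);
    }
  }
}
```

Both methods fill `out` with the same values. The unrolled variant is only meant to show how a block of 8 values can line up with one aligned `ByteBuffer` read, with the delta base added in the same pass.
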
### Full column scans before/after on uniform distribution columns of varying bit widths to cover the entire set of value decoders
before:
```
Benchmark                                                           (distribution)     (encoding)  (filteredRowCountPercentage)  (rows)   (zeroProbability)  Mode  Cnt  Score      Error       Units
ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized  uniform-1          lz4-auto    1.0                           5000000  0.0                avgt  5    23862.994  ± 1944.514  us/op
ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized  uniform-2          lz4-auto    1.0                           5000000  0.0                avgt  5    23567.778  ±  757.632  us/op
ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized  uniform-3          lz4-auto    1.0                           5000000  0.0                avgt  5    29546.975  ± 1916.436  us/op
ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized  uniform-4          lz4-auto    1.0                           5000000  0.0                avgt  5    28826.685  ± 3260.379  us/op
ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized  uniform-8          lz4-auto    1.0                           5000000  0.0                avgt  5    19829.688  ±  923.047  us/op
ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized  uniform-12         lz4-auto    1.0                           5000000  0.0                avgt  5    32419.941  ± 1486.951  us/op
ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized  uniform-16         lz4-auto    1.0                           5000000  0.0                avgt  5    22460.206  ± 4207.942  us/op
ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized  uniform-20         lz4-auto    1.0                           5000000  0.0                avgt  5    31415.040  ± 3867.699  us/op
ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized  uniform-24         lz4-auto    1.0                           5000000  0.0                avgt  5    24444.519  ± 2472.537  us/op
ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized  uinform-32         lz4-auto    1.0                           5000000  0.0                avgt  5    18031.908  ± 1471.220  us/op
ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized  uniform-40         lz4-auto    1.0                           5000000  0.0                avgt  5    22442.611  ± 1778.178  us/op
ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized  uniform-48         lz4-auto    1.0                           5000000  0.0                avgt  5    24733.015  ± 3076.896  us/op
ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized  uniform-56         lz4-auto    1.0                           5000000  0.0                avgt  5    23201.264  ± 1578.009  us/op
ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized  uniform-64         lz4-auto    1.0                           5000000  0.0                avgt  5    20581.399  ± 1269.190  us/op
```
after:
```
Benchmark                                                           (distribution)     (encoding)  (filteredRowCountPercentage)  (rows)   (zeroProbability)  Mode  Cnt  Score      Error       Units
ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized  uniform-1          lz4-auto    1.0                           5000000  0.0                avgt  5    17892.362  ±  102.776  us/op
ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized  uniform-2          lz4-auto    1.0                           5000000  0.0                avgt  5    17796.103  ±  417.847  us/op
ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized  uniform-3          lz4-auto    1.0                           5000000  0.0                avgt  5    17888.816  ±  631.676  us/op
ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized  uniform-4          lz4-auto    1.0                           5000000  0.0                avgt  5    17879.496  ±  237.066  us/op
ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized  uniform-8          lz4-auto    1.0                           5000000  0.0                avgt  5    17508.260  ±  560.856  us/op
ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized  uniform-12         lz4-auto    1.0                           5000000  0.0                avgt  5    18272.440  ±   71.751  us/op
ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized  uniform-16         lz4-auto    1.0                           5000000  0.0                avgt  5    19042.292  ±  595.685  us/op
ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized  uniform-20         lz4-auto    1.0                           5000000  0.0                avgt  5    18782.746  ±  248.738  us/op
ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized  uniform-24         lz4-auto    1.0                           5000000  0.0                avgt  5    19048.354  ±  160.025  us/op
ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized  uinform-32         lz4-auto    1.0                           5000000  0.0                avgt  5    17984.778  ±  633.691  us/op
ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized  uniform-40         lz4-auto    1.0                           5000000  0.0                avgt  5    22070.007  ±  166.035  us/op
ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized  uniform-48         lz4-auto    1.0                           5000000  0.0                avgt  5    22052.517  ± 2168.763  us/op
ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized  uniform-56         lz4-auto    1.0                           5000000  0.0                avgt  5    25001.739  ±  259.184  us/op
ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized  uniform-64         lz4-auto    1.0                           5000000  0.0                avgt  5    20325.369  ±   60.724  us/op
```
### Full column scans on other value distributions before/after comparison
before:
```
Benchmark                                                           (distribution)     (encoding)  (filteredRowCountPercentage)  (rows)   (zeroProbability)  Mode  Cnt  Score      Error       Units
ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized  enumerated-0-1     lz4-auto    1.0                           5000000  0.0                avgt  5    24186.788  ± 1440.692  us/op
ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized  enumerated-full    lz4-auto    1.0                           5000000  0.0                avgt  5    27220.865  ± 2221.740  us/op
ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized  normal-1-32        lz4-auto    1.0                           5000000  0.0                avgt  5    19490.250  ± 1231.429  us/op
ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized  normal-40-1000     lz4-auto    1.0                           5000000  0.0                avgt  5    20701.792  ± 1898.845  us/op
ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized  sequential-1000    lz4-auto    1.0                           5000000  0.0                avgt  5    28977.482  ±  878.240  us/op
ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized  sequential-unique  lz4-auto    1.0                           5000000  0.0                avgt  5    22856.419  ±  247.786  us/op
ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized  zipf-low-100       lz4-auto    1.0                           5000000  0.0                avgt  5    19135.244  ±  437.260  us/op
ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized  zipf-low-100000    lz4-auto    1.0                           5000000  0.0                avgt  5    31696.149  ± 2933.263  us/op
ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized  zipf-low-32-bit    lz4-auto    1.0                           5000000  0.0                avgt  5    25815.759  ± 2073.480  us/op
ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized  zipf-high-100      lz4-auto    1.0                           5000000  0.0                avgt  5    20582.544  ± 1726.414  us/op
ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized  zipf-high-100000   lz4-auto    1.0                           5000000  0.0                avgt  5    18996.105  ± 1412.307  us/op
ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized  zipf-high-32-bit   lz4-auto    1.0                           5000000  0.0                avgt  5    18628.002  ±  450.125  us/op
```
after:
```
Benchmark                                                           (distribution)     (encoding)  (filteredRowCountPercentage)  (rows)   (zeroProbability)  Mode  Cnt  Score      Error       Units
ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized  enumerated-0-1     lz4-auto    1.0                           5000000  0.0                avgt  5    19250.882  ± 1221.067  us/op
ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized  enumerated-full    lz4-auto    1.0                           5000000  0.0                avgt  5    19853.160  ±  780.701  us/op
ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized  normal-1-32        lz4-auto    1.0                           5000000  0.0                avgt  5    18414.646  ± 1705.926  us/op
ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized  normal-40-1000     lz4-auto    1.0                           5000000  0.0                avgt  5    20054.183  ± 2539.912  us/op
ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized  sequential-1000    lz4-auto    1.0                           5000000  0.0                avgt  5    18585.123  ± 1524.581  us/op
ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized  sequential-unique  lz4-auto    1.0                           5000000  0.0                avgt  5    19304.239  ±  808.152  us/op
ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized  zipf-low-100       lz4-auto    1.0                           5000000  0.0                avgt  5    19248.088  ±  470.994  us/op
ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized  zipf-low-100000    lz4-auto    1.0                           5000000  0.0                avgt  5    23815.318  ± 4413.581  us/op
ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized  zipf-low-32-bit    lz4-auto    1.0                           5000000  0.0                avgt  5    26111.974  ± 4594.582  us/op
ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized  zipf-high-100      lz4-auto    1.0                           5000000  0.0                avgt  5    20144.349  ±  489.978  us/op
ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized  zipf-high-100000   lz4-auto    1.0                           5000000  0.0                avgt  5    18956.415  ±  796.136  us/op
ColumnarLongsSelectRowsFromGeneratorBenchmark.selectRowsVectorized  zipf-high-32-bit   lz4-auto    1.0                           5000000  0.0                avgt  5    18666.583  ±  385.407  us/op
```
<hr>
This PR has:
- [ ] been self-reviewed.
- [ ] added documentation for new or modified features or behaviors.
- [ ] added Javadocs for most classes and all non-trivial methods. Linked
related entities via Javadoc links.
- [ ] added comments explaining the "why" and the intent of the code
wherever it would not be obvious to an unfamiliar reader.
- [ ] added unit tests or modified existing tests to cover new code paths,
ensuring the threshold for
[code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md)
is met.
- [ ] been tested in a test Druid cluster.