clintropolis opened a new pull request, #12846:
URL: https://github.com/apache/druid/pull/12846
### Description
Adds specialized `FixedIndexed` implementations for Java `long`, `double`, and `int` values. `FixedIndexed` is used by the nested data columns added in #12753.
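
To illustrate what "specialized" means here, below is a minimal sketch. It assumes a generic fixed-width reader decodes each value through a decoder function and returns a boxed object, while a `long`-specialized reader reads primitives directly from the buffer at `index * Long.BYTES`. `GenericFixedIndexed` and `LongFixedIndexed` are hypothetical stand-ins, not Druid's actual `FixedIndexed` API. The sorted `indexOf` shows how a sorted value dictionary can be searched without decoding or boxing anything.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.function.Function;

public class FixedIndexedSketch
{
  /** Hypothetical generic reader: each get() goes through a decoder and returns a boxed object. */
  static final class GenericFixedIndexed<T>
  {
    private final ByteBuffer buffer;
    private final int width;
    private final Function<ByteBuffer, T> decoder;

    GenericFixedIndexed(ByteBuffer buffer, int width, Function<ByteBuffer, T> decoder)
    {
      this.buffer = buffer;
      this.width = width;
      this.decoder = decoder;
    }

    T get(int index)
    {
      // duplicate() gives a big-endian view, so restore the source buffer's byte order
      ByteBuffer view = buffer.duplicate().order(buffer.order());
      view.position(index * width);
      return decoder.apply(view);
    }
  }

  /** Hypothetical long-specialized reader: absolute primitive reads, no boxing. */
  static final class LongFixedIndexed
  {
    private final ByteBuffer buffer;
    private final int size;

    LongFixedIndexed(ByteBuffer buffer)
    {
      this.buffer = buffer;
      this.size = buffer.limit() / Long.BYTES;
    }

    int size()
    {
      return size;
    }

    long getLong(int index)
    {
      return buffer.getLong(index * Long.BYTES);
    }

    /** Binary search over sorted values: the index if found, else -(insertionPoint + 1). */
    int indexOf(long value)
    {
      int lo = 0;
      int hi = size - 1;
      while (lo <= hi) {
        int mid = (lo + hi) >>> 1;
        long midValue = getLong(mid);
        if (midValue < value) {
          lo = mid + 1;
        } else if (midValue > value) {
          hi = mid - 1;
        } else {
          return mid;
        }
      }
      return -(lo + 1);
    }
  }

  public static void main(String[] args)
  {
    long[] sortedValues = {1L, 19L, 21L, 23L, 25L, 26L, 46L};
    ByteBuffer buffer = ByteBuffer.allocate(sortedValues.length * Long.BYTES).order(ByteOrder.nativeOrder());
    for (long v : sortedValues) {
      buffer.putLong(v);
    }
    buffer.flip();

    GenericFixedIndexed<Long> generic = new GenericFixedIndexed<>(buffer, Long.BYTES, ByteBuffer::getLong);
    LongFixedIndexed specialized = new LongFixedIndexed(buffer);

    System.out.println(generic.get(3));           // 23, via a boxed Long
    System.out.println(specialized.getLong(3));   // 23, as a primitive long
    System.out.println(specialized.indexOf(25L)); // 4
    System.out.println(specialized.indexOf(20L)); // -3
  }
}
```

The specialized reader trades generality for avoiding a decoder call and an object per value, which matters on hot paths such as per-row reads and dictionary lookups.
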
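
While not all of the gains are attributable to this PR (the range filtering improvements are owed to #12830), repeating the earlier benchmarks shows improvement, most notably for the range filters on the nested column (queries 15 and 21). In each pair below, the even-numbered query reads regular columns and the odd-numbered query reads the equivalent `JSON_VALUE` paths of the `nested` column: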
```
SELECT SUM(long1) FROM foo
SELECT SUM(JSON_VALUE(nested, '$.long1' RETURNING BIGINT)) FROM foo
old:
Benchmark                        (query)  (rowsPerSegment)  (vectorize)  Mode  Cnt    Score    Error  Units
SqlNestedDataBenchmark.querySql        0           5000000        false  avgt    5   36.711 ±  0.917  ms/op
SqlNestedDataBenchmark.querySql        0           5000000        force  avgt    5   15.587 ±  0.276  ms/op
SqlNestedDataBenchmark.querySql        1           5000000        false  avgt    5   39.224 ±  0.870  ms/op
SqlNestedDataBenchmark.querySql        1           5000000        force  avgt    5   15.877 ±  0.440  ms/op
new:
Benchmark                        (query)  (rowsPerSegment)  (vectorize)  Mode  Cnt    Score    Error  Units
SqlNestedDataBenchmark.querySql        0           5000000        false  avgt    5   37.672 ±  1.136  ms/op
SqlNestedDataBenchmark.querySql        0           5000000        force  avgt    5   15.687 ±  0.497  ms/op
SqlNestedDataBenchmark.querySql        1           5000000        false  avgt    5   39.437 ±  0.690  ms/op
SqlNestedDataBenchmark.querySql        1           5000000        force  avgt    5   15.978 ±  0.587  ms/op

SELECT SUM(long1), SUM(long2) FROM foo
SELECT SUM(JSON_VALUE(nested, '$.long1' RETURNING BIGINT)),
SUM(JSON_VALUE(nested, '$.nesteder.long2' RETURNING BIGINT)) FROM foo
old:
Benchmark                        (query)  (rowsPerSegment)  (vectorize)  Mode  Cnt    Score    Error  Units
SqlNestedDataBenchmark.querySql        2           5000000        false  avgt    5   63.805 ±  1.036  ms/op
SqlNestedDataBenchmark.querySql        2           5000000        force  avgt    5   30.381 ±  1.201  ms/op
SqlNestedDataBenchmark.querySql        3           5000000        false  avgt    5   66.660 ±  0.806  ms/op
SqlNestedDataBenchmark.querySql        3           5000000        force  avgt    5   30.341 ±  1.124  ms/op
new:
Benchmark                        (query)  (rowsPerSegment)  (vectorize)  Mode  Cnt    Score    Error  Units
SqlNestedDataBenchmark.querySql        2           5000000        false  avgt    5   63.825 ±  1.228  ms/op
SqlNestedDataBenchmark.querySql        2           5000000        force  avgt    5   30.955 ±  0.769  ms/op
SqlNestedDataBenchmark.querySql        3           5000000        false  avgt    5   67.277 ±  0.928  ms/op
SqlNestedDataBenchmark.querySql        3           5000000        force  avgt    5   30.860 ±  1.023  ms/op

SELECT SUM(long1), SUM(long2), SUM(double3) FROM foo
SELECT SUM(JSON_VALUE(nested, '$.long1' RETURNING BIGINT)),
SUM(JSON_VALUE(nested, '$.nesteder.long2' RETURNING BIGINT)),
SUM(JSON_VALUE(nested, '$.nesteder.double3' RETURNING DOUBLE)) FROM foo
old:
Benchmark                        (query)  (rowsPerSegment)  (vectorize)  Mode  Cnt    Score    Error  Units
SqlNestedDataBenchmark.querySql        4           5000000        false  avgt    5   78.570 ±  1.657  ms/op
SqlNestedDataBenchmark.querySql        4           5000000        force  avgt    5   37.777 ±  1.295  ms/op
SqlNestedDataBenchmark.querySql        5           5000000        false  avgt    5   82.672 ±  1.010  ms/op
SqlNestedDataBenchmark.querySql        5           5000000        force  avgt    5   37.887 ±  0.802  ms/op
new:
Benchmark                        (query)  (rowsPerSegment)  (vectorize)  Mode  Cnt    Score    Error  Units
SqlNestedDataBenchmark.querySql        4           5000000        false  avgt    5   80.173 ±  1.342  ms/op
SqlNestedDataBenchmark.querySql        4           5000000        force  avgt    5   38.272 ±  0.589  ms/op
SqlNestedDataBenchmark.querySql        5           5000000        false  avgt    5   84.370 ±  1.275  ms/op
SqlNestedDataBenchmark.querySql        5           5000000        force  avgt    5   38.541 ±  1.137  ms/op

SELECT string1, SUM(long1) FROM foo GROUP BY 1 ORDER BY 2
SELECT JSON_VALUE(nested, '$.nesteder.string1'), SUM(JSON_VALUE(nested,
'$.long1' RETURNING BIGINT)) FROM foo GROUP BY 1 ORDER BY 2
old:
Benchmark                        (query)  (rowsPerSegment)  (vectorize)  Mode  Cnt    Score    Error  Units
SqlNestedDataBenchmark.querySql        6           5000000        false  avgt    5  269.560 ±  1.454  ms/op
SqlNestedDataBenchmark.querySql        6           5000000        force  avgt    5  157.090 ±  4.058  ms/op
SqlNestedDataBenchmark.querySql        7           5000000        false  avgt    5  373.162 ±  2.871  ms/op
SqlNestedDataBenchmark.querySql        7           5000000        force  avgt    5  195.213 ±  1.993  ms/op
new:
Benchmark                        (query)  (rowsPerSegment)  (vectorize)  Mode  Cnt    Score    Error  Units
SqlNestedDataBenchmark.querySql        6           5000000        false  avgt    5  234.328 ±  5.505  ms/op
SqlNestedDataBenchmark.querySql        6           5000000        force  avgt    5  155.711 ±  5.286  ms/op
SqlNestedDataBenchmark.querySql        7           5000000        false  avgt    5  383.741 ±  4.796  ms/op
SqlNestedDataBenchmark.querySql        7           5000000        force  avgt    5  195.051 ±  6.225  ms/op

SELECT string1, SUM(long1), SUM(double3) FROM foo GROUP BY 1 ORDER BY 2
SELECT JSON_VALUE(nested, '$.nesteder.string1'), SUM(JSON_VALUE(nested,
'$.long1' RETURNING BIGINT)), SUM(JSON_VALUE(nested, '$.nesteder.double3'
RETURNING DOUBLE)) FROM foo GROUP BY 1 ORDER BY 2
old:
Benchmark                        (query)  (rowsPerSegment)  (vectorize)  Mode  Cnt    Score    Error  Units
SqlNestedDataBenchmark.querySql        8           5000000        false  avgt    5  251.743 ±  6.438  ms/op
SqlNestedDataBenchmark.querySql        8           5000000        force  avgt    5  172.322 ± 14.814  ms/op
SqlNestedDataBenchmark.querySql        9           5000000        false  avgt    5  417.454 ± 21.276  ms/op
SqlNestedDataBenchmark.querySql        9           5000000        force  avgt    5  215.228 ±  9.304  ms/op
new:
Benchmark                        (query)  (rowsPerSegment)  (vectorize)  Mode  Cnt    Score    Error  Units
SqlNestedDataBenchmark.querySql        8           5000000        false  avgt    5  302.745 ±  6.738  ms/op
SqlNestedDataBenchmark.querySql        8           5000000        force  avgt    5  168.410 ±  1.750  ms/op
SqlNestedDataBenchmark.querySql        9           5000000        false  avgt    5  459.633 ±  5.099  ms/op
SqlNestedDataBenchmark.querySql        9           5000000        force  avgt    5  208.978 ±  1.130  ms/op

SELECT SUM(long1) FROM foo WHERE string1 = '10000' OR string1 = '1000'
SELECT SUM(JSON_VALUE(nested, '$.long1' RETURNING BIGINT)) FROM foo WHERE
JSON_VALUE(nested, '$.nesteder.string1') = '10000' OR JSON_VALUE(nested,
'$.nesteder.string1') = '1000'
old:
Benchmark                        (query)  (rowsPerSegment)  (vectorize)  Mode  Cnt    Score    Error  Units
SqlNestedDataBenchmark.querySql       10           5000000        false  avgt    5   11.482 ±  0.495  ms/op
SqlNestedDataBenchmark.querySql       10           5000000        force  avgt    5   11.549 ±  0.303  ms/op
SqlNestedDataBenchmark.querySql       11           5000000        false  avgt    5   11.695 ±  0.293  ms/op
SqlNestedDataBenchmark.querySql       11           5000000        force  avgt    5   11.931 ±  0.338  ms/op
new:
Benchmark                        (query)  (rowsPerSegment)  (vectorize)  Mode  Cnt    Score    Error  Units
SqlNestedDataBenchmark.querySql       10           5000000        false  avgt    5   11.427 ±  0.480  ms/op
SqlNestedDataBenchmark.querySql       10           5000000        force  avgt    5   11.545 ±  0.431  ms/op
SqlNestedDataBenchmark.querySql       11           5000000        false  avgt    5   11.650 ±  0.520  ms/op
SqlNestedDataBenchmark.querySql       11           5000000        force  avgt    5   11.732 ±  0.406  ms/op

SELECT SUM(long1) FROM foo WHERE long2 = 10000 OR long2 = 1000
SELECT SUM(JSON_VALUE(nested, '$.long1' RETURNING BIGINT)) FROM foo WHERE
JSON_VALUE(nested, '$.nesteder.long2' RETURNING BIGINT) = 10000 OR
JSON_VALUE(nested, '$.nesteder.long2' RETURNING BIGINT) = 1000
old:
Benchmark                        (query)  (rowsPerSegment)  (vectorize)  Mode  Cnt    Score    Error  Units
SqlNestedDataBenchmark.querySql       12           5000000        false  avgt    5   78.895 ±  2.158  ms/op
SqlNestedDataBenchmark.querySql       12           5000000        force  avgt    5   48.814 ±  0.874  ms/op
SqlNestedDataBenchmark.querySql       13           5000000        false  avgt    5    1.297 ±  0.008  ms/op
SqlNestedDataBenchmark.querySql       13           5000000        force  avgt    5    1.277 ±  0.011  ms/op
new:
Benchmark                        (query)  (rowsPerSegment)  (vectorize)  Mode  Cnt    Score    Error  Units
SqlNestedDataBenchmark.querySql       12           5000000        false  avgt    5   77.838 ±  3.391  ms/op
SqlNestedDataBenchmark.querySql       12           5000000        force  avgt    5   50.000 ±  1.199  ms/op
SqlNestedDataBenchmark.querySql       13           5000000        false  avgt    5    1.096 ±  0.017  ms/op
SqlNestedDataBenchmark.querySql       13           5000000        force  avgt    5    1.105 ±  0.023  ms/op

SELECT SUM(long1) FROM foo WHERE double3 < 10000.0 AND double3 > 1000.0
SELECT SUM(JSON_VALUE(nested, '$.long1' RETURNING BIGINT)) FROM foo WHERE
JSON_VALUE(nested, '$.nesteder.double3' RETURNING DOUBLE) < 10000.0 AND
JSON_VALUE(nested, '$.nesteder.double3' RETURNING DOUBLE) > 1000.0
old:
Benchmark                        (query)  (rowsPerSegment)  (vectorize)  Mode  Cnt    Score    Error  Units
SqlNestedDataBenchmark.querySql       14           5000000        false  avgt    5   92.982 ±  1.473  ms/op
SqlNestedDataBenchmark.querySql       14           5000000        force  avgt    5   54.729 ±  0.429  ms/op
SqlNestedDataBenchmark.querySql       15           5000000        false  avgt    5  580.472 ± 28.064  ms/op
SqlNestedDataBenchmark.querySql       15           5000000        force  avgt    5  561.494 ± 54.096  ms/op
new:
Benchmark                        (query)  (rowsPerSegment)  (vectorize)  Mode  Cnt    Score    Error  Units
SqlNestedDataBenchmark.querySql       14           5000000        false  avgt    5   93.883 ±  4.995  ms/op
SqlNestedDataBenchmark.querySql       14           5000000        force  avgt    5   52.985 ±  1.123  ms/op
SqlNestedDataBenchmark.querySql       15           5000000        false  avgt    5  228.775 ±  3.131  ms/op
SqlNestedDataBenchmark.querySql       15           5000000        force  avgt    5  216.295 ±  2.309  ms/op

SELECT long1, SUM(double3) FROM foo WHERE string1 = '10000' OR string1 =
'1000' GROUP BY 1 ORDER BY 2
SELECT JSON_VALUE(nested, '$.long1' RETURNING BIGINT),
SUM(JSON_VALUE(nested, '$.nesteder.double3' RETURNING DOUBLE)) FROM foo WHERE
JSON_VALUE(nested, '$.nesteder.string1') = '10000' OR JSON_VALUE(nested,
'$.nesteder.string1') = '1000' GROUP BY 1 ORDER BY 2
old:
Benchmark                        (query)  (rowsPerSegment)  (vectorize)  Mode  Cnt    Score    Error  Units
SqlNestedDataBenchmark.querySql       16           5000000        false  avgt    5  129.760 ±  9.953  ms/op
SqlNestedDataBenchmark.querySql       16           5000000        force  avgt    5  133.015 ± 20.961  ms/op
SqlNestedDataBenchmark.querySql       17           5000000        false  avgt    5  142.197 ±  8.773  ms/op
SqlNestedDataBenchmark.querySql       17           5000000        force  avgt    5  132.048 ± 15.546  ms/op
new:
Benchmark                        (query)  (rowsPerSegment)  (vectorize)  Mode  Cnt    Score    Error  Units
SqlNestedDataBenchmark.querySql       16           5000000        false  avgt    5  125.571 ±  3.131  ms/op
SqlNestedDataBenchmark.querySql       16           5000000        force  avgt    5  125.625 ±  4.528  ms/op
SqlNestedDataBenchmark.querySql       17           5000000        false  avgt    5  125.689 ±  2.543  ms/op
SqlNestedDataBenchmark.querySql       17           5000000        force  avgt    5  126.233 ±  4.543  ms/op

SELECT string1, SUM(double3) FROM foo WHERE long2 < 10000 AND long2 > 1000
GROUP BY 1 ORDER BY 2
SELECT JSON_VALUE(nested, '$.nesteder.string1'), SUM(JSON_VALUE(nested,
'$.nesteder.double3' RETURNING DOUBLE)) FROM foo WHERE JSON_VALUE(nested,
'$.nesteder.long2' RETURNING BIGINT) < 10000 AND JSON_VALUE(nested,
'$.nesteder.long2' RETURNING BIGINT) > 1000 GROUP BY 1 ORDER BY 2
old:
Benchmark                        (query)  (rowsPerSegment)  (vectorize)  Mode  Cnt    Score    Error  Units
SqlNestedDataBenchmark.querySql       18           5000000        false  avgt    5  161.445 ±  7.024  ms/op
SqlNestedDataBenchmark.querySql       18           5000000        force  avgt    5  138.212 ± 19.673  ms/op
SqlNestedDataBenchmark.querySql       19           5000000        false  avgt    5  123.486 ±  5.029  ms/op
SqlNestedDataBenchmark.querySql       19           5000000        force  avgt    5  120.079 ±  6.822  ms/op
new:
Benchmark                        (query)  (rowsPerSegment)  (vectorize)  Mode  Cnt    Score    Error  Units
SqlNestedDataBenchmark.querySql       18           5000000        false  avgt    5  149.483 ±  4.772  ms/op
SqlNestedDataBenchmark.querySql       18           5000000        force  avgt    5  128.978 ±  4.393  ms/op
SqlNestedDataBenchmark.querySql       19           5000000        false  avgt    5  114.389 ±  4.373  ms/op
SqlNestedDataBenchmark.querySql       19           5000000        force  avgt    5  114.224 ±  3.520  ms/op

SELECT string1, SUM(double3) FROM foo WHERE double3 < 10000.0 AND double3 >
1000.0 GROUP BY 1 ORDER BY 2
SELECT JSON_VALUE(nested, '$.nesteder.string1'), SUM(JSON_VALUE(nested,
'$.nesteder.double3' RETURNING DOUBLE)) FROM foo WHERE JSON_VALUE(nested,
'$.nesteder.double3' RETURNING DOUBLE) < 10000.0 AND JSON_VALUE(nested,
'$.nesteder.double3' RETURNING DOUBLE) > 1000.0 GROUP BY 1 ORDER BY 2
old:
Benchmark                        (query)  (rowsPerSegment)  (vectorize)  Mode  Cnt    Score    Error  Units
SqlNestedDataBenchmark.querySql       20           5000000        false  avgt    5  280.393 ± 15.369  ms/op
SqlNestedDataBenchmark.querySql       20           5000000        force  avgt    5  174.545 ±  2.702  ms/op
SqlNestedDataBenchmark.querySql       21           5000000        false  avgt    5  802.647 ± 32.078  ms/op
SqlNestedDataBenchmark.querySql       21           5000000        force  avgt    5  591.274 ± 16.460  ms/op
new:
Benchmark                        (query)  (rowsPerSegment)  (vectorize)  Mode  Cnt    Score    Error  Units
SqlNestedDataBenchmark.querySql       20           5000000        false  avgt    5  264.055 ±  4.947  ms/op
SqlNestedDataBenchmark.querySql       20           5000000        force  avgt    5  172.410 ±  3.586  ms/op
SqlNestedDataBenchmark.querySql       21           5000000        false  avgt    5  454.973 ±  5.799  ms/op
SqlNestedDataBenchmark.querySql       21           5000000        force  avgt    5  359.288 ±  5.179  ms/op

SELECT long2 FROM foo WHERE long2 IN (1, 19, 21, 23, 25, 26, 46)
SELECT JSON_VALUE(nested, '$.nesteder.long2' RETURNING BIGINT) FROM foo
WHERE JSON_VALUE(nested, '$.nesteder.long2' RETURNING BIGINT) IN (1, 19, 21,
23, 25, 26, 46)
old:
Benchmark                        (query)  (rowsPerSegment)  (vectorize)  Mode  Cnt    Score    Error  Units
SqlNestedDataBenchmark.querySql       22           5000000        false  avgt    5  273.464 ± 15.731  ms/op
SqlNestedDataBenchmark.querySql       22           5000000        force  avgt    5  272.270 ± 20.511  ms/op
SqlNestedDataBenchmark.querySql       23           5000000        false  avgt    5  174.960 ±  1.923  ms/op
SqlNestedDataBenchmark.querySql       23           5000000        force  avgt    5  177.920 ±  4.095  ms/op
new:
Benchmark                        (query)  (rowsPerSegment)  (vectorize)  Mode  Cnt    Score    Error  Units
SqlNestedDataBenchmark.querySql       22           5000000        false  avgt    5  283.230 ±  6.222  ms/op
SqlNestedDataBenchmark.querySql       22           5000000        force  avgt    5  282.467 ±  5.709  ms/op
SqlNestedDataBenchmark.querySql       23           5000000        false  avgt    5  177.047 ±  4.990  ms/op
SqlNestedDataBenchmark.querySql       23           5000000        force  avgt    5  172.208 ±  1.031  ms/op

SELECT long2 FROM foo WHERE long2 IN (1, 19, 21, 23, 25, 26, 46) GROUP BY 1
SELECT JSON_VALUE(nested, '$.nesteder.long2' RETURNING BIGINT) FROM foo
WHERE JSON_VALUE(nested, '$.nesteder.long2' RETURNING BIGINT) IN (1, 19, 21,
23, 25, 26, 46) GROUP BY 1
old:
Benchmark                        (query)  (rowsPerSegment)  (vectorize)  Mode  Cnt    Score    Error  Units
SqlNestedDataBenchmark.querySql       24           5000000        false  avgt    5  318.280 ±  7.544  ms/op
SqlNestedDataBenchmark.querySql       24           5000000        force  avgt    5  210.866 ± 14.684  ms/op
SqlNestedDataBenchmark.querySql       25           5000000        false  avgt    5  215.200 ±  2.366  ms/op
SqlNestedDataBenchmark.querySql       25           5000000        force  avgt    5  152.399 ± 22.695  ms/op
new:
Benchmark                        (query)  (rowsPerSegment)  (vectorize)  Mode  Cnt    Score    Error  Units
SqlNestedDataBenchmark.querySql       24           5000000        false  avgt    5  308.668 ±  7.274  ms/op
SqlNestedDataBenchmark.querySql       24           5000000        force  avgt    5  199.587 ±  3.162  ms/op
SqlNestedDataBenchmark.querySql       25           5000000        false  avgt    5  212.399 ±  4.406  ms/op
SqlNestedDataBenchmark.querySql       25           5000000        force  avgt    5  149.650 ±  3.489  ms/op

SELECT SUM(long1) FROM foo WHERE double3 < 1005.0 AND double3 > 1000.0
SELECT SUM(JSON_VALUE(nested, '$.long1' RETURNING BIGINT)) FROM foo WHERE
JSON_VALUE(nested, '$.nesteder.double3' RETURNING DOUBLE) < 1005.0 AND
JSON_VALUE(nested, '$.nesteder.double3' RETURNING DOUBLE) > 1000.0
old:
(not previously measured)
new:
Benchmark                        (query)  (rowsPerSegment)  (vectorize)  Mode  Cnt    Score    Error  Units
SqlNestedDataBenchmark.querySql       26           5000000        false  avgt    5   74.495 ±  1.506  ms/op
SqlNestedDataBenchmark.querySql       26           5000000        force  avgt    5   48.270 ±  0.806  ms/op
SqlNestedDataBenchmark.querySql       27           5000000        false  avgt    5   12.997 ±  0.509  ms/op
SqlNestedDataBenchmark.querySql       27           5000000        force  avgt    5   13.094 ±  0.553  ms/op

SELECT SUM(long1) FROM foo WHERE double3 < 2000.0 AND double3 > 1000.0
SELECT SUM(JSON_VALUE(nested, '$.long1' RETURNING BIGINT)) FROM foo WHERE
JSON_VALUE(nested, '$.nesteder.double3' RETURNING DOUBLE) < 2000.0 AND
JSON_VALUE(nested, '$.nesteder.double3' RETURNING DOUBLE) > 1000.0
old:
(not previously measured)
new:
Benchmark                        (query)  (rowsPerSegment)  (vectorize)  Mode  Cnt    Score    Error  Units
SqlNestedDataBenchmark.querySql       28           5000000        false  avgt    5   79.335 ±  2.919  ms/op
SqlNestedDataBenchmark.querySql       28           5000000        force  avgt    5   51.953 ±  1.056  ms/op
SqlNestedDataBenchmark.querySql       29           5000000        false  avgt    5   40.987 ±  0.710  ms/op
SqlNestedDataBenchmark.querySql       29           5000000        force  avgt    5   40.654 ±  0.713  ms/op
```
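
The description attributes the range filtering gains, most visible for the nested-column range filters (queries 15 and 21), to #12830 rather than to this PR. As a generic, hedged illustration of why a sorted numeric value dictionary suits such filters: a predicate like `lower < x < upper` maps to one contiguous run of dictionary ids, which two binary searches can find without scanning every row. `RangeToDictionaryIds` and `idRange` are hypothetical names, not Druid's filter or index API, and the dictionary values are made up.

```java
import java.util.Arrays;

public class RangeToDictionaryIds
{
  /**
   * Hypothetical helper: returns {startInclusive, endExclusive}, the ids of values v with
   * lower < v < upper, assuming sortedDictionary is sorted ascending.
   */
  static int[] idRange(double[] sortedDictionary, double lower, double upper)
  {
    int start = firstGreaterThan(sortedDictionary, lower);
    int end = firstAtLeast(sortedDictionary, upper);
    // an empty or inverted range yields start == end
    return new int[]{start, Math.max(start, end)};
  }

  /** Index of the first value strictly greater than key, or values.length if none. */
  private static int firstGreaterThan(double[] values, double key)
  {
    int lo = 0;
    int hi = values.length;
    while (lo < hi) {
      int mid = (lo + hi) >>> 1;
      if (values[mid] <= key) {
        lo = mid + 1;
      } else {
        hi = mid;
      }
    }
    return lo;
  }

  /** Index of the first value greater than or equal to key, or values.length if none. */
  private static int firstAtLeast(double[] values, double key)
  {
    int lo = 0;
    int hi = values.length;
    while (lo < hi) {
      int mid = (lo + hi) >>> 1;
      if (values[mid] < key) {
        lo = mid + 1;
      } else {
        hi = mid;
      }
    }
    return lo;
  }

  public static void main(String[] args)
  {
    // made-up sorted dictionary values
    double[] dictionary = {1.5, 999.0, 1000.0, 1000.5, 1004.9, 1005.0, 2000.0, 10000.0};
    // same shape as "double3 < 1005.0 AND double3 > 1000.0" from the benchmark queries
    int[] ids = idRange(dictionary, 1000.0, 1005.0);
    System.out.println(Arrays.toString(ids)); // [3, 5]
  }
}
```

Assuming each dictionary id has a row bitmap, the matching rows would then be the union of the bitmaps for ids 3 and 4, so the cost depends on how many distinct values fall in the range rather than on the number of rows.
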
This PR has:
- [ ] been self-reviewed.
   - [ ] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.)
- [ ] added documentation for new or modified features or behaviors.
- [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
- [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
- [ ] added comments explaining the "why" and the intent of the code wherever it would not be obvious for an unfamiliar reader.
- [ ] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
- [ ] added integration tests.
- [ ] been tested in a test Druid cluster.