clintropolis commented on PR #12753:
URL: https://github.com/apache/druid/pull/12753#issuecomment-1183571908
I created some query benchmarks to see how various nested column queries
perform against traditional columns and see where things currently stand to
help guide which areas I will be investigating in the future. I'll add the
benchmark code in a follow-up PR. The nested columns are identical to the
traditional columns, created using a transform expression:
```
json_object('long1', long1, 'nesteder', json_object('string1', string1,
'long2', long2, 'double3',double3))
```
For the most part things are pretty similar as expected, and the ones that
aren't I will be digging deeper into. For numeric columns, which have indexes
in nested columns, the indexes in most cases make the queries for nested
columns faster, though in a few cases are currently slower. One of the double
columns with the bound filter, which part of the reason for that is that
currently there is no native numeric range index for the bound filter to use,
so bound filters on numbers have to use a `DruidPredicateIndex` which has to
match against every value in the dictionary instead of being able to take
short-cuts like `LexicographicalRangeIndex` does for strings. I'll be adding
this in a follow-up PR as well.
```
SELECT SUM(long1) FROM foo
SELECT SUM(JSON_VALUE(nested, '$.long1' RETURNING BIGINT)) FROM foo
Benchmark (query) (rowsPerSegment) (vectorize)
Mode Cnt Score Error Units
SqlNestedDataBenchmark.querySql 0 5000000 false
avgt 5 36.711 ± 0.917 ms/op
SqlNestedDataBenchmark.querySql 0 5000000 force
avgt 5 15.587 ± 0.276 ms/op
SqlNestedDataBenchmark.querySql 1 5000000 false
avgt 5 39.224 ± 0.870 ms/op
SqlNestedDataBenchmark.querySql 1 5000000 force
avgt 5 15.877 ± 0.440 ms/op
SELECT SUM(long1), SUM(long2) FROM foo
SELECT SUM(JSON_VALUE(nested, '$.long1' RETURNING BIGINT)),
SUM(JSON_VALUE(nested, '$.nesteder.long2' RETURNING BIGINT)) FROM foo
Benchmark (query) (rowsPerSegment) (vectorize)
Mode Cnt Score Error Units
SqlNestedDataBenchmark.querySql 2 5000000 false
avgt 5 63.805 ± 1.036 ms/op
SqlNestedDataBenchmark.querySql 2 5000000 force
avgt 5 30.381 ± 1.201 ms/op
SqlNestedDataBenchmark.querySql 3 5000000 false
avgt 5 66.660 ± 0.806 ms/op
SqlNestedDataBenchmark.querySql 3 5000000 force
avgt 5 30.341 ± 1.124 ms/op
SELECT SUM(long1), SUM(long2), SUM(double3) FROM foo
SELECT SUM(JSON_VALUE(nested, '$.long1' RETURNING BIGINT)),
SUM(JSON_VALUE(nested, '$.nesteder.long2' RETURNING BIGINT)),
SUM(JSON_VALUE(nested, '$.nesteder.double3' RETURNING DOUBLE)) FROM foo
Benchmark (query) (rowsPerSegment) (vectorize)
Mode Cnt Score Error Units
SqlNestedDataBenchmark.querySql 4 5000000 false
avgt 5 78.570 ± 1.657 ms/op
SqlNestedDataBenchmark.querySql 4 5000000 force
avgt 5 37.777 ± 1.295 ms/op
SqlNestedDataBenchmark.querySql 5 5000000 false
avgt 5 82.672 ± 1.010 ms/op
SqlNestedDataBenchmark.querySql 5 5000000 force
avgt 5 37.887 ± 0.802 ms/op
SELECT string1, SUM(long1) FROM foo GROUP BY 1 ORDER BY 2,
SELECT JSON_VALUE(nested, '$.nesteder.string1'), SUM(JSON_VALUE(nested,
'$.long1' RETURNING BIGINT)) FROM foo GROUP BY 1 ORDER BY 2,
Benchmark (query) (rowsPerSegment) (vectorize)
Mode Cnt Score Error Units
SqlNestedDataBenchmark.querySql 6 5000000 false
avgt 5 269.560 ± 1.454 ms/op
SqlNestedDataBenchmark.querySql 6 5000000 force
avgt 5 157.090 ± 4.058 ms/op
SqlNestedDataBenchmark.querySql 7 5000000 false
avgt 5 373.162 ± 2.871 ms/op
SqlNestedDataBenchmark.querySql 7 5000000 force
avgt 5 195.213 ± 1.993 ms/op
SELECT string1, SUM(long1), SUM(double3) FROM foo GROUP BY 1 ORDER BY 2
SELECT JSON_VALUE(nested, '$.nesteder.string1'), SUM(JSON_VALUE(nested,
'$.long1' RETURNING BIGINT)), SUM(JSON_VALUE(nested, '$.nesteder.double3'
RETURNING DOUBLE)) FROM foo GROUP BY 1 ORDER BY 2
Benchmark (query) (rowsPerSegment) (vectorize)
Mode Cnt Score Error Units
SqlNestedDataBenchmark.querySql 8 5000000 false
avgt 5 251.743 ± 6.438 ms/op
SqlNestedDataBenchmark.querySql 8 5000000 force
avgt 5 172.322 ± 14.814 ms/op
SqlNestedDataBenchmark.querySql 9 5000000 false
avgt 5 417.454 ± 21.276 ms/op
SqlNestedDataBenchmark.querySql 9 5000000 force
avgt 5 215.228 ± 9.304 ms/op
SELECT SUM(long1) FROM foo WHERE string1 = '10000' OR string1 = '1000'
SELECT SUM(JSON_VALUE(nested, '$.long1' RETURNING BIGINT)) FROM foo WHERE
JSON_VALUE(nested, '$.nesteder.string1') = '10000' OR JSON_VALUE(nested,
'$.nesteder.string1') = '1000'
Benchmark (query) (rowsPerSegment) (vectorize)
Mode Cnt Score Error Units
SqlNestedDataBenchmark.querySql 10 5000000 false
avgt 5 11.482 ± 0.495 ms/op
SqlNestedDataBenchmark.querySql 10 5000000 force
avgt 5 11.549 ± 0.303 ms/op
SqlNestedDataBenchmark.querySql 11 5000000 false
avgt 5 11.695 ± 0.293 ms/op
SqlNestedDataBenchmark.querySql 11 5000000 force
avgt 5 11.931 ± 0.338 ms/op
SELECT SUM(long1) FROM foo WHERE long2 = 10000 OR long2 = 1000
SELECT SUM(JSON_VALUE(nested, '$.long1' RETURNING BIGINT)) FROM foo WHERE
JSON_VALUE(nested, '$.nesteder.long2' RETURNING BIGINT) = 10000 OR
JSON_VALUE(nested, '$.nesteder.long2' RETURNING BIGINT) = 1000
Benchmark (query) (rowsPerSegment) (vectorize)
Mode Cnt Score Error Units
SqlNestedDataBenchmark.querySql 12 5000000 false
avgt 5 78.895 ± 2.158 ms/op
SqlNestedDataBenchmark.querySql 12 5000000 force
avgt 5 48.814 ± 0.874 ms/op
SqlNestedDataBenchmark.querySql 13 5000000 false
avgt 5 1.297 ± 0.008 ms/op
SqlNestedDataBenchmark.querySql 13 5000000 force
avgt 5 1.277 ± 0.011 ms/op
SELECT SUM(long1) FROM foo WHERE double3 < 10000.0 AND double3 > 1000.0
SELECT SUM(JSON_VALUE(nested, '$.long1' RETURNING BIGINT)) FROM foo WHERE
JSON_VALUE(nested, '$.nesteder.double3' RETURNING DOUBLE) < 10000.0 AND
JSON_VALUE(nested, '$.nesteder.double3' RETURNING DOUBLE) > 1000.0
Benchmark (query) (rowsPerSegment) (vectorize)
Mode Cnt Score Error Units
SqlNestedDataBenchmark.querySql 14 5000000 false
avgt 5 92.982 ± 1.473 ms/op
SqlNestedDataBenchmark.querySql 14 5000000 force
avgt 5 54.729 ± 0.429 ms/op
SqlNestedDataBenchmark.querySql 15 5000000 false
avgt 5 580.472 ± 28.064 ms/op
SqlNestedDataBenchmark.querySql 15 5000000 force
avgt 5 561.494 ± 54.096 ms/op
SELECT long1, SUM(double3) FROM foo WHERE string1 = '10000' OR string1 =
'1000' GROUP BY 1 ORDER BY 2
SELECT JSON_VALUE(nested, '$.long1' RETURNING BIGINT),
SUM(JSON_VALUE(nested, '$.nesteder.double3' RETURNING DOUBLE)) FROM foo WHERE
JSON_VALUE(nested, '$.nesteder.string1') = '10000' OR JSON_VALUE(nested,
'$.nesteder.string1') = '1000' GROUP BY 1 ORDER BY 2
Benchmark (query) (rowsPerSegment) (vectorize)
Mode Cnt Score Error Units
SqlNestedDataBenchmark.querySql 16 5000000 false
avgt 5 129.760 ± 9.953 ms/op
SqlNestedDataBenchmark.querySql 16 5000000 force
avgt 5 133.015 ± 20.961 ms/op
SqlNestedDataBenchmark.querySql 17 5000000 false
avgt 5 142.197 ± 8.773 ms/op
SqlNestedDataBenchmark.querySql 17 5000000 force
avgt 5 132.048 ± 15.546 ms/op
SELECT string1, SUM(double3) FROM foo WHERE long2 < 10000 AND long2 > 1000
GROUP BY 1 ORDER BY 2
SELECT JSON_VALUE(nested, '$.nesteder.string1'), SUM(JSON_VALUE(nested,
'$.nesteder.double3' RETURNING DOUBLE)) FROM foo WHERE JSON_VALUE(nested,
'$.nesteder.long2' RETURNING BIGINT) < 10000 AND JSON_VALUE(nested,
'$.nesteder.long2' RETURNING BIGINT) > 1000 GROUP BY 1 ORDER BY 2
Benchmark (query) (rowsPerSegment) (vectorize)
Mode Cnt Score Error Units
SqlNestedDataBenchmark.querySql 18 5000000 false
avgt 5 161.445 ± 7.024 ms/op
SqlNestedDataBenchmark.querySql 18 5000000 force
avgt 5 138.212 ± 19.673 ms/op
SqlNestedDataBenchmark.querySql 19 5000000 false
avgt 5 123.486 ± 5.029 ms/op
SqlNestedDataBenchmark.querySql 19 5000000 force
avgt 5 120.079 ± 6.822 ms/op
SELECT string1, SUM(double3) FROM foo WHERE double3 < 10000.0 AND double3 >
1000.0 GROUP BY 1 ORDER BY 2
SELECT JSON_VALUE(nested, '$.nesteder.string1'), SUM(JSON_VALUE(nested,
'$.nesteder.double3' RETURNING DOUBLE)) FROM foo WHERE JSON_VALUE(nested,
'$.nesteder.double3' RETURNING DOUBLE) < 10000.0 AND JSON_VALUE(nested,
'$.nesteder.double3' RETURNING DOUBLE) > 1000.0 GROUP BY 1 ORDER BY 2
Benchmark (query) (rowsPerSegment) (vectorize)
Mode Cnt Score Error Units
SqlNestedDataBenchmark.querySql 20 5000000 false
avgt 5 280.393 ± 15.369 ms/op
SqlNestedDataBenchmark.querySql 20 5000000 force
avgt 5 174.545 ± 2.702 ms/op
SqlNestedDataBenchmark.querySql 21 5000000 false
avgt 5 802.647 ± 32.078 ms/op
SqlNestedDataBenchmark.querySql 21 5000000 force
avgt 5 591.274 ± 16.460 ms/op
SELECT long2 FROM foo WHERE long2 IN (1, 19, 21, 23, 25, 26, 46),
SELECT JSON_VALUE(nested, '$.nesteder.long2' RETURNING BIGINT) FROM foo
WHERE JSON_VALUE(nested, '$.nesteder.long2' RETURNING BIGINT) IN (1, 19, 21,
23, 25, 26, 46),
Benchmark (query) (rowsPerSegment) (vectorize)
Mode Cnt Score Error Units
SqlNestedDataBenchmark.querySql 22 5000000 false
avgt 5 273.464 ± 15.731 ms/op
SqlNestedDataBenchmark.querySql 22 5000000 force
avgt 5 272.270 ± 20.511 ms/op
SqlNestedDataBenchmark.querySql 23 5000000 false
avgt 5 174.960 ± 1.923 ms/op
SqlNestedDataBenchmark.querySql 23 5000000 force
avgt 5 177.920 ± 4.095 ms/op
SELECT long2 FROM foo WHERE long2 IN (1, 19, 21, 23, 25, 26, 46) GROUP BY 1",
SELECT JSON_VALUE(nested, '$.nesteder.long2' RETURNING BIGINT) FROM foo
WHERE JSON_VALUE(nested, '$.nesteder.long2' RETURNING BIGINT) IN (1, 19, 21,
23, 25, 26, 46) GROUP BY 1
Benchmark (query) (rowsPerSegment) (vectorize)
Mode Cnt Score Error Units
SqlNestedDataBenchmark.querySql 24 5000000 false
avgt 5 318.280 ± 7.544 ms/op
SqlNestedDataBenchmark.querySql 24 5000000 force
avgt 5 210.866 ± 14.684 ms/op
SqlNestedDataBenchmark.querySql 25 5000000 false
avgt 5 215.200 ± 2.366 ms/op
SqlNestedDataBenchmark.querySql 25 5000000 force
avgt 5 152.399 ± 22.695 ms/op
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]