clintropolis commented on PR #12753:
URL: https://github.com/apache/druid/pull/12753#issuecomment-1183571908

   I created some query benchmarks to see how various nested column queries 
perform against traditional columns and see where things currently stand to 
help guide which areas I will be investigating in the future. I'll add the 
benchmark code in a follow-up PR. The nested columns are identical to the 
traditional columns, created using a transform expression:
   ```
   json_object('long1', long1, 'nesteder', json_object('string1', string1, 
'long2', long2, 'double3',double3))
   ```
   
   For the most part things are pretty similar as expected, and the ones that 
aren't I will be digging deeper into. For numeric columns, which have indexes 
in nested columns, the indexes in most cases make the queries for nested 
columns faster, though in a few cases are currently slower. One of the double 
columns with the bound filter, which part of the reason for that is that 
currently there is no native numeric range index for the bound filter to use, 
so bound filters on numbers have to use a `DruidPredicateIndex` which has to 
match against every value in the dictionary instead of being able to take 
short-cuts like `LexicographicalRangeIndex` does for strings. I'll be adding 
this in a follow-up PR as well.
   
   ```
   SELECT SUM(long1) FROM foo
   SELECT SUM(JSON_VALUE(nested, '$.long1' RETURNING BIGINT)) FROM foo
   Benchmark                        (query)  (rowsPerSegment)  (vectorize)  
Mode  Cnt    Score    Error  Units
   SqlNestedDataBenchmark.querySql        0           5000000        false  
avgt    5   36.711 ±  0.917  ms/op
   SqlNestedDataBenchmark.querySql        0           5000000        force  
avgt    5   15.587 ±  0.276  ms/op
   SqlNestedDataBenchmark.querySql        1           5000000        false  
avgt    5   39.224 ±  0.870  ms/op
   SqlNestedDataBenchmark.querySql        1           5000000        force  
avgt    5   15.877 ±  0.440  ms/op
   
   
   SELECT SUM(long1), SUM(long2) FROM foo
   SELECT SUM(JSON_VALUE(nested, '$.long1' RETURNING BIGINT)), 
SUM(JSON_VALUE(nested, '$.nesteder.long2' RETURNING BIGINT)) FROM foo
   Benchmark                        (query)  (rowsPerSegment)  (vectorize)  
Mode  Cnt    Score    Error  Units
   SqlNestedDataBenchmark.querySql        2           5000000        false  
avgt    5   63.805 ±  1.036  ms/op
   SqlNestedDataBenchmark.querySql        2           5000000        force  
avgt    5   30.381 ±  1.201  ms/op
   SqlNestedDataBenchmark.querySql        3           5000000        false  
avgt    5   66.660 ±  0.806  ms/op
   SqlNestedDataBenchmark.querySql        3           5000000        force  
avgt    5   30.341 ±  1.124  ms/op
   
   
   SELECT SUM(long1), SUM(long2), SUM(double3) FROM foo
   SELECT SUM(JSON_VALUE(nested, '$.long1' RETURNING BIGINT)), 
SUM(JSON_VALUE(nested, '$.nesteder.long2' RETURNING BIGINT)), 
SUM(JSON_VALUE(nested, '$.nesteder.double3' RETURNING DOUBLE)) FROM foo
   Benchmark                        (query)  (rowsPerSegment)  (vectorize)  
Mode  Cnt    Score    Error  Units
   SqlNestedDataBenchmark.querySql        4           5000000        false  
avgt    5   78.570 ±  1.657  ms/op
   SqlNestedDataBenchmark.querySql        4           5000000        force  
avgt    5   37.777 ±  1.295  ms/op
   SqlNestedDataBenchmark.querySql        5           5000000        false  
avgt    5   82.672 ±  1.010  ms/op
   SqlNestedDataBenchmark.querySql        5           5000000        force  
avgt    5   37.887 ±  0.802  ms/op
   
   
   SELECT string1, SUM(long1) FROM foo GROUP BY 1 ORDER BY 2,
   SELECT JSON_VALUE(nested, '$.nesteder.string1'), SUM(JSON_VALUE(nested, 
'$.long1' RETURNING BIGINT)) FROM foo GROUP BY 1 ORDER BY 2,
   Benchmark                        (query)  (rowsPerSegment)  (vectorize)  
Mode  Cnt    Score    Error  Units
   SqlNestedDataBenchmark.querySql        6           5000000        false  
avgt    5  269.560 ±  1.454  ms/op
   SqlNestedDataBenchmark.querySql        6           5000000        force  
avgt    5  157.090 ±  4.058  ms/op
   SqlNestedDataBenchmark.querySql        7           5000000        false  
avgt    5  373.162 ±  2.871  ms/op
   SqlNestedDataBenchmark.querySql        7           5000000        force  
avgt    5  195.213 ±  1.993  ms/op
   
   
   SELECT string1, SUM(long1), SUM(double3) FROM foo GROUP BY 1 ORDER BY 2
   SELECT JSON_VALUE(nested, '$.nesteder.string1'), SUM(JSON_VALUE(nested, 
'$.long1' RETURNING BIGINT)), SUM(JSON_VALUE(nested, '$.nesteder.double3' 
RETURNING DOUBLE)) FROM foo GROUP BY 1 ORDER BY 2
   Benchmark                        (query)  (rowsPerSegment)  (vectorize)  
Mode  Cnt    Score    Error  Units
   SqlNestedDataBenchmark.querySql        8           5000000        false  
avgt    5  251.743 ±  6.438  ms/op
   SqlNestedDataBenchmark.querySql        8           5000000        force  
avgt    5  172.322 ± 14.814  ms/op
   SqlNestedDataBenchmark.querySql        9           5000000        false  
avgt    5  417.454 ± 21.276  ms/op
   SqlNestedDataBenchmark.querySql        9           5000000        force  
avgt    5  215.228 ±  9.304  ms/op
   
   
   SELECT SUM(long1) FROM foo WHERE string1 = '10000' OR string1 = '1000'
   SELECT SUM(JSON_VALUE(nested, '$.long1' RETURNING BIGINT)) FROM foo WHERE 
JSON_VALUE(nested, '$.nesteder.string1') = '10000' OR JSON_VALUE(nested, 
'$.nesteder.string1') = '1000'
   Benchmark                        (query)  (rowsPerSegment)  (vectorize)  
Mode  Cnt    Score    Error  Units
   SqlNestedDataBenchmark.querySql       10           5000000        false  
avgt    5   11.482 ±  0.495  ms/op
   SqlNestedDataBenchmark.querySql       10           5000000        force  
avgt    5   11.549 ±  0.303  ms/op
   SqlNestedDataBenchmark.querySql       11           5000000        false  
avgt    5   11.695 ±  0.293  ms/op
   SqlNestedDataBenchmark.querySql       11           5000000        force  
avgt    5   11.931 ±  0.338  ms/op
   
   
   SELECT SUM(long1) FROM foo WHERE long2 = 10000 OR long2 = 1000
   SELECT SUM(JSON_VALUE(nested, '$.long1' RETURNING BIGINT)) FROM foo WHERE 
JSON_VALUE(nested, '$.nesteder.long2' RETURNING BIGINT) = 10000 OR 
JSON_VALUE(nested, '$.nesteder.long2' RETURNING BIGINT) = 1000
   Benchmark                        (query)  (rowsPerSegment)  (vectorize)  
Mode  Cnt    Score    Error  Units
   SqlNestedDataBenchmark.querySql       12           5000000        false  
avgt    5   78.895 ±  2.158  ms/op
   SqlNestedDataBenchmark.querySql       12           5000000        force  
avgt    5   48.814 ±  0.874  ms/op
   SqlNestedDataBenchmark.querySql       13           5000000        false  
avgt    5    1.297 ±  0.008  ms/op
   SqlNestedDataBenchmark.querySql       13           5000000        force  
avgt    5    1.277 ±  0.011  ms/op
   
   
   SELECT SUM(long1) FROM foo WHERE double3 < 10000.0 AND double3 > 1000.0
   SELECT SUM(JSON_VALUE(nested, '$.long1' RETURNING BIGINT)) FROM foo WHERE 
JSON_VALUE(nested, '$.nesteder.double3' RETURNING DOUBLE) < 10000.0 AND 
JSON_VALUE(nested, '$.nesteder.double3' RETURNING DOUBLE) > 1000.0
   Benchmark                        (query)  (rowsPerSegment)  (vectorize)  
Mode  Cnt    Score    Error  Units
   SqlNestedDataBenchmark.querySql       14           5000000        false  
avgt    5   92.982 ±  1.473  ms/op
   SqlNestedDataBenchmark.querySql       14           5000000        force  
avgt    5   54.729 ±  0.429  ms/op
   SqlNestedDataBenchmark.querySql       15           5000000        false  
avgt    5  580.472 ± 28.064  ms/op
   SqlNestedDataBenchmark.querySql       15           5000000        force  
avgt    5  561.494 ± 54.096  ms/op
   
   
   SELECT long1, SUM(double3) FROM foo WHERE string1 = '10000' OR string1 = 
'1000' GROUP BY 1 ORDER BY 2
   SELECT JSON_VALUE(nested, '$.long1' RETURNING BIGINT), 
SUM(JSON_VALUE(nested, '$.nesteder.double3' RETURNING DOUBLE)) FROM foo WHERE 
JSON_VALUE(nested, '$.nesteder.string1') = '10000' OR JSON_VALUE(nested, 
'$.nesteder.string1') = '1000' GROUP BY 1 ORDER BY 2
   Benchmark                        (query)  (rowsPerSegment)  (vectorize)  
Mode  Cnt    Score    Error  Units
   SqlNestedDataBenchmark.querySql       16           5000000        false  
avgt    5  129.760 ±  9.953  ms/op
   SqlNestedDataBenchmark.querySql       16           5000000        force  
avgt    5  133.015 ± 20.961  ms/op
   SqlNestedDataBenchmark.querySql       17           5000000        false  
avgt    5  142.197 ±  8.773  ms/op
   SqlNestedDataBenchmark.querySql       17           5000000        force  
avgt    5  132.048 ± 15.546  ms/op
   
   
   SELECT string1, SUM(double3) FROM foo WHERE long2 < 10000 AND long2 > 1000 
GROUP BY 1 ORDER BY 2
   SELECT JSON_VALUE(nested, '$.nesteder.string1'), SUM(JSON_VALUE(nested, 
'$.nesteder.double3' RETURNING DOUBLE)) FROM foo WHERE JSON_VALUE(nested, 
'$.nesteder.long2' RETURNING BIGINT) < 10000 AND JSON_VALUE(nested, 
'$.nesteder.long2' RETURNING BIGINT) > 1000 GROUP BY 1 ORDER BY 2
   Benchmark                        (query)  (rowsPerSegment)  (vectorize)  
Mode  Cnt    Score    Error  Units
   SqlNestedDataBenchmark.querySql       18           5000000        false  
avgt    5  161.445 ±  7.024  ms/op
   SqlNestedDataBenchmark.querySql       18           5000000        force  
avgt    5  138.212 ± 19.673  ms/op
   SqlNestedDataBenchmark.querySql       19           5000000        false  
avgt    5  123.486 ±  5.029  ms/op
   SqlNestedDataBenchmark.querySql       19           5000000        force  
avgt    5  120.079 ±  6.822  ms/op
   
   
   SELECT string1, SUM(double3) FROM foo WHERE double3 < 10000.0 AND double3 > 
1000.0 GROUP BY 1 ORDER BY 2
   SELECT JSON_VALUE(nested, '$.nesteder.string1'), SUM(JSON_VALUE(nested, 
'$.nesteder.double3' RETURNING DOUBLE)) FROM foo WHERE JSON_VALUE(nested, 
'$.nesteder.double3' RETURNING DOUBLE) < 10000.0 AND JSON_VALUE(nested, 
'$.nesteder.double3' RETURNING DOUBLE) > 1000.0 GROUP BY 1 ORDER BY 2
   Benchmark                        (query)  (rowsPerSegment)  (vectorize)  
Mode  Cnt    Score    Error  Units
   SqlNestedDataBenchmark.querySql       20           5000000        false  
avgt    5  280.393 ± 15.369  ms/op
   SqlNestedDataBenchmark.querySql       20           5000000        force  
avgt    5  174.545 ±  2.702  ms/op
   SqlNestedDataBenchmark.querySql       21           5000000        false  
avgt    5  802.647 ± 32.078  ms/op
   SqlNestedDataBenchmark.querySql       21           5000000        force  
avgt    5  591.274 ± 16.460  ms/op
   
   
   SELECT long2 FROM foo WHERE long2 IN (1, 19, 21, 23, 25, 26, 46),
   SELECT JSON_VALUE(nested, '$.nesteder.long2' RETURNING BIGINT) FROM foo 
WHERE JSON_VALUE(nested, '$.nesteder.long2' RETURNING BIGINT) IN (1, 19, 21, 
23, 25, 26, 46),
   Benchmark                        (query)  (rowsPerSegment)  (vectorize)  
Mode  Cnt    Score    Error  Units
   SqlNestedDataBenchmark.querySql       22           5000000        false  
avgt    5  273.464 ± 15.731  ms/op
   SqlNestedDataBenchmark.querySql       22           5000000        force  
avgt    5  272.270 ± 20.511  ms/op
   SqlNestedDataBenchmark.querySql       23           5000000        false  
avgt    5  174.960 ±  1.923  ms/op
   SqlNestedDataBenchmark.querySql       23           5000000        force  
avgt    5  177.920 ±  4.095  ms/op
   
   
   SELECT long2 FROM foo WHERE long2 IN (1, 19, 21, 23, 25, 26, 46) GROUP BY 1",
   SELECT JSON_VALUE(nested, '$.nesteder.long2' RETURNING BIGINT) FROM foo 
WHERE JSON_VALUE(nested, '$.nesteder.long2' RETURNING BIGINT) IN (1, 19, 21, 
23, 25, 26, 46) GROUP BY 1
   Benchmark                        (query)  (rowsPerSegment)  (vectorize)  
Mode  Cnt    Score    Error  Units
   SqlNestedDataBenchmark.querySql       24           5000000        false  
avgt    5  318.280 ±  7.544  ms/op
   SqlNestedDataBenchmark.querySql       24           5000000        force  
avgt    5  210.866 ± 14.684  ms/op
   SqlNestedDataBenchmark.querySql       25           5000000        false  
avgt    5  215.200 ±  2.366  ms/op
   SqlNestedDataBenchmark.querySql       25           5000000        force  
avgt    5  152.399 ± 22.695  ms/op
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to