richardstartin opened a new pull request #8195: URL: https://github.com/apache/pinot/pull/8195
This change is motivated by slow queries at one of our customers which group by a raw column, where 30GB was seen to be allocated by `NoDictionaryMultiColumnGroupKeyGenerator.generateKeyForBlock`, which is also where most of the method samples were taken:

<img width="1590" alt="Screenshot 2022-02-11 at 18 33 25" src="https://user-images.githubusercontent.com/16439049/153649438-1d8a054a-5bc1-4313-ac08-6faf6b5a41e1.png">
<img width="1602" alt="Screenshot 2022-02-11 at 18 35 21" src="https://user-images.githubusercontent.com/16439049/153649797-f73bd1aa-233f-4e57-8007-ff38e417e14b.png">

This PR starts by generalising one of our pre-existing benchmarks, which does a good job of exercising the entire query execution. It is parameterised so different queries can be added easily, and the generated data is parameterised too so that columns with different cardinalities can be created.

Then, the actual improvement is made in the second commit. It transposes the group key generation since the `BlockValSet`s will be cached by `DataBlockCache` anyway, then accumulates keys into a flyweight, which only needs to be allocated to memoize the group key on its first occurrence.
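The flyweight idea described above can be sketched roughly as follows. This is a simplified illustration and not the code in this PR: the class `FlyweightGroupKeySketch`, its `Key` type and the `keyCopies` counter are hypothetical names, plain `int[]` arrays stand in for the values read from each `BlockValSet`, and a boxed `HashMap` stands in for whatever map the real key generator uses.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: none of these names come from the Pinot code base.
public class FlyweightGroupKeySketch {

  /** Composite key compared by content, so a reused instance can be used for lookups. */
  static final class Key {
    final int[] values;

    Key(int[] values) {
      this.values = values;
    }

    @Override
    public boolean equals(Object o) {
      return o instanceof Key && Arrays.equals(values, ((Key) o).values);
    }

    @Override
    public int hashCode() {
      return Arrays.hashCode(values);
    }
  }

  private final Map<Key, Integer> groupIds = new HashMap<>();
  private int keyCopies = 0;

  /** columns[col][row] holds the value of group-by column col at row (column-major input). */
  int[] generateKeysForBlock(int[][] columns, int numRows) {
    int[] result = new int[numRows];
    // One reusable key per block: overwritten for every row, never stored in the map.
    Key flyweight = new Key(new int[columns.length]);
    for (int row = 0; row < numRows; row++) {
      for (int col = 0; col < columns.length; col++) {
        flyweight.values[col] = columns[col][row];
      }
      Integer groupId = groupIds.get(flyweight);
      if (groupId == null) {
        // First occurrence of this key: only now is a copy allocated and memoized.
        groupId = groupIds.size();
        groupIds.put(new Key(flyweight.values.clone()), groupId);
        keyCopies++;
      }
      result[row] = groupId;
    }
    return result;
  }

  int keyCopies() {
    return keyCopies;
  }

  public static void main(String[] args) {
    int[][] columns = {{1, 1, 2, 1}, {7, 7, 7, 8}};
    FlyweightGroupKeySketch generator = new FlyweightGroupKeySketch();
    int[] ids = generator.generateKeysForBlock(columns, 4);
    System.out.println(Arrays.toString(ids) + " copies=" + generator.keyCopies());
  }
}
```

In this sketch the number of key allocations tracks the number of distinct groups rather than the number of rows, which is the property the change relies on. The boxed `Integer` values are a simplification of the sketch only.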
This roughly halves average time and reduces allocation by at least a factor of 4:

Before:

```
Benchmark                                                (_numRows)  (_query)                                                                       (_scenario)  Mode  Cnt          Score           Error  Units
BenchmarkQueries.query                                      1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL  EXP(0.001)   avgt    5        200.573 ±        36.577  ms/op
BenchmarkQueries.query:·gc.alloc.rate                       1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL  EXP(0.001)   avgt    5        454.459 ±       985.590  MB/sec
BenchmarkQueries.query:·gc.alloc.rate.norm                  1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL  EXP(0.001)   avgt    5  139180218.880 ± 299249329.376  B/op
BenchmarkQueries.query:·gc.churn.G1_Eden_Space              1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL  EXP(0.001)   avgt    5        510.589 ±       414.979  MB/sec
BenchmarkQueries.query:·gc.churn.G1_Eden_Space.norm         1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL  EXP(0.001)   avgt    5  156957846.187 ± 109973654.666  B/op
BenchmarkQueries.query:·gc.churn.G1_Old_Gen                 1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL  EXP(0.001)   avgt    5         12.236 ±        42.494  MB/sec
BenchmarkQueries.query:·gc.churn.G1_Old_Gen.norm            1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL  EXP(0.001)   avgt    5    3732222.293 ±  12807297.981  B/op
BenchmarkQueries.query:·gc.churn.G1_Survivor_Space          1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL  EXP(0.001)   avgt    5          4.412 ±        19.484  MB/sec
BenchmarkQueries.query:·gc.churn.G1_Survivor_Space.norm     1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL  EXP(0.001)   avgt    5    1398101.333 ±   6240670.451  B/op
BenchmarkQueries.query:·gc.count                            1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL  EXP(0.001)   avgt    5          8.000                  counts
BenchmarkQueries.query:·gc.time                             1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL  EXP(0.001)   avgt    5        407.000                  ms
BenchmarkQueries.query                                      1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL  EXP(0.5)     avgt    5         98.663 ±         7.845  ms/op
BenchmarkQueries.query:·gc.alloc.rate                       1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL  EXP(0.5)     avgt    5        696.114 ±      1498.561  MB/sec
BenchmarkQueries.query:·gc.alloc.rate.norm                  1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL  EXP(0.5)     avgt    5  106449429.440 ± 228808237.226  B/op
BenchmarkQueries.query:·gc.churn.G1_Eden_Space              1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL  EXP(0.5)     avgt    5       1029.174 ±      2217.348  MB/sec
BenchmarkQueries.query:·gc.churn.G1_Eden_Space.norm         1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL  EXP(0.5)     avgt    5  157963208.145 ± 341415795.326  B/op
BenchmarkQueries.query:·gc.churn.G1_Old_Gen                 1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL  EXP(0.5)     avgt    5          0.109 ±         0.714  MB/sec
BenchmarkQueries.query:·gc.churn.G1_Old_Gen.norm            1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL  EXP(0.5)     avgt    5      16481.891 ±    107743.719  B/op
BenchmarkQueries.query:·gc.count                            1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL  EXP(0.5)     avgt    5          4.000                  counts
BenchmarkQueries.query:·gc.time                             1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL  EXP(0.5)     avgt    5         10.000                  ms
BenchmarkQueries.query                                      1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL  EXP(0.999)   avgt    5         90.816 ±         8.115  ms/op
BenchmarkQueries.query:·gc.alloc.rate                       1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL  EXP(0.999)   avgt    5        752.671 ±      1621.248  MB/sec
BenchmarkQueries.query:·gc.alloc.rate.norm                  1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL  EXP(0.999)   avgt    5  106393285.309 ± 228688298.313  B/op
BenchmarkQueries.query:·gc.churn.G1_Eden_Space              1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL  EXP(0.999)   avgt    5       1022.984 ±      2203.794  MB/sec
BenchmarkQueries.query:·gc.churn.G1_Eden_Space.norm         1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL  EXP(0.999)   avgt    5  143076606.448 ± 309117349.271  B/op
BenchmarkQueries.query:·gc.churn.G1_Old_Gen                 1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL  EXP(0.999)   avgt    5          0.150 ±         0.832  MB/sec
BenchmarkQueries.query:·gc.churn.G1_Old_Gen.norm            1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL  EXP(0.999)   avgt    5      21384.158 ±    119911.555  B/op
BenchmarkQueries.query:·gc.churn.G1_Survivor_Space          1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL  EXP(0.999)   avgt    5          0.126 ±         1.082  MB/sec
BenchmarkQueries.query:·gc.churn.G1_Survivor_Space.norm     1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL  EXP(0.999)   avgt    5      17476.267 ±    150475.927  B/op
BenchmarkQueries.query:·gc.count                            1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL  EXP(0.999)   avgt    5          4.000                  counts
BenchmarkQueries.query:·gc.time                             1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL  EXP(0.999)   avgt    5         52.000                  ms
```

After:

```
Benchmark                                                (_numRows)  (_query)                                                                       (_scenario)  Mode  Cnt          Score           Error  Units
BenchmarkQueries.query                                      1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL  EXP(0.001)   avgt    5        130.071 ±         5.744  ms/op
BenchmarkQueries.query:·gc.alloc.rate                       1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL  EXP(0.001)   avgt    5        197.775 ±       424.170  MB/sec
BenchmarkQueries.query:·gc.alloc.rate.norm                  1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL  EXP(0.001)   avgt    5   39989639.600 ±  85754314.959  B/op
BenchmarkQueries.query:·gc.churn.G1_Eden_Space              1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL  EXP(0.001)   avgt    5        210.675 ±       161.001  MB/sec
BenchmarkQueries.query:·gc.churn.G1_Eden_Space.norm         1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL  EXP(0.001)   avgt    5   42677043.200 ±  33405655.687  B/op
BenchmarkQueries.query:·gc.churn.G1_Old_Gen                 1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL  EXP(0.001)   avgt    5          0.835 ±         6.634  MB/sec
BenchmarkQueries.query:·gc.churn.G1_Old_Gen.norm            1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL  EXP(0.001)   avgt    5     167488.000 ±   1331249.740  B/op
BenchmarkQueries.query:·gc.count                            1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL  EXP(0.001)   avgt    5         11.000                  counts
BenchmarkQueries.query:·gc.time                             1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL  EXP(0.001)   avgt    5        268.000                  ms
BenchmarkQueries.query                                      1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL  EXP(0.5)     avgt    5         54.864 ±         4.432  ms/op
BenchmarkQueries.query:·gc.alloc.rate                       1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL  EXP(0.5)     avgt    5         10.390 ±        18.671  MB/sec
BenchmarkQueries.query:·gc.alloc.rate.norm                  1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL  EXP(0.5)     avgt    5     883504.473 ±   1574108.609  B/op
BenchmarkQueries.query:·gc.count                            1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL  EXP(0.5)     avgt    5            ≈ 0                  counts
BenchmarkQueries.query                                      1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL  EXP(0.999)   avgt    5         47.429 ±         2.367  ms/op
BenchmarkQueries.query:·gc.alloc.rate                       1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL  EXP(0.999)   avgt    5         10.961 ±        19.191  MB/sec
BenchmarkQueries.query:·gc.alloc.rate.norm                  1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL  EXP(0.999)   avgt    5     811307.408 ±   1420111.422  B/op
BenchmarkQueries.query:·gc.count                            1500000  SELECT RAW_INT_COL,INT_COL,COUNT(*) FROM MyTable GROUP BY RAW_INT_COL,INT_COL  EXP(0.999)   avgt    5            ≈ 0                  counts
```

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
