gianm opened a new pull request #9314: Add HashVectorGrouper based on MemoryOpenHashTable.
URL: https://github.com/apache/druid/pull/9314

This causes vectorized hash-based groupBys to use MemoryOpenHashTable instead of ByteBufferHashTable (see #9308).

Additional supporting changes:

1) Modifies VectorGrouper interface to use Memory instead of ByteBuffers.
2) Modifies BufferArrayGrouper to match the new VectorGrouper interface.
3) Removes "implements VectorGrouper" from BufferHashGrouper.

Benchmarks:

```
master @ a085685182d62e5dd1b716f1bbb9bcbbaeb1c661

Benchmark              (query)  (rowsPerSegment)  (vectorize)  Mode  Cnt    Score    Error  Units
SqlBenchmark.querySql       10           5000000        force  avgt   25  683.263 ± 8.382  ms/op
SqlBenchmark.querySql       11           5000000        force  avgt   25  691.823 ± 7.979  ms/op
SqlBenchmark.querySql       12           5000000        force  avgt   25   34.977 ± 0.842  ms/op
SqlBenchmark.querySql       13           5000000        force  avgt   25   42.408 ± 0.984  ms/op
SqlBenchmark.querySql       14           5000000        force  avgt   25   98.015 ± 0.887  ms/op
SqlBenchmark.querySql       15           5000000        force  avgt   25  449.379 ± 6.541  ms/op
SqlBenchmark.querySql       16           5000000        force  avgt   25  519.083 ± 5.115  ms/op
SqlBenchmark.querySql       17           5000000        force  avgt   25  456.992 ± 4.015  ms/op
SqlBenchmark.querySql       18           5000000        force  avgt   25  534.258 ± 8.832  ms/op

groupby-hvg @ 9fdc4bcf8e6dd6896a5fe687145e5ea0f1c6a709

Benchmark              (query)  (rowsPerSegment)  (vectorize)  Mode  Cnt    Score    Error  Units
SqlBenchmark.querySql       10           5000000        force  avgt   25  450.345 ± 6.428  ms/op
SqlBenchmark.querySql       11           5000000        force  avgt   25  473.163 ± 4.918  ms/op
SqlBenchmark.querySql       12           5000000        force  avgt   25   32.281 ± 0.566  ms/op
SqlBenchmark.querySql       13           5000000        force  avgt   25   39.259 ± 1.300  ms/op
SqlBenchmark.querySql       14           5000000        force  avgt   25   91.553 ± 0.892  ms/op
SqlBenchmark.querySql       15           5000000        force  avgt   25   85.417 ± 0.670  ms/op
SqlBenchmark.querySql       16           5000000        force  avgt   25  158.197 ± 1.458  ms/op
SqlBenchmark.querySql       17           5000000        force  avgt   25  103.445 ± 0.872  ms/op
SqlBenchmark.querySql       18           5000000        force  avgt   25  175.661 ± 1.859  ms/op
```

The largest improvements are on queries 10 and 11 (roughly 1.5x faster, i.e. about one-third less time per op) and queries 15–18 (3–5x faster). Queries 12–14 look very slightly faster or maybe unchanged.

Queries 10 and 11 are groupBys on two strings and would use hash-based grouping. Queries 12–14 are groupBys on one string and would use array-based grouping, so they are not expected to be affected by the hash table change. Queries 15–18 are groupBys on one long column and would use hash-based grouping.
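
As a rough cross-check of the summary above, the per-query ratios can be recomputed from the mean scores in the benchmark output. This is a minimal sketch, not part of the PR: the helper names `speedup` and `time_reduction_pct` are made up for illustration, and the error bars reported by JMH are ignored.

```python
# Mean scores in ms/op, copied from the benchmark output in this message.
# Error margins are deliberately ignored in this sketch.
master = {
    10: 683.263, 11: 691.823, 12: 34.977, 13: 42.408, 14: 98.015,
    15: 449.379, 16: 519.083, 17: 456.992, 18: 534.258,
}
groupby_hvg = {
    10: 450.345, 11: 473.163, 12: 32.281, 13: 39.259, 14: 91.553,
    15: 85.417, 16: 158.197, 17: 103.445, 18: 175.661,
}


def speedup(query):
    """Ratio of old time to new time (hypothetical helper); >1 means the branch is faster."""
    return master[query] / groupby_hvg[query]


def time_reduction_pct(query):
    """Percentage of time per op saved relative to master (hypothetical helper)."""
    return 100.0 * (1.0 - groupby_hvg[query] / master[query])


if __name__ == "__main__":
    for q in sorted(master):
        print(f"query {q}: {speedup(q):.2f}x faster, "
              f"{time_reduction_pct(q):.1f}% less time per op")
```

With these numbers the ratios come out at roughly 1.5x for queries 10 and 11, between about 3x and 5.3x for queries 15–18, and under 1.1x for queries 12–14, which matches the grouping by hash-based versus array-based execution described above.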
