ccaominh opened a new pull request #9278: Speed up joins on indexed tables with string keys
URL: https://github.com/apache/druid/pull/9278

### Description

When joining on indexed tables with string keys, caching the computation of row id to row numbers improves performance on the `JoinAndLookupBenchmark.joinIndexedTableStringKey*` benchmarks by about 10% if the column cache is enabled and by about 100% if the column cache is disabled (roughly half the time per operation).

#### Before

```
Benchmark                            (columnCacheSizeBytes)   Score    Error  Units
joinIndexedTableStringKey                                 0  41.899 ± 0.688  ms/op
joinIndexedTableStringKey                             16384  22.707 ± 0.309  ms/op
joinIndexedTableStringKeyWithFilter                       0  41.879 ± 0.507  ms/op
joinIndexedTableStringKeyWithFilter                   16384  22.314 ± 0.114  ms/op
```

#### After

```
Benchmark                            (columnCacheSizeBytes)   Score    Error  Units
joinIndexedTableStringKey                                 0  20.527 ± 0.751  ms/op
joinIndexedTableStringKey                             16384  20.804 ± 0.206  ms/op
joinIndexedTableStringKeyWithFilter                       0  21.374 ± 0.299  ms/op
joinIndexedTableStringKeyWithFilter                   16384  19.723 ± 0.390  ms/op
```

(See https://github.com/apache/druid/pull/9267 for the `JoinAndLookupBenchmark` implementation.)

<hr>

This PR has:
- [x] been self-reviewed.
- [x] added comments explaining the "why" and the intent of the code wherever it would not be obvious for an unfamiliar reader.
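For readers unfamiliar with the idea in the description, the following is a minimal sketch of caching the mapping from a row id to matching row numbers. It is not the code in this PR: the class `RowNumberCacheSketch`, its fields, and the sample data are all hypothetical, and it assumes the left-hand string key is dictionary-encoded so that a small integer id stands in for each distinct string.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntFunction;

// Hypothetical illustration only; none of these names come from the Druid codebase.
class RowNumberCacheSketch {
    private static final int[] NO_MATCH = new int[0];

    // Left side: dictionary id -> string key (assumed to be the expensive step
    // when no column cache is available).
    private final IntFunction<String> dictionary;
    // Right side: string key -> row numbers of the indexed table.
    private final Map<String, int[]> index;
    // Memoized result per dictionary id; null means "not computed yet".
    private final int[][] cache;
    // Number of times the dictionary and index actually had to be consulted.
    int misses = 0;

    RowNumberCacheSketch(IntFunction<String> dictionary, int cardinality, Map<String, int[]> index) {
        this.dictionary = dictionary;
        this.index = index;
        this.cache = new int[cardinality][];
    }

    // Returns the right-hand row numbers matching the given left-hand id,
    // doing the string lookup and the index probe at most once per id.
    int[] rowNumbersFor(int dictionaryId) {
        int[] cached = cache[dictionaryId];
        if (cached == null) {
            misses++;
            String key = dictionary.apply(dictionaryId);
            cached = index.getOrDefault(key, NO_MATCH);
            cache[dictionaryId] = cached;
        }
        return cached;
    }

    // Small made-up data set: three distinct keys on the left, two indexed on the right.
    static RowNumberCacheSketch example() {
        String[] values = {"CA", "US", "MX"};
        Map<String, int[]> index = new HashMap<>();
        index.put("US", new int[]{0, 2});
        index.put("CA", new int[]{1});
        return new RowNumberCacheSketch(id -> values[id], values.length, index);
    }

    public static void main(String[] args) {
        RowNumberCacheSketch sketch = example();
        int[] leftRowIds = {1, 1, 0, 2, 1};
        for (int id : leftRowIds) {
            System.out.println(id + " -> " + sketch.rowNumbersFor(id).length + " match(es)");
        }
        System.out.println("misses: " + sketch.misses);
    }
}
```

In this sketch, repeated ids on the left side skip both the string lookup and the index probe, which is consistent with the larger gain reported when the column cache is disabled. A plain array sized to the dictionary cardinality is an assumption made for simplicity; a bounded cache would be needed when the cardinality is large or unknown.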
