jasonk000 opened a new pull request, #13710: URL: https://github.com/apache/druid/pull/13710
### Description

Following from https://github.com/apache/druid/pull/12105, with 1-20% of CPU time spent in `DimensionDictionary`, there are some improvements to be made to the `add()` path on the `StringDimensionIndexer`. The main concern is that the locking is more complicated than necessary.

There are three sub-changes:
- Introduce a benchmark
- Swap the existing lock implementation to use a `synchronized` block
- Reduce the number of times we have to cross the lock.

I estimate this change reduces CPU usage by ~5-7% on our indexing jobs overall, with a ~30% improvement on the uncontended path.

### Alternative solutions / future work

It seems this code is mostly used by only one thread at a time, so an alternative that changes the `DimensionDictionary` would be to move responsibility for synchronization _outside_ of the dictionary and make the caller responsible for it. This would likely give a good boost to the later `compareUnsorted` stage but might constrain future work. Specifically, it would prevent concurrent use of the dictionary from multiple threads during ingestion.

### Related

#12105, #12109

##### Key changed/added classes in this PR
 * Introduce benchmark `StringDimensionIndexerProcessBenchmark`
 * `DimensionDictionary` locking changes
 * `StringDimensionIndexer` simplification of call

<hr>

This PR has:
- [x] been self-reviewed.
   - [x] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.)
- [x] been tested in a test Druid cluster.
   - a solution more like 12105 on top of 0.20 has been running in production for many months.
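To illustrate the two locking sub-changes in the description, here is a minimal sketch, not the actual Druid code. The class `SketchDictionary` and its fields are hypothetical names invented for this example. It assumes a dictionary that maps each string to an integer id, guards its state with a plain `synchronized` block, and performs the lookup and the insert inside one critical section so that a caller crosses the lock once per `add()` rather than once to check and again to insert.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; this is NOT Druid's DimensionDictionary.
final class SketchDictionary {
    private final Object lock = new Object();
    private final Map<String, Integer> valueToId = new HashMap<>();
    private final List<String> idToValue = new ArrayList<>();

    /**
     * Returns the id for the value, assigning the next free id if the value
     * has not been seen. Lookup and insert share one synchronized block, so
     * the caller crosses the lock exactly once.
     */
    int add(String value) {
        synchronized (lock) {
            Integer existing = valueToId.get(value);
            if (existing != null) {
                return existing;
            }
            int id = idToValue.size();
            valueToId.put(value, id);
            idToValue.add(value);
            return id;
        }
    }

    /** Returns the value stored under the given id. */
    String getValue(int id) {
        synchronized (lock) {
            return idToValue.get(id);
        }
    }

    /** Returns the number of distinct values added so far. */
    int size() {
        synchronized (lock) {
            return idToValue.size();
        }
    }
}
```

A caller would simply write `int id = dict.add("some-value");` and never needs a separate "get id, then add if missing" sequence. Whether this matches the shape of the real change should be checked against the PR diff.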
