jasonk000 opened a new pull request, #13710:
URL: https://github.com/apache/druid/pull/13710

   ### Description
   
   Following on from https://github.com/apache/druid/pull/12105: with 1-20% of 
CPU time spent in `DimensionDictionary`, there is room to improve the `add()` 
path of `StringDimensionIndexer`. The main concern is that the locking is 
more complicated than necessary.
   
   There are three sub-changes:
   - Introduce a benchmark
   - Swap the existing lock implementation to use a `synchronized` block
   - Reduce the number of times we have to cross the lock.
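
To make the second and third sub-changes concrete, here is a minimal sketch of the general idea, not the code in this PR. `SketchDictionary`, `addAll` and `addUnderLock` are hypothetical names, and the real `DimensionDictionary` has more responsibilities than this stand-in. It assumes that "crossing the lock fewer times" can be approximated by taking the monitor once per multi-value row instead of once per value.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified value-to-id dictionary; not Druid's DimensionDictionary.
final class SketchDictionary
{
  private final Object lock = new Object();
  private final Map<String, Integer> valueToId = new HashMap<>();
  private final List<String> idToValue = new ArrayList<>();

  // Baseline: one lock acquisition per value, using a plain synchronized block.
  int add(String value)
  {
    synchronized (lock) {
      return addUnderLock(value);
    }
  }

  // Batched variant: one lock acquisition for all values of a row.
  int[] addAll(List<String> values)
  {
    int[] ids = new int[values.size()];
    synchronized (lock) {
      for (int i = 0; i < ids.length; i++) {
        ids[i] = addUnderLock(values.get(i));
      }
    }
    return ids;
  }

  int size()
  {
    synchronized (lock) {
      return idToValue.size();
    }
  }

  // Caller must already hold the lock.
  private int addUnderLock(String value)
  {
    Integer existing = valueToId.get(value);
    if (existing != null) {
      return existing;
    }
    int id = idToValue.size();
    valueToId.put(value, id);
    idToValue.add(value);
    return id;
  }
}
```

With this shape, a row holding three values costs one monitor enter/exit through `addAll` rather than three through `add`, which is where an uncontended path would be expected to benefit most.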
   
   I estimate that this change reduces CPU usage by ~5-7% across our indexing 
jobs overall, with an improvement of ~30% on the uncontended path.
   
   ### Alternative solutions / future work
   
   This code appears to be used mostly by one thread at a time, so an 
alternative approach would be to change `DimensionDictionary` so that 
responsibility for synchronization moves _outside_ of the dictionary and onto 
the caller. This would likely give a good boost to the later `compareUnsorted` 
stage, but it might constrain future work: specifically, it would prevent 
concurrent use of the dictionary from multiple threads during ingestion.
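
As a rough illustration of that alternative, the sketch below removes all locking from the dictionary and lets the caller decide how to guard it. `UnsyncDictionary`, `SketchIndexer`, `encode` and `runTwoWriters` are hypothetical names chosen for this example and do not correspond to Druid classes or methods.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical dictionary with no internal locking; safe only if the caller synchronizes.
final class UnsyncDictionary
{
  private final Map<String, Integer> valueToId = new HashMap<>();

  int add(String value)
  {
    Integer existing = valueToId.get(value);
    if (existing != null) {
      return existing;
    }
    int id = valueToId.size();
    valueToId.put(value, id);
    return id;
  }

  int size()
  {
    return valueToId.size();
  }
}

// Hypothetical caller that owns the synchronization policy for its dictionary.
final class SketchIndexer
{
  private final UnsyncDictionary dictionary = new UnsyncDictionary();

  // The caller's monitor guards every access to the dictionary.
  synchronized int encode(String value)
  {
    return dictionary.add(value);
  }

  synchronized int cardinality()
  {
    return dictionary.size();
  }

  // Two writers add the same values; returns the resulting number of distinct ids.
  static int runTwoWriters(int distinctValues)
  {
    SketchIndexer indexer = new SketchIndexer();
    Runnable writer = () -> {
      for (int i = 0; i < distinctValues; i++) {
        indexer.encode("v" + i);
      }
    };
    Thread a = new Thread(writer);
    Thread b = new Thread(writer);
    a.start();
    b.start();
    try {
      a.join();
      b.join();
    }
    catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return indexer.cardinality();
  }
}
```

The trade-off is visible here: a caller that knows it is single-threaded could call `UnsyncDictionary` directly and pay no locking cost, but every caller that does share the dictionary must remember to guard it.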
   
   ### Related
   
   #12105, #12109
   
   ##### Key changed/added classes in this PR
    * Introduce benchmark `StringDimensionIndexerProcessBenchmark`
    * `DimensionDictionary` locking changes
    * `StringDimensionIndexer` simplification of call
   
   <hr>
   
   This PR has:
   
   - [x] been self-reviewed.
      - [x] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md)
   - [x] been tested in a test Druid cluster. A solution similar to #12105, 
built on top of 0.20, has been running in production for many months.

