gianm opened a new pull request, #15697:
URL: https://github.com/apache/druid/pull/15697

   Following #14866, there is no longer a reason for IncrementalIndex#add to be 
thread-safe.
   
   It turns out it already was not using its selectors in a thread-safe way, as 
exposed by #15615 making `testMultithreadAddFactsUsingExpressionAndJavaScript` 
in `IncrementalIndexIngestionTest` flaky. Note that this problem isn't new: 
Strings have been stored in the dimension selectors for some time, but we 
didn't have a test that checked for that case; we only have this test that 
checks for concurrent adds involving numeric selectors.
   
   At any rate, this patch changes OnheapIncrementalIndex to no longer try to 
offer a thread-safe "add" method. It also improves performance a bit by adding 
a row ID supplier to the selectors it uses to read InputRows, meaning that it 
can get the benefit of caching values inside the selectors.
   
   This patch also:
   
   1) Adds synchronization to HyperUniquesAggregator and CardinalityAggregator,
      which the similar datasketches versions already have. This is done to
      help them adhere to the contract of Aggregator: concurrent calls to
      "aggregate" and "get" must be thread-safe.
   
   2) Updates OnHeapIncrementalIndexBenchmark to use JMH and moves it to the
      druid-benchmarks module.
   
   Benchmarks below. I think the speedup is mainly due to the additional 
caching enabled by the row ID supplier. The benchmark involves two primitive 
aggregators reading the same input fields, which would now be able to make use 
of in-selector caching.
   
   ```
   patch
   
   Benchmark                                          Mode  Cnt     Score    
Error  Units
   OnheapIncrementalIndexBenchmark.concurrentAddRead  avgt    5  1805.434 ± 
51.638  ms/op
   
   master
   
   Benchmark                                          Mode  Cnt     Score    
Error  Units
   OnheapIncrementalIndexBenchmark.concurrentAddRead  avgt    5  2049.733 ± 
21.237  ms/op
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to