gianm opened a new pull request, #15697:
URL: https://github.com/apache/druid/pull/15697
Following #14866, there is no longer a reason for IncrementalIndex#add to be
thread-safe.
It turns out it already was not using its selectors in a thread-safe way, as
exposed by #15615 making `testMultithreadAddFactsUsingExpressionAndJavaScript`
in `IncrementalIndexIngestionTest` flaky. Note that this problem isn't new:
Strings have been stored in the dimension selectors for some time, but we
didn't have a test that checked for that case; we only have this test that
checks for concurrent adds involving numeric selectors.
At any rate, this patch changes OnheapIncrementalIndex to no longer try to
offer a thread-safe "add" method. It also improves performance a bit by adding
a row ID supplier to the selectors it uses to read InputRows, meaning that it
can get the benefit of caching values inside the selectors.
This patch also:
1) Adds synchronization to HyperUniquesAggregator and CardinalityAggregator,
which the similar datasketches versions already have. This is done to
help them adhere to the contract of Aggregator: concurrent calls to
"aggregate" and "get" must be thread-safe.
2) Updates OnHeapIncrementalIndexBenchmark to use JMH and moves it to the
druid-benchmarks module.
Benchmarks below. I think the speedup is mainly due to the additional
caching enabled by the row ID supplier. The benchmark involves two primitive
aggregators reading the same input fields, which would now be able to make use
of in-selector caching.
```
patch
Benchmark Mode Cnt Score
Error Units
OnheapIncrementalIndexBenchmark.concurrentAddRead avgt 5 1805.434 ±
51.638 ms/op
master
Benchmark Mode Cnt Score
Error Units
OnheapIncrementalIndexBenchmark.concurrentAddRead avgt 5 2049.733 ±
21.237 ms/op
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]