eranmeir opened a new pull request #7838: Improve IncrementalIndex concurrency scalability
URL: https://github.com/apache/incubator-druid/pull/7838

### Background

Our work on Oak (see PR #7676) shows that multi-threaded indexing yields significant performance gains, even when Oak is not used. In our benchmarks we noticed that ingestion was not scaling as expected with multiple threads. We traced the threads’ blocking states to two causes:

1. A monitor in `IncrementalIndex` that synchronizes access to `dimensionDescs`
2. A read-write lock in `StringDimensionIndexer`

This PR proposes a solution to the first issue. The solution is based on the observation that dimension data is updated infrequently, so taking an exclusive lock on every access is wasteful.

### Summary of changes

- Shared state is encapsulated in a new class, `DimensionData`. This includes `dimensionDescs`, `dimensionDescsList` and `columnCapabilities`.
- Concurrent threads share an atomic reference to an instance of `DimensionData`.
- Copy-on-write (CoW): only when a thread needs to update the shared state does it copy the instance, update the copy, and then swap the reference atomically.
- Consistency is maintained when the reference is updated. This simplifies row processing, removes the need for keeping an “overflow” array, and allows fast failure when a row contains duplicate dimensions.
- New multi-threaded ingestion benchmark: `IndexIngestionMultithreadedBenchmark`

For benchmark results, see the attached document: [Incremental Index Scaling.pdf](https://github.com/apache/incubator-druid/files/3261231/Incremental.Index.Scaling.pdf)
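
The copy-on-write scheme described in the summary of changes can be sketched roughly as below. This is a minimal illustration and not the code in this PR: the class `CowDimensionRegistry`, its nested `Snapshot` class and the method `getOrAdd` are hypothetical names, the snapshot holds only a name-to-index map and a name list rather than the real `dimensionDescs`, `dimensionDescsList` and `columnCapabilities`, and the `compareAndSet` retry loop is an assumption about how the atomic swap could be done.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical illustration of copy-on-write dimension state; not the PR's classes. */
final class CowDimensionRegistry {

    /** Immutable snapshot of the shared state (a stand-in for DimensionData). */
    static final class Snapshot {
        final Map<String, Integer> indexByName;
        final List<String> names;

        Snapshot(Map<String, Integer> indexByName, List<String> names) {
            this.indexByName = Collections.unmodifiableMap(indexByName);
            this.names = Collections.unmodifiableList(names);
        }
    }

    // All threads share one atomic reference to the current snapshot.
    private final AtomicReference<Snapshot> current =
        new AtomicReference<>(new Snapshot(new HashMap<>(), new ArrayList<>()));

    /** Read path: a single volatile read, no lock taken. */
    Snapshot snapshot() {
        return current.get();
    }

    /** Returns the index of a dimension, adding it with copy-on-write if it is new. */
    int getOrAdd(String name) {
        while (true) {
            Snapshot old = current.get();
            Integer existing = old.indexByName.get(name);
            if (existing != null) {
                return existing; // common case: nothing to update
            }
            // Rare case: copy the state, update the copy, then swap the reference.
            Map<String, Integer> map = new HashMap<>(old.indexByName);
            List<String> list = new ArrayList<>(old.names);
            int index = list.size();
            map.put(name, index);
            list.add(name);
            if (current.compareAndSet(old, new Snapshot(map, list))) {
                return index;
            }
            // Another thread swapped first; retry against the newer snapshot.
        }
    }
}
```

Because each snapshot is immutable, a reader that obtained one before an update keeps seeing a consistent map and list, while later readers see the new snapshot. The cost of copying is paid only on the infrequent update path.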
