eranmeir opened a new pull request #7838: Improve IncrementalIndex concurrency 
scalability
URL: https://github.com/apache/incubator-druid/pull/7838
 
 
   ### Background
   Our work on Oak (see PR #7676) shows that there are significant performance 
gains with multi-threaded indexing (even when not using Oak). In our benchmarks, 
however, we noticed that ingestion was not scaling as expected with multiple threads.
   
   We traced the threads’ blocking states to two causes:
   1. A monitor in `IncrementalIndex` that synchronized access to 
`dimensionDescs`
   2. A read-write lock in `StringDimensionIndexer`
   
   This PR proposes a solution to the first issue. The proposed solution is 
based on the observation that dimension data is updated infrequently and so 
ongoing exclusive locking is wasteful.
   
   ### Summary of changes
   - Shared state is encapsulated in a new class, `DimensionData`. This 
includes `dimensionDescs`, `dimensionDescsList`, and `columnCapabilities`.
   - Concurrent threads share an atomic reference to an instance of 
`DimensionData`
   - Copy-on-write (CoW): only when a thread needs to update the shared state 
does it copy the instance, update the copy, and then swap the reference atomically.
   - Consistency is maintained when the reference is updated. This simplifies 
row processing, removes the need for keeping an “overflow” array, and allows 
fast failure when a row contains duplicate dimensions.
   - New multi-threaded ingestion benchmark: 
`IndexIngestionMultithreadedBenchmark`
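
   The copy-on-write scheme described in the list above can be sketched roughly 
as follows. This is a minimal illustration, not the code in this PR: 
`DimensionSnapshot` and `CowDimensionRegistry` are hypothetical names, the 
snapshot holds only a name-to-position map instead of the real `dimensionDescs` 
and `columnCapabilities`, and the compare-and-set retry loop is an assumption 
about how the atomic swap could be done.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical, simplified stand-in for DimensionData: an immutable snapshot.
final class DimensionSnapshot {
    final Map<String, Integer> indexByName; // dimension name -> position
    final List<String> names;               // dimension names in insertion order

    DimensionSnapshot(Map<String, Integer> indexByName, List<String> names) {
        this.indexByName = Collections.unmodifiableMap(indexByName);
        this.names = Collections.unmodifiableList(names);
    }

    static DimensionSnapshot empty() {
        return new DimensionSnapshot(new LinkedHashMap<>(), new ArrayList<>());
    }

    // Copy step of copy-on-write: build a new snapshot, leave this one untouched.
    DimensionSnapshot withDimension(String name) {
        Map<String, Integer> mapCopy = new LinkedHashMap<>(indexByName);
        List<String> listCopy = new ArrayList<>(names);
        mapCopy.put(name, listCopy.size());
        listCopy.add(name);
        return new DimensionSnapshot(mapCopy, listCopy);
    }
}

// Hypothetical holder for the shared atomic reference.
final class CowDimensionRegistry {
    private final AtomicReference<DimensionSnapshot> ref =
        new AtomicReference<>(DimensionSnapshot.empty());

    // Readers take a consistent snapshot without locking.
    DimensionSnapshot snapshot() {
        return ref.get();
    }

    // Writers copy, update the copy, and swap the reference atomically.
    // Assumption: a CAS retry loop; a lost race simply re-reads and retries.
    int getOrAdd(String name) {
        while (true) {
            DimensionSnapshot current = ref.get();
            Integer existing = current.indexByName.get(name);
            if (existing != null) {
                return existing; // common case: no update, no copy
            }
            DimensionSnapshot updated = current.withDimension(name);
            if (ref.compareAndSet(current, updated)) {
                return updated.indexByName.get(name);
            }
        }
    }
}
```

   In this sketch the common path (the dimension already exists) is a single 
volatile read with no copying, which matches the observation that dimension data 
is updated infrequently. A thread that obtained a snapshot before an update keeps 
seeing its old, internally consistent view.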
   
   
   For benchmark results see attached document: 
   [Incremental Index 
Scaling.pdf](https://github.com/apache/incubator-druid/files/3261231/Incremental.Index.Scaling.pdf)
   
