jihoonson opened a new pull request #11457: URL: https://github.com/apache/druid/pull/11457
### Description A concurrency issue is recently found between `DruidSchema` and `BrokerServerView`, which is, refreshing a `DruidTable` in `DruidSchema` can block query processing. This can happen in the scenario described below. 1. The `DruidSchema-Cache` thread locks `DruidSchema#lock` and calls `DruidSchema.buildDruidTable()` to rebuild the rowSignature of druidTable. 2. A new segment is added to `BrokerServerView`. The `BrokerServerView` thread will lock `BrokerServerView#lock`, process the new segment added, and call timelineCallbacks of `DruidSchema`. 3. The timeline callback of `DruidSchema` is executed by the same thread used in the step 2. This thread will wait for `DruidSchema.buildDruidTable()` in the step 1 to be done and `DruidSchema#lock` is released. 4. A new query is issued and the query thread will call `BrokerServerView.getTimeline()`. This call will wait for the timelineCallbacks to be done in the step 2 and `BrokerServerView#lock` is released. The following flame graphs show what those threads were doing when this happened in our cluster. The metrics for these flame graphs were collected for 30 seconds.  `DruidSchema` was calling `refreshSegmentsForDataSource()` and `buildDruidTable()`. `refreshSegmentsForDataSource` issues a `SegmentMetadataQuery` per segment and locks `DruidSchema#lock` per row in the result to update segment metadata in memory. `buildDruidTable` locks `DruidSchema#lock` _while it iterates all columns in all segments_ in the datasource. When the datasource has 481200 segments and 774 columns in each segment, `buildDruidTable` took about 25 seconds (!!) on my desktop.  `BrokerServerView` was blocked in the `DruidSchema.addSegment` callback.  Finally, there were 2 timeseries queries that were blocked in `BrokerServerView.getTimeline()` for 30 seconds. Currently, this can happen whenever `DruidSchema` needs to refresh `DruidTable`, which is whenever a new segment is added to the cluster or a segment is completely removed from the cluster. Moving segments should not cause this issue because a new segment is always loaded first in the new server before it is removed from the previous server. As a result, moving segments does not require to update the `RowSignature` of `DruidTable`. To fix this issue, this PR improves the concurrency of `DruidSchema` by not holding `DruidSchema#lock` to process expensive operations such as refreshing `DruidTables`. - A new thread, `DruidSchema-Callback`, is added in `DruidSchema` to asynchronously process the timeline callbacks. - The `segmentMetadataInfo` map is changed to `ConcurrentHashMap<String, ConcurrentSkipListMap<SegmentId, AvailableSegmentMetadata>>`, so that updating the map doesn't have to lock `DruidSchema#lock`. Instead, the concurrency control is delegated to `ConcurrentMap`s. This could potentially make querying the segments table faster because `getSegmentMetadataSnapshot()` no longer requires to lock `DruidSchema#lock`. Finally, this PR does not fix the expensive logic in `buildDruidTable()`. Incremental updates on `DruidTable` could be better but it requires that, for each column, we should be able to fall back to the column type in the second most recent segment when the most recent segment disappears. We can research more how we can track those column types of segments efficiently as a follow-up. <hr> ##### Key changed/added classes in this PR * `DruidSchema` <hr> This PR has: - [x] been self-reviewed. - [x] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.) - [ ] added documentation for new or modified features or behaviors. - [x] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links. - [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md) - [x] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader. - [x] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met. - [ ] added integration tests. - [ ] been tested in a test Druid cluster. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
