jihoonson opened a new pull request #11457:
URL: https://github.com/apache/druid/pull/11457


   ### Description
   
   A concurrency issue was recently found between `DruidSchema` and 
`BrokerServerView`: refreshing a `DruidTable` in `DruidSchema` can 
block query processing. This can happen in the scenario described below.
   
   1. The `DruidSchema-Cache` thread acquires `DruidSchema#lock` and calls 
`DruidSchema.buildDruidTable()` to rebuild the rowSignature of a `DruidTable`.
   2. A new segment is added to `BrokerServerView`. The `BrokerServerView` 
thread acquires `BrokerServerView#lock`, processes the newly added segment, and 
invokes the timelineCallbacks of `DruidSchema`.
   3. The timeline callback of `DruidSchema` runs on the same thread as in 
step 2. This thread must wait until `DruidSchema.buildDruidTable()` from step 1 
finishes and `DruidSchema#lock` is released.
   4. A new query is issued and the query thread calls 
`BrokerServerView.getTimeline()`. This call must wait until the timelineCallbacks 
from step 2 finish and `BrokerServerView#lock` is released.
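
   The chain above can be modeled with a small, hypothetical sketch. None of the names below are Druid's actual code: the two monitor objects stand in for `DruidSchema#lock` and `BrokerServerView#lock`, and the sleep stands in for a slow `buildDruidTable()`.

   ```java
   import java.util.concurrent.CountDownLatch;

   // Hypothetical model of the four steps above (not Druid's real classes).
   public class LockChainDemo {
       static final Object schemaLock = new Object(); // stands in for DruidSchema#lock
       static final Object viewLock = new Object();   // stands in for BrokerServerView#lock

       static long run() {
           try {
               CountDownLatch schemaLockHeld = new CountDownLatch(1);
               CountDownLatch viewLockHeld = new CountDownLatch(1);

               // Step 1: the DruidSchema-Cache thread holds schemaLock for a slow rebuild.
               Thread cacheThread = new Thread(() -> {
                   synchronized (schemaLock) {
                       schemaLockHeld.countDown();
                       try { Thread.sleep(300); } catch (InterruptedException ignored) {} // buildDruidTable()
                   }
               });

               // Steps 2-3: the BrokerServerView thread takes viewLock, then runs the
               // timeline callback, which needs schemaLock and therefore blocks.
               Thread viewThread = new Thread(() -> {
                   synchronized (viewLock) {
                       viewLockHeld.countDown();
                       synchronized (schemaLock) { /* DruidSchema.addSegment callback */ }
                   }
               });

               cacheThread.start();
               schemaLockHeld.await();
               viewThread.start();
               viewLockHeld.await();

               // Step 4: a query thread calling getTimeline() needs viewLock and must
               // wait for the whole chain above to unwind.
               long start = System.nanoTime();
               synchronized (viewLock) { /* BrokerServerView.getTimeline() */ }
               long waitedMs = (System.nanoTime() - start) / 1_000_000;

               cacheThread.join();
               viewThread.join();
               return waitedMs;
           } catch (InterruptedException e) {
               throw new RuntimeException(e);
           }
       }

       public static void main(String[] args) {
           System.out.println("query thread blocked for ~" + run() + " ms");
       }
   }
   ```

   The query thread blocks for roughly the full duration of the table rebuild, even though it never touches `DruidSchema` directly.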
   
   The following flame graphs show what those threads were doing when this 
happened in our cluster. The metrics for these flame graphs were collected for 
30 seconds.
   
   
![DruidSchema](https://user-images.githubusercontent.com/2322288/125991564-008eddf1-1bc8-435b-bc93-4b331589c0a8.png)
   
   `DruidSchema` was calling `refreshSegmentsForDataSource()` and 
`buildDruidTable()`. `refreshSegmentsForDataSource` issues a 
`SegmentMetadataQuery` per segment and acquires `DruidSchema#lock` per row in the 
result to update segment metadata in memory. `buildDruidTable` holds 
`DruidSchema#lock` _while it iterates over all columns in all segments_ of the 
datasource. When the datasource has 481200 segments with 774 columns each, 
`buildDruidTable` took about 25 seconds (!!) on my desktop.
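
   As a rough sanity check on that 25-second figure, iterating every column of every segment at this scale means hundreds of millions of iterations performed while the lock is held:

   ```java
   // Back-of-envelope count of column visits inside buildDruidTable's lock hold.
   public class BuildTableCost {
       public static long columnVisits(long segments, long columnsPerSegment) {
           return segments * columnsPerSegment;
       }

       public static void main(String[] args) {
           // 481200 segments x 774 columns each -> ~372M column visits
           System.out.println(columnVisits(481_200L, 774L)); // prints 372448800
       }
   }
   ```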
   
   
![BrokerServerView](https://user-images.githubusercontent.com/2322288/125991550-eb61f746-56f2-4ca0-9494-d3ad20a929e4.png)
   
   `BrokerServerView` was blocked in the `DruidSchema.addSegment` callback.
   
   ![2 timeseries 
queries](https://user-images.githubusercontent.com/2322288/125991541-e3b7972e-397f-4e9c-a012-66a2d7b32285.png)
   
   Finally, there were 2 timeseries queries that were blocked in 
`BrokerServerView.getTimeline()` for 30 seconds.
   
   Currently, this can happen whenever `DruidSchema` needs to refresh a 
`DruidTable`, that is, whenever a new segment is added to the cluster or a 
segment is completely removed from it. Moving segments should not 
cause this issue because a new segment is always loaded on the new server 
before it is removed from the previous one. As a result, moving segments 
does not require updating the `RowSignature` of the `DruidTable`.
   
   To fix this issue, this PR improves the concurrency of `DruidSchema` by not 
holding `DruidSchema#lock` during expensive operations such as refreshing 
`DruidTable`s.
   
   - A new thread, `DruidSchema-Callback`, is added in `DruidSchema` to 
asynchronously process the timeline callbacks.
   - The `segmentMetadataInfo` map is changed to `ConcurrentHashMap<String, 
ConcurrentSkipListMap<SegmentId, AvailableSegmentMetadata>>`, so that updating 
the map no longer requires holding `DruidSchema#lock`. Instead, concurrency 
control is delegated to the `ConcurrentMap`s. This could also make querying 
the segments table faster because `getSegmentMetadataSnapshot()` no longer 
needs to acquire `DruidSchema#lock`.
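
   A minimal sketch of the new map shape, with placeholder types (a `String` segment id and a `Long` value) standing in for `SegmentId` and `AvailableSegmentMetadata`, shows how per-datasource updates can rely on the maps' own concurrency control instead of `DruidSchema#lock`:

   ```java
   import java.util.concurrent.ConcurrentHashMap;
   import java.util.concurrent.ConcurrentSkipListMap;

   // Simplified stand-in for DruidSchema's segmentMetadataInfo; not the real class.
   public class SegmentMetadataStore {
       // datasource -> (segmentId -> metadata); both levels are thread-safe on
       // their own, so writers need no external lock for map updates.
       private final ConcurrentHashMap<String, ConcurrentSkipListMap<String, Long>> segmentMetadataInfo =
           new ConcurrentHashMap<>();

       void addSegment(String dataSource, String segmentId, long numRows) {
           segmentMetadataInfo
               .computeIfAbsent(dataSource, ds -> new ConcurrentSkipListMap<>())
               .put(segmentId, numRows);
       }

       void removeSegment(String dataSource, String segmentId) {
           segmentMetadataInfo.computeIfPresent(dataSource, (ds, segments) -> {
               segments.remove(segmentId);
               return segments.isEmpty() ? null : segments; // drop empty datasources atomically
           });
       }

       int segmentCount(String dataSource) {
           ConcurrentSkipListMap<String, Long> segments = segmentMetadataInfo.get(dataSource);
           return segments == null ? 0 : segments.size();
       }

       public static void main(String[] args) {
           SegmentMetadataStore store = new SegmentMetadataStore();
           store.addSegment("wiki", "seg1", 100L);
           System.out.println(store.segmentCount("wiki")); // prints 1
       }
   }
   ```

   The inner `ConcurrentSkipListMap` keeps segments sorted by key, so snapshot reads can iterate in order without ever taking `DruidSchema#lock`.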
   
   Finally, this PR does not fix the expensive logic in `buildDruidTable()` itself. 
Incremental updates of `DruidTable` would be better, but they require that, for 
each column, we can fall back to the column type in the second 
most recent segment when the most recent segment disappears. How to track 
those per-segment column types efficiently can be researched as a follow-up.
   
   <hr>
   
   ##### Key changed/added classes in this PR
    * `DruidSchema`
   
   <hr>
   
   This PR has:
   - [x] been self-reviewed.
      - [x] using the [concurrency 
checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md)
   - [ ] added documentation for new or modified features or behaviors.
   - [x] added Javadocs for most classes and all non-trivial methods. Linked 
related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in 
[licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
   - [x] added comments explaining the "why" and the intent of the code 
wherever would not be obvious for an unfamiliar reader.
   - [x] added unit tests or modified existing tests to cover new code paths, 
ensuring the threshold for [code 
coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md)
 is met.
   - [ ] added integration tests.
   - [ ] been tested in a test Druid cluster.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
