findingrish opened a new pull request, #13967:
URL: https://github.com/apache/druid/pull/13967
#### Description
The Broker maintains a timeline of segments, which it builds over time from
updates sent by Historical servers, and uses this timeline to answer queries.
The Broker isn’t aware of which segments actually exist in the Druid system.
As a result, query responses are occasionally incomplete.
The goal of this feature is to ensure that if a segment was queryable at some
point in time, any future query over that segment either includes the segment
or fails.
#### Design
The Broker polls the Coordinator periodically to get all the used segments in
the system, and merges every segment that has been loaded by a Historical
server at least once into its timeline of segments. The timeline now contains
both segments that are available on some Historical server and segments that
aren’t available on any server. This information helps the Broker identify the
unavailable segments for a query.
This approach also ensures that a segment that has just been published but
not yet loaded by any Historical server doesn’t cause query failures.
The following synchronisation conditions could cause temporary query failures:
- If the Broker isn’t able to sync its timeline with the Coordinator, it
remains unaware of segments that were recently removed from the Historicals.
- If the Broker is behind the Historical servers, syncing with the Coordinator
makes it aware of recently loaded segments, but the Broker considers them
unavailable.
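
The merge described above can be illustrated with a minimal sketch. It is not the code in this PR: `SegmentTimelineSketch` and its methods are hypothetical names, and segments are modelled as plain string IDs rather than Druid's segment and timeline classes. It only shows how combining the Coordinator's view with Historical announcements lets the Broker tell unavailable segments apart from segments that were never handed off.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model for illustration only; not Druid's actual classes.
final class SegmentTimelineSketch {
    // Segments the Coordinator reports as used and handed off at least once.
    private final Set<String> handedOff = new HashSet<>();
    // Segments currently announced by at least one Historical.
    private final Set<String> served = new HashSet<>();

    // Replace the Coordinator-derived view with the latest poll result.
    void onCoordinatorPoll(Set<String> usedAndHandedOff) {
        handedOff.clear();
        handedOff.addAll(usedAndHandedOff);
    }

    void onHistoricalAnnounce(String segmentId) {
        served.add(segmentId);
    }

    void onHistoricalUnannounce(String segmentId) {
        served.remove(segmentId);
    }

    // Segments that were queryable once but that no server holds right now.
    // A published segment that was never handed off is absent from
    // `handedOff`, so it is never reported here.
    Set<String> unavailable() {
        Set<String> result = new TreeSet<>(handedOff);
        result.removeAll(served);
        return result;
    }
}
```

In this model, a segment that drops off every Historical stays in `handedOff` and is therefore reported by `unavailable()` until it is announced again or the Coordinator stops listing it.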
#### Major changes
##### Coordinator changes
- Add new columns `handed_off` and `handed_off_time` to the `druid_segments`
metadata table to record whether a segment has ever been loaded on a Historical
and when it was first loaded (changes in `SQLMetadataConnector`)
- When the Coordinator is notified that a segment has been loaded, set the
`handed_off` column to true
- Update `DataSourcesSnapshot` to maintain a diff of the segments against the
previous poll
- Add the Coordinator API `MetadataResource#getChangedSegments` to send either
a full snapshot or delta changes to the Broker, using the information present
in `DataSourcesSnapshot`
- Changed classes: `CoordinatorServerView`, `SqlSegmentsMetadataManager`,
`SqlSegmentsMetadataQuery`, `MetadataResource`, `DataSourcesSnapshot`
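
As a rough illustration of the diff that a snapshot could maintain between two polls, here is a small sketch. `SnapshotDiffSketch` is a hypothetical class, not the actual `DataSourcesSnapshot` implementation, and segments are again reduced to string IDs.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch; names and shape do not come from DataSourcesSnapshot.
final class SnapshotDiffSketch {
    final Set<String> added;
    final Set<String> removed;

    private SnapshotDiffSketch(Set<String> added, Set<String> removed) {
        this.added = added;
        this.removed = removed;
    }

    // Compute which segments appeared and which disappeared between two polls.
    static SnapshotDiffSketch between(Set<String> previous, Set<String> current) {
        Set<String> added = new HashSet<>(current);
        added.removeAll(previous);
        Set<String> removed = new HashSet<>(previous);
        removed.removeAll(current);
        return new SnapshotDiffSketch(added, removed);
    }
}
```

A caller with no previous state can pass an empty `previous` set, in which case `added` is simply the full snapshot. This mirrors the "full snapshot or delta" choice only loosely.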
##### Broker changes
- `MetadataSegmentView` polls the Coordinator to fetch the list of all used
segments along with their overshadowed and handedOff status. On the very first
poll it receives a full snapshot; thereafter it receives delta updates.
- After every poll completes, notify `BrokerServerView` to update its
timeline with all segments that have been handed off:
  - Remove segments that are no longer used, i.e. segments that are not
  present in the list polled from the Coordinator
  - Add segments that are used and handedOff to the timeline, if they don’t
  already exist
- While handling a query on the Broker, look up the segments required for the
query in the timeline. If any of these segments is unavailable, throw an
error.
- Changed classes: `CachingClusteredClient`, `BrokerServerView`,
`MetadataSegmentView`
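
The Broker-side steps listed above can be sketched as follows. This is an assumption-laden illustration, not the code in `BrokerServerView` or `CachingClusteredClient`: `BrokerViewSketch`, `applyPoll`, `markServed`, `unavailableAmong` and `checkQueryable` are hypothetical names, and the timeline is reduced to a map from segment ID to a "currently served" flag.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the Broker-side bookkeeping; not Druid's classes.
final class BrokerViewSketch {
    // Segment ID -> true if some Historical currently serves the segment.
    private final Map<String, Boolean> timeline = new HashMap<>();

    // Apply one poll result, given as segment ID -> handedOff status
    // for every used segment.
    void applyPoll(Map<String, Boolean> usedSegments) {
        // Remove segments that are no longer used.
        timeline.keySet().retainAll(usedSegments.keySet());
        // Add used, handed-off segments that are not in the timeline yet.
        for (Map.Entry<String, Boolean> e : usedSegments.entrySet()) {
            if (e.getValue()) {
                timeline.putIfAbsent(e.getKey(), false);
            }
        }
    }

    // Called when a Historical announces the segment.
    void markServed(String segmentId) {
        timeline.put(segmentId, true);
    }

    // Segments that are in the timeline but not served by any Historical.
    List<String> unavailableAmong(List<String> segmentsForQuery) {
        List<String> missing = new ArrayList<>();
        for (String id : segmentsForQuery) {
            Boolean served = timeline.get(id);
            if (served != null && !served) {
                missing.add(id);
            }
        }
        return missing;
    }

    // Throw if the query needs a segment that is known but unavailable.
    void checkQueryable(List<String> segmentsForQuery) {
        List<String> missing = unavailableAmong(segmentsForQuery);
        if (!missing.isEmpty()) {
            throw new IllegalStateException("Unavailable segments: " + missing);
        }
    }
}
```

Segments that are absent from the timeline altogether are skipped by the check, which matches the intent that published but never handed-off segments should not fail queries.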
#### Upgrade considerations
#### Usage
- `druid.sql.planner.detectUnavailbleSegments` must be set in the Broker
runtime properties
- The `unavailableSegmentsAction` query context parameter can be set to
`allow` or `fail`. With `fail`, queries that need unavailable segments fail.
In either case, the unavailable segments are logged.
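
To make the `allow` / `fail` behaviour concrete, here is a hedged sketch of how such a context value might be interpreted. `UnavailableSegmentsPolicySketch` is a hypothetical class, not part of this PR. The description does not state a default, so the fallback action is passed in by the caller, and `java.util.logging` stands in for whatever logger the Broker actually uses.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.logging.Logger;

// Hypothetical sketch; only the context key and its two values come from the PR text.
final class UnavailableSegmentsPolicySketch {
    enum Action { ALLOW, FAIL }

    static final String CONTEXT_KEY = "unavailableSegmentsAction";
    private static final Logger LOG =
        Logger.getLogger(UnavailableSegmentsPolicySketch.class.getName());

    // Read the action from the query context; the default is the caller's choice.
    static Action fromContext(Map<String, Object> queryContext, Action defaultAction) {
        Object value = queryContext.get(CONTEXT_KEY);
        if (value == null) {
            return defaultAction;
        }
        return Action.valueOf(value.toString().toUpperCase(Locale.ROOT));
    }

    // Always log unavailable segments; fail the query only when asked to.
    static void enforce(Action action, List<String> unavailableSegments) {
        if (unavailableSegments.isEmpty()) {
            return;
        }
        LOG.warning("Unavailable segments: " + unavailableSegments);
        if (action == Action.FAIL) {
            throw new IllegalStateException(
                "Query requires unavailable segments: " + unavailableSegments);
        }
    }
}
```

With `allow`, the query proceeds and may return partial results; with `fail`, the caller sees an error instead.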
#### Release note
<hr>
This PR has:
- [x] been self-reviewed.
- [ ] using the [concurrency
checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md)
(Remove this item if the PR doesn't have any relation to concurrency.)
- [ ] added documentation for new or modified features or behaviors.
- [ ] a release note entry in the PR description.
- [ ] added Javadocs for most classes and all non-trivial methods. Linked
related entities via Javadoc links.
- [ ] added or updated version, license, or notice information in
[licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
- [x] added comments explaining the "why" and the intent of the code
wherever would not be obvious for an unfamiliar reader.
- [ ] added unit tests or modified existing tests to cover new code paths,
ensuring the threshold for [code
coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md)
is met.
- [ ] added integration tests.
- [ ] been tested in a test Druid cluster.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]