findingrish opened a new pull request, #13967:
URL: https://github.com/apache/druid/pull/13967

   #### Description
   The broker maintains a timeline of segments, which it builds over time from 
updates sent by historical servers, and it uses this timeline to answer 
queries. The broker isn’t aware of which segments actually exist in the Druid 
system. As a result of this gap, query responses are occasionally incomplete. 
   
   The goal of this feature is to ensure that if a segment was queryable at one 
point in time, any future query over that segment either includes that 
segment or fails.  
   
   #### Design 
   The broker periodically polls the coordinator for all the used segments in 
the system and merges every segment that has been loaded at least once by a 
historical server into its timeline of segments. The timeline now contains both 
segments that are available on some historical server and segments that aren’t 
available on any server. This information lets the broker identify the 
unavailable segments for a query. 
   
   This approach also ensures that a segment that has just been published but 
not yet loaded by any historical server doesn’t cause a query failure. 
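   
   The design above can be sketched roughly as follows. This is a minimal 
illustration, not the code in this PR: `SketchTimeline` and all of its methods 
are hypothetical names, segments are reduced to plain string ids, and the real 
timeline is interval- and version-aware. 

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical, simplified stand-in for the broker's segment timeline. */
final class SketchTimeline {
  // Segment id -> historical servers currently serving it.
  // An empty set means "loaded at least once, but not available right now".
  private final Map<String, Set<String>> serversBySegment = new HashMap<>();

  /** A historical server announced that it serves the segment. */
  void segmentAnnounced(String segmentId, String server) {
    serversBySegment.computeIfAbsent(segmentId, id -> new HashSet<>()).add(server);
  }

  /** A historical server stopped serving the segment; the entry itself is kept. */
  void segmentUnannounced(String segmentId, String server) {
    Set<String> servers = serversBySegment.get(segmentId);
    if (servers != null) {
      servers.remove(server);
    }
  }

  /**
   * Merges one used segment reported by a coordinator poll. Segments that were
   * never handed off (published but not yet loaded) are skipped, so they
   * cannot make a query fail.
   */
  void mergeFromCoordinator(String segmentId, boolean handedOff) {
    if (handedOff) {
      serversBySegment.computeIfAbsent(segmentId, id -> new HashSet<>());
    }
  }

  /** Segments in the timeline that no server is currently serving. */
  Set<String> unavailableSegments() {
    Set<String> unavailable = new TreeSet<>();
    for (Map.Entry<String, Set<String>> entry : serversBySegment.entrySet()) {
      if (entry.getValue().isEmpty()) {
        unavailable.add(entry.getKey());
      }
    }
    return unavailable;
  }
}
```

   In this sketch a segment that was handed off but is not announced by any 
server shows up in `unavailableSegments()`, while a segment that was only 
published never enters the timeline. 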
   
   The following synchronisation conditions could cause temporary query 
failures: 
   - If the broker isn’t able to sync its timeline with the coordinator, the 
broker remains unaware of segments recently removed from the historicals. 
   
   - If the broker is behind the historical servers, syncing with the 
coordinator makes it aware of recently loaded segments, but the broker 
considers them unavailable. 
   
   #### Major changes
   
   ##### Coordinator changes
   - Add new columns `handed_off` and `handed_off_time` to the `druid_segments` 
metadata table to record whether a segment has ever been loaded on a historical 
and when it was first loaded (changes in `SQLMetadataConnector`)
   
   - When the coordinator is notified that a segment has been loaded, set the 
`handed_off` column to true
   
   - Update `DataSourcesSnapshot` to maintain a diff of the segments relative 
to the previous poll.  
   
   - Add a coordinator API, `MetadataResource#getChangedSegments`, that sends 
either a full snapshot or delta changes to the broker using the information 
present in `DataSourcesSnapshot`
   
   - Changed classes: `CoordinatorServerView`, `SqlSegmentsMetadataManager`, 
`SqlSegmentsMetadataQuery`, `MetadataResource`, `DataSourcesSnapshot`
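   
   As a rough illustration of the snapshot diff, the sketch below computes the 
segments added and removed between two polls. `SnapshotDiffSketch`, 
`ChangedSegments`, and the rule "no previous poll means full snapshot" are 
assumptions made for this example; the PR does not spell out how 
`getChangedSegments` chooses between the two modes. 

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch of diffing used segments between two coordinator polls. */
final class SnapshotDiffSketch {

  /** Hypothetical response shape: either a full snapshot or a delta. */
  static final class ChangedSegments {
    final boolean fullSnapshot;
    final Set<String> added;
    final Set<String> removed;

    ChangedSegments(boolean fullSnapshot, Set<String> added, Set<String> removed) {
      this.fullSnapshot = fullSnapshot;
      this.added = added;
      this.removed = removed;
    }
  }

  /**
   * Compares the previous and current sets of used segment ids.
   * Assumption of this sketch: a null previous set means there is nothing to
   * diff against, so the whole current set is returned as a full snapshot.
   */
  static ChangedSegments diff(Set<String> previous, Set<String> current) {
    if (previous == null) {
      return new ChangedSegments(true, new HashSet<>(current), new HashSet<String>());
    }
    Set<String> added = new HashSet<>(current);
    added.removeAll(previous);     // in current but not in previous
    Set<String> removed = new HashSet<>(previous);
    removed.removeAll(current);    // in previous but not in current
    return new ChangedSegments(false, added, removed);
  }
}
```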
   
   ##### Broker changes
   - `MetadataSegmentView` polls the coordinator to fetch the list of all used 
segments along with their overshadowed and handedOff status. On the very first 
poll it receives a full snapshot; thereafter it receives delta updates. 
   
   - After every poll completes, notify `BrokerServerView` to update its 
timeline with all segments that have been handed off
      - Remove segments that are not used anymore i.e. segments that are not 
present in the list polled from the coordinator
      - Add segments that are used and handedOff to the timeline, if they don’t 
already exist
   
   - While handling a query on the broker, look up the segments required for 
the query from the timeline. If any of these segments are unavailable, throw an 
error.
   
   - Changed classes: `CachingClusteredClient`, `BrokerServerView`, 
`MetadataSegmentView`
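   
   The two timeline update steps listed above could look roughly like the 
sketch below. `BrokerTimelineUpdateSketch` and `applyPoll` are hypothetical 
names, the timeline is modelled as a plain map from segment id to the servers 
serving it, and the poll result is assumed to be the complete list of used 
segments with their handedOff flag. 

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of how a poll result is applied to the broker timeline. */
final class BrokerTimelineUpdateSketch {

  /**
   * @param timeline     segment id -> servers serving it (empty set = unavailable)
   * @param usedSegments segment id -> handedOff flag, as polled from the coordinator
   */
  static void applyPoll(Map<String, Set<String>> timeline, Map<String, Boolean> usedSegments) {
    // Step 1: remove segments that are no longer used, i.e. not in the polled list.
    timeline.keySet().retainAll(usedSegments.keySet());

    // Step 2: add segments that are used and handed off, if they don't already exist.
    for (Map.Entry<String, Boolean> entry : usedSegments.entrySet()) {
      if (entry.getValue()) {
        timeline.computeIfAbsent(entry.getKey(), id -> new HashSet<>());
      }
    }
  }
}
```

   Existing entries keep their server sets, so a segment that is already 
served by a historical stays available after the poll is applied. 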
   
   #### Upgrade considerations 
   
   #### Usage 
   - `druid.sql.planner.detectUnavailbleSegments` needs to be set in the broker 
runtime properties 
   - The `unavailableSegmentsAction` query context parameter can be set to 
`allow` or `fail`. With `fail`, queries that require unavailable segments fail; 
in either case the unavailable segments are logged. 
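   
   A minimal sketch of how the `unavailableSegmentsAction` context value might 
be interpreted is shown below. The class name, the in-memory `LOG` list 
standing in for a logger, the exception type, and the default of `allow` when 
the key is absent are all assumptions of this example, not behaviour confirmed 
by the PR. 

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of acting on the unavailableSegmentsAction query context. */
final class UnavailableSegmentsCheckSketch {
  enum Action { ALLOW, FAIL }

  // Stand-in for a real logger so the sketch stays self-contained.
  static final List<String> LOG = new ArrayList<>();

  /** Reads the action from the query context; defaulting to ALLOW is an assumption. */
  static Action actionFrom(Map<String, Object> queryContext) {
    Object value = queryContext.get("unavailableSegmentsAction");
    if (value == null) {
      return Action.ALLOW;
    }
    return Action.valueOf(value.toString().toUpperCase(Locale.ROOT));
  }

  /** Logs unavailable segments in either case and throws only when the action is FAIL. */
  static void check(Set<String> unavailableSegments, Map<String, Object> queryContext) {
    if (unavailableSegments.isEmpty()) {
      return;
    }
    LOG.add("Unavailable segments: " + unavailableSegments);
    if (actionFrom(queryContext) == Action.FAIL) {
      throw new IllegalStateException("Query requires unavailable segments: " + unavailableSegments);
    }
  }
}
```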
   
   #### Release note
   
   <hr>
   
   This PR has:
   
   - [x] been self-reviewed.
      - [ ] using the [concurrency 
checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md)
 (Remove this item if the PR doesn't have any relation to concurrency.)
   - [ ] added documentation for new or modified features or behaviors.
   - [ ] a release note entry in the PR description.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked 
related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in 
[licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
   - [x] added comments explaining the "why" and the intent of the code 
wherever would not be obvious for an unfamiliar reader.
   - [ ] added unit tests or modified existing tests to cover new code paths, 
ensuring the threshold for [code 
coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md)
 is met.
   - [ ] added integration tests.
   - [ ] been tested in a test Druid cluster.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

