surekhasaharan commented on a change in pull request #7034: add note on consistency of results for sys.segments queries
URL: https://github.com/apache/incubator-druid/pull/7034#discussion_r256696566
##########
File path: docs/content/querying/sql.md
##########
@@ -571,6 +571,8 @@ The "sys" schema provides visibility into Druid segments, servers and tasks.
 ### SEGMENTS table
 Segments table provides details on all Druid segments, whether they are published yet or not.
 
+#### CAVEAT
+Note that a segment can be served by more than one stream ingestion task or Historical process, in which case it has multiple replicas. While a segment is served by multiple ingestion tasks, these replicas are only weakly consistent with each other. Once the segment is served by a Historical, it is immutable. The Broker prefers to query a segment from a Historical over an ingestion task. But if a segment has multiple realtime replicas, e.g. Kafka index tasks, and one task is slower than the other, then the sys.segments query results can vary for the duration of the tasks, because the Broker queries only one of the ingestion tasks and it is not guaranteed that the same task gets picked every time. The columns of the segments table that can have inconsistent values during this period include `num_replicas` and `num_rows`. There is an open [issue](https://github.com/apache/incubator-druid/issues/5915) about this inconsistency with stream ingestion tasks.

Review comment:
   changed

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

With regards,
Apache Git Services
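
The caveat quoted in the diff above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not Druid's actual Broker code: the `Replica` class, the `pick_replica` and `observed_num_rows` helpers, the server names, and the use of a uniformly random choice among realtime replicas are all hypothetical. The source only says that a Historical is preferred and that the same ingestion task is not guaranteed to be picked every time.

```python
import random
from dataclasses import dataclass


@dataclass
class Replica:
    server: str       # hypothetical server name
    server_type: str  # "historical" or "realtime" (labels chosen for this sketch)
    num_rows: int     # row count this replica currently reports for the segment


def pick_replica(replicas, rng):
    """Prefer a Historical replica; otherwise pick one realtime replica.

    Assumption: the realtime pick is modeled as uniformly random.
    """
    historicals = [r for r in replicas if r.server_type == "historical"]
    if historicals:
        return historicals[0]
    return rng.choice(replicas)


def observed_num_rows(replicas, queries, seed=0):
    """Return the set of distinct num_rows values seen across repeated queries."""
    rng = random.Random(seed)
    return {pick_replica(replicas, rng).num_rows for _ in range(queries)}


# Two Kafka index tasks serving the same segment; the second one lags behind.
realtime_only = [
    Replica("task-a", "realtime", 1000),
    Replica("task-b", "realtime", 800),
]

# Later, a Historical also serves the (now immutable) segment.
with_historical = realtime_only + [Replica("historical-1", "historical", 1200)]

if __name__ == "__main__":
    print(observed_num_rows(realtime_only, queries=200))
    print(observed_num_rows(with_historical, queries=200))
```

With only realtime replicas, repeated queries can report different `num_rows` values for the same segment; once a Historical serves the segment, every query in this model sees the same value.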
