surekhasaharan commented on a change in pull request #7034: add note on consistency of results for sys.segments queries
URL: https://github.com/apache/incubator-druid/pull/7034#discussion_r256641065
 
 

 ##########
 File path: docs/content/querying/sql.md
 ##########
 @@ -571,6 +571,8 @@ The "sys" schema provides visibility into Druid segments, servers and tasks.
 ### SEGMENTS table
 Segments table provides details on all Druid segments, whether they are published yet or not.
 
+#### CAVEAT
+Note that a segment can be served by more than one realtime or historical servers, in that case it would have multiple replicas. These replicas are weakly consistent with each other when served by multiple realtime tasks, until a segment is eventually served by a historical, at that point the segment is immutable. And broker prefers to query a segment from historical over realtime server. But if a segment has multiple realtime replicas, for eg. kafka index tasks, and one task is slower than other, then the sys.segments query results can vary for the duration of the tasks. The columns of segments table that can have inconsistent values during this period include `size`, `num_replicas`, `num_rows`.
 
 Review comment:
   >Would you explain why `size` and `num_replicas` vary? It looks like they are 
not obtained from segmentMetadataQuery.
   
   I think `size` would not vary between ingestion tasks, since they would all 
report 0, but it can vary depending on whether a segment is queried from a 
Historical or from a realtime task. Given that the Broker prefers Historicals, 
maybe `size` is not an issue. `num_replicas` can change whenever a segment is 
added or removed through `TimelineServerView.TimelineCallback` in 
`DruidSchema`, so its value can vary between queries.
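
   To make the effect concrete, here is a minimal sketch (not part of this PR) 
of how one could compare two result sets of a `sys.segments` query taken a 
moment apart and see which of the columns named in the caveat changed. The 
helper name `diff_segment_snapshots` and the sample rows are hypothetical; the 
rows are assumed to be plain dicts keyed by column name, and the values are 
made up for illustration.

```python
from typing import Dict, Iterable, List, Set

# Columns the caveat lists as potentially inconsistent while realtime tasks run.
VOLATILE_COLUMNS = ("size", "num_replicas", "num_rows")


def diff_segment_snapshots(
    first: List[dict],
    second: List[dict],
    columns: Iterable[str] = VOLATILE_COLUMNS,
) -> Dict[str, Set[str]]:
    """Map each segment_id present in both snapshots to the columns that differ."""
    second_by_id = {row["segment_id"]: row for row in second}
    changed: Dict[str, Set[str]] = {}
    for row in first:
        other = second_by_id.get(row["segment_id"])
        if other is None:
            # Segment disappeared between the two queries; nothing to compare.
            continue
        differing = {c for c in columns if row.get(c) != other.get(c)}
        if differing:
            changed[row["segment_id"]] = differing
    return changed


# Hypothetical rows, as if the same query had been issued twice.
before = [
    {"segment_id": "seg_a", "size": 0, "num_replicas": 1, "num_rows": 100},
    {"segment_id": "seg_b", "size": 0, "num_replicas": 2, "num_rows": 50},
]
after = [
    {"segment_id": "seg_a", "size": 0, "num_replicas": 2, "num_rows": 120},
    {"segment_id": "seg_b", "size": 0, "num_replicas": 2, "num_rows": 50},
]

changes = diff_segment_snapshots(before, after)
```

   In this made-up example only `seg_a` is reported, with `num_replicas` and 
`num_rows` differing, while `size` stays at 0 for both realtime replicas.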

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
