gianm commented on code in PR #11550:
URL: https://github.com/apache/druid/pull/11550#discussion_r877552011
##########
sql/src/main/java/org/apache/druid/sql/calcite/schema/SystemSchema.java:
##########
@@ -313,7 +316,10 @@ public Enumerable<Object[]> scan(DataContext root)
(long) segment.getShardSpec().getPartitionNum(),
numReplicas,
numRows,
- IS_PUBLISHED_TRUE, //is_published is true for published segments
+ //is_active is true for published segments that are not overshadowed
+ val.isOvershadowed() ? IS_ACTIVE_FALSE : IS_ACTIVE_TRUE,
Review Comment:
Yes, that's the idea. The branch is for published segments only.
##########
docs/querying/sql-metadata-tables.md:
##########
@@ -127,20 +127,22 @@ Segments table provides details on all Druid segments,
whether they are publishe
|version|STRING|Version string (generally an ISO8601 timestamp corresponding
to when the segment set was first started). Higher version means the more
recently created segment. Version comparing is based on string comparison.|
|partition_num|LONG|Partition number (an integer, unique within a
datasource+interval+version; may not necessarily be contiguous)|
|num_replicas|LONG|Number of replicas of this segment currently being served|
-|num_rows|LONG|Number of rows in current segment, this value could be null if
unknown to Broker at query time|
-|is_published|LONG|Boolean is represented as long type where 1 = true, 0 =
false. 1 represents this segment has been published to the metadata store with
`used=1`. See the [Architecture
page](../design/architecture.md#segment-lifecycle) for more details.|
-|is_available|LONG|Boolean is represented as long type where 1 = true, 0 =
false. 1 if this segment is currently being served by any process(Historical or
realtime). See the [Architecture
page](../design/architecture.md#segment-lifecycle) for more details.|
-|is_realtime|LONG|Boolean is represented as long type where 1 = true, 0 =
false. 1 if this segment is _only_ served by realtime tasks, and 0 if any
historical process is serving this segment.|
-|is_overshadowed|LONG|Boolean is represented as long type where 1 = true, 0 =
false. 1 if this segment is published and is _fully_ overshadowed by some other
published segments. Currently, is_overshadowed is always false for unpublished
segments, although this may change in the future. You can filter for segments
that "should be published" by filtering for `is_published = 1 AND
is_overshadowed = 0`. Segments can briefly be both published and overshadowed
if they were recently replaced, but have not been unpublished yet. See the
[Architecture page](../design/architecture.md#segment-lifecycle) for more
details.|
-|shard_spec|STRING|JSON-serialized form of the segment `ShardSpec`|
+|num_rows|LONG|Number of rows in this segment, or zero if the number of rows
is not known.<br /><br />This row count is gathered by the Broker in the
background. It will be zero if the Broker has not gathered a row count for this
segment yet. For segments ingested from streams, the reported row count may lag
behind the result of a `count(*)` query because the cached `num_rows` on the
Broker may be out of date. This will settle shortly after new rows stop being
written to that particular segment.|
+|is_active|LONG|True for segments that represent the latest state of a
datasource.<br /><br />Equivalent to `(is_published = 1 AND is_overshadowed =
0) OR is_realtime = 1`. In steady state, when no ingestion or data management
operations are happening, `is_active` will be equivalent to `is_available`.
However, they may differ from each other when ingestion or data management
operations have executed recently. In these cases, Druid will load and unload
segments appropriately to bring actual availability in line with the expected
state given by `is_active`.|
Review Comment:
Yeah: there are a couple of reasons a segment in the `is_active` state might
never become `is_available`. Maybe it's dropped before that happens. Or
maybe something is broken. In the interest of keeping the doc from getting too
long, I'm inclined to leave it as-is. But I invite follow-up patches that
improve things 🙂
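
For reference, here is a minimal sketch of the equivalence stated in the
`is_active` row above, `(is_published = 1 AND is_overshadowed = 0) OR
is_realtime = 1`. The class and method names are hypothetical and are not part
of `SystemSchema.java`; the flags are modeled as plain booleans rather than
Druid's segment objects, and the result uses the same 1/0 long encoding as the
table.

```java
// Hypothetical illustration only; not the code in SystemSchema.java.
final class SegmentFlags {
    private SegmentFlags() {}

    /**
     * Returns 1 if the flags describe an "active" segment per the doc text,
     * i.e. (is_published AND NOT is_overshadowed) OR is_realtime; else 0.
     */
    static long isActive(boolean isPublished, boolean isOvershadowed, boolean isRealtime) {
        boolean active = (isPublished && !isOvershadowed) || isRealtime;
        return active ? 1L : 0L;
    }

    public static void main(String[] args) {
        // Published and not overshadowed: active.
        System.out.println(isActive(true, false, false));
        // Published but fully overshadowed: not active.
        System.out.println(isActive(true, true, false));
        // Unpublished segment served only by realtime tasks: active.
        System.out.println(isActive(false, false, true));
    }
}
```

This also shows why the published-only branch in the Java diff only needs to
check `isOvershadowed()`: `is_published` is already known to be true there.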
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]