paul-rogers commented on a change in pull request #11550:
URL: https://github.com/apache/druid/pull/11550#discussion_r682987219
##########
File path: docs/querying/sql.md
##########
@@ -1123,20 +1123,23 @@ Segments table provides details on all Druid segments,
whether they are publishe
|version|STRING|Version string (generally an ISO8601 timestamp corresponding
to when the segment set was first started). Higher version means the more
recently created segment. Version comparing is based on string comparison.|
|partition_num|LONG|Partition number (an integer, unique within a
datasource+interval+version; may not necessarily be contiguous)|
|num_replicas|LONG|Number of replicas of this segment currently being served|
-|num_rows|LONG|Number of rows in current segment, this value could be null if
unknown to Broker at query time|
-|is_published|LONG|Boolean is represented as long type where 1 = true, 0 =
false. 1 represents this segment has been published to the metadata store with
`used=1`. See the [Architecture
page](../design/architecture.md#segment-lifecycle) for more details.|
-|is_available|LONG|Boolean is represented as long type where 1 = true, 0 =
false. 1 if this segment is currently being served by any process(Historical or
realtime). See the [Architecture
page](../design/architecture.md#segment-lifecycle) for more details.|
-|is_realtime|LONG|Boolean is represented as long type where 1 = true, 0 =
false. 1 if this segment is _only_ served by realtime tasks, and 0 if any
historical process is serving this segment.|
-|is_overshadowed|LONG|Boolean is represented as long type where 1 = true, 0 =
false. 1 if this segment is published and is _fully_ overshadowed by some other
published segments. Currently, is_overshadowed is always false for unpublished
segments, although this may change in the future. You can filter for segments
that "should be published" by filtering for `is_published = 1 AND
is_overshadowed = 0`. Segments can briefly be both published and overshadowed
if they were recently replaced, but have not been unpublished yet. See the
[Architecture page](../design/architecture.md#segment-lifecycle) for more
details.|
+|num_rows|LONG|Number of rows in this segment. This field is updated in the
background and cached on the Broker. It may be null if the Broker has not
gathered a row count for this segment yet. It may not match the result of
`count(*)` queries on realtime data, because the cached value on the Broker may
be out of date, and because different replicas of realtime segments may not be
in sync with each other. Once a segment is published, its row count will settle
and stop changing.|
+|is_active|LONG|Boolean represented as long type where 1 = true, 0 = false.
True for segments that are either available and queryable, or _should be_
available and queryable. Equivalent to `(is_published = 1 AND is_overshadowed =
0) OR is_realtime = 1`.|
Review comment:
This is the second (third) place in the docs that emphasizes *should*.
Is this notion explained anywhere? Does this mean that the segment is scheduled
to load into a Historical, but has not yet done so? Or, does it mean there is
some kind of problem that the user must resolve?
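One way to see where the *should be* case comes from is to evaluate the stated equivalence for `is_active` over every combination of its inputs. The sketch below is a hypothetical Python helper, not Druid code. It assumes the flags arrive as the 0/1 longs the table describes and only restates `(is_published = 1 AND is_overshadowed = 0) OR is_realtime = 1`. Note that `is_available` does not appear in that expression, so a published, non-overshadowed segment counts as active even before any process serves it. That appears to be the *should be* case asked about here.

```python
from itertools import product


def is_active(is_published: int, is_overshadowed: int, is_realtime: int) -> int:
    """Hypothetical restatement of the proposed `is_active` column.

    Flags are 0/1 longs, mirroring how sys.segments encodes booleans.
    Returns 1 for segments that are, or should be, available and queryable.
    """
    published_and_current = is_published == 1 and is_overshadowed == 0
    return int(published_and_current or is_realtime == 1)


if __name__ == "__main__":
    # Print the full truth table so edge cases are easy to eyeball,
    # e.g. a realtime segment that is not yet published is still active.
    print("is_published is_overshadowed is_realtime -> is_active")
    for pub, over, rt in product((0, 1), repeat=3):
        print(f"{pub:>12} {over:>15} {rt:>11} -> {is_active(pub, over, rt)}")
```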
##########
File path: docs/querying/sql.md
##########
@@ -1123,20 +1123,23 @@ Segments table provides details on all Druid segments,
whether they are publishe
+|is_published|LONG|Boolean represented as long type where 1 = true, 0 = false.
1 represents this segment has been published to the metadata store with
`used=1`. See the [segment lifecycle
documentation](../design/architecture.md#segment-lifecycle) for more details.|
Review comment:
Presumably "published to the metadata store" means "by the MiddleManager
at the completion of ingestion"?
##########
File path: docs/querying/sql.md
##########
@@ -1123,20 +1123,23 @@ Segments table provides details on all Druid segments,
whether they are publishe
+|num_rows|LONG|Number of rows in this segment. This field is updated in the
background and cached on the Broker. It may be null if the Broker has not
gathered a row count for this segment yet. It may not match the result of
`count(*)` queries on realtime data, because the cached value on the Broker may
be out of date, and because different replicas of realtime segments may not be
in sync with each other. Once a segment is published, its row count will settle
and stop changing.|
Review comment:
Change the wording a bit? Seems the key bit for a user to know is: For a
published segment, the number will either be null or accurate. If null, then
the Broker has not received the row count yet. For an unpublished segment, the
number will be slightly out of date as new data arrives. (Assuming this is an
accurate statement.)
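To make the proposed wording concrete, here is a hypothetical Python helper (not part of Druid) that encodes the three cases described above. A null value means the Broker has no count yet, a published segment's count has settled, and an unpublished segment's count may lag behind new data. It carries the same caveat as the comment: it assumes this reading of `num_rows` is accurate.

```python
from typing import Optional


def describe_num_rows(is_published: int, num_rows: Optional[int]) -> str:
    """Hypothetical helper paraphrasing the wording proposed in this comment.

    Encodes the (unconfirmed) reading of `num_rows`:
    - null means the Broker has not gathered a row count yet;
    - for a published segment, a non-null count has settled;
    - for an unpublished (realtime) segment, the count may be out of date.
    """
    if num_rows is None:
        return "unknown: Broker has not gathered a row count yet"
    if is_published == 1:
        return f"{num_rows} rows (settled)"
    return f"~{num_rows} rows (may be out of date)"


if __name__ == "__main__":
    # One example per case: unknown, published, and realtime.
    print(describe_num_rows(1, None))
    print(describe_num_rows(1, 500))
    print(describe_num_rows(0, 42))
```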
##########
File path: docs/querying/sql.md
##########
@@ -1123,20 +1123,23 @@ Segments table provides details on all Druid segments,
whether they are publishe
+|is_available|LONG|Boolean represented as long type where 1 = true, 0 = false.
1 if this segment is currently being served by any process (Historical or
realtime). See the [segment lifecycle
documentation](../design/architecture.md#segment-lifecycle) for more details.|
+|is_realtime|LONG|Boolean represented as long type where 1 = true, 0 = false.
1 if this segment is _only_ served by realtime tasks, and 0 if any historical
process is serving this segment.|
+|is_overshadowed|LONG|Boolean represented as long type where 1 = true, 0 =
false. 1 if this segment is published and is _fully_ overshadowed by some other
published segments. Currently, is_overshadowed is always 0 for unpublished
segments, although this may change in the future. You can filter for segments
that "should be published" by filtering for `is_published = 1 AND
is_overshadowed = 0`. Segments can briefly be both published and overshadowed
if they were recently replaced, but have not been unpublished yet. See the
[segment lifecycle documentation](../design/architecture.md#segment-lifecycle)
for more details.|
Review comment:
Nit: consistent use of code font: `is_overshadowed`.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]