gianm commented on a change in pull request #11550:
URL: https://github.com/apache/druid/pull/11550#discussion_r735042259



##########
File path: docs/querying/sql.md
##########
@@ -1123,20 +1123,23 @@ Segments table provides details on all Druid segments, 
whether they are publishe
 |version|STRING|Version string (generally an ISO8601 timestamp corresponding 
to when the segment set was first started). Higher version means the more 
recently created segment. Version comparing is based on string comparison.|
 |partition_num|LONG|Partition number (an integer, unique within a 
datasource+interval+version; may not necessarily be contiguous)|
 |num_replicas|LONG|Number of replicas of this segment currently being served|
-|num_rows|LONG|Number of rows in current segment, this value could be null if 
unknown to Broker at query time|
-|is_published|LONG|Boolean is represented as long type where 1 = true, 0 = 
false. 1 represents this segment has been published to the metadata store with 
`used=1`. See the [Architecture 
page](../design/architecture.md#segment-lifecycle) for more details.|
-|is_available|LONG|Boolean is represented as long type where 1 = true, 0 = 
false. 1 if this segment is currently being served by any process(Historical or 
realtime). See the [Architecture 
page](../design/architecture.md#segment-lifecycle) for more details.|
-|is_realtime|LONG|Boolean is represented as long type where 1 = true, 0 = 
false. 1 if this segment is _only_ served by realtime tasks, and 0 if any 
historical process is serving this segment.|
-|is_overshadowed|LONG|Boolean is represented as long type where 1 = true, 0 = 
false. 1 if this segment is published and is _fully_ overshadowed by some other 
published segments. Currently, is_overshadowed is always false for unpublished 
segments, although this may change in the future. You can filter for segments 
that "should be published" by filtering for `is_published = 1 AND 
is_overshadowed = 0`. Segments can briefly be both published and overshadowed 
if they were recently replaced, but have not been unpublished yet. See the 
[Architecture page](../design/architecture.md#segment-lifecycle) for more 
details.|
+|num_rows|LONG|Number of rows in this segment. This field is updated in the 
background and cached on the Broker. It may be null if the Broker has not 
gathered a row count for this segment yet. It may not match the result of 
`count(*)` queries on realtime data, because the cached value on the Broker may 
be out of date, and because different replicas of realtime segments may not be 
in sync with each other. Once a segment is published, its row count will settle 
and stop changing.|

Review comment:
       There's a short delay between when a segment is published and 
when num_rows becomes fully accurate, because the Broker fetches it by querying 
a data server rather than reading it from the published segment descriptor. I 
updated the wording to the following, which is hopefully clearer:
   
   > Number of rows in this segment, or zero if the number of rows is not known.
   >
   > This row count is gathered by the Broker in the background. It will be 
zero if the Broker has not gathered a row count for this segment yet. For 
segments ingested from streams, the reported row count may lag behind the 
result of a `count(*)` query because the cached `num_rows` on the Broker may be 
out of date. This will settle shortly after new rows stop being written to that 
particular segment.
   
   (I also changed "null" to "zero" because that's what it actually is.)
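
   As a rough illustration of what "zero means not known yet" implies for 
anyone consuming this column, here is a minimal Python sketch. The helper names 
`known_row_count` and `total_known_rows` and the sample values are hypothetical 
and not part of Druid. It also assumes a reported zero should be read as 
"unknown", which means a segment that really is empty cannot be told apart.

```python
from typing import Iterable, Optional, Tuple


def known_row_count(num_rows: int) -> Optional[int]:
    """Return the reported count, or None when it is zero (not yet gathered)."""
    # Assumption: zero is read as "the Broker has no row count yet".
    return num_rows if num_rows > 0 else None


def total_known_rows(reported: Iterable[int]) -> Tuple[int, int]:
    """Sum the known counts and count how many segments are still unknown."""
    total = 0
    unknown = 0
    for value in reported:
        count = known_row_count(value)
        if count is None:
            unknown += 1
        else:
            total += count
    return total, unknown


# Hypothetical num_rows values for three segments; the second is not known yet.
total, unknown = total_known_rows([10, 0, 5])
print(f"known rows: {total}, segments without a count: {unknown}")
```

   Reporting the number of unknown segments alongside the total makes it clear 
that the sum is a lower bound until the Broker has caught up.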

##########
File path: docs/querying/sql.md
##########
@@ -1123,20 +1123,23 @@ Segments table provides details on all Druid segments, 
whether they are publishe
 |version|STRING|Version string (generally an ISO8601 timestamp corresponding 
to when the segment set was first started). Higher version means the more 
recently created segment. Version comparing is based on string comparison.|
 |partition_num|LONG|Partition number (an integer, unique within a 
datasource+interval+version; may not necessarily be contiguous)|
 |num_replicas|LONG|Number of replicas of this segment currently being served|
-|num_rows|LONG|Number of rows in current segment, this value could be null if 
unknown to Broker at query time|
-|is_published|LONG|Boolean is represented as long type where 1 = true, 0 = 
false. 1 represents this segment has been published to the metadata store with 
`used=1`. See the [Architecture 
page](../design/architecture.md#segment-lifecycle) for more details.|
-|is_available|LONG|Boolean is represented as long type where 1 = true, 0 = 
false. 1 if this segment is currently being served by any process(Historical or 
realtime). See the [Architecture 
page](../design/architecture.md#segment-lifecycle) for more details.|
-|is_realtime|LONG|Boolean is represented as long type where 1 = true, 0 = 
false. 1 if this segment is _only_ served by realtime tasks, and 0 if any 
historical process is serving this segment.|
-|is_overshadowed|LONG|Boolean is represented as long type where 1 = true, 0 = 
false. 1 if this segment is published and is _fully_ overshadowed by some other 
published segments. Currently, is_overshadowed is always false for unpublished 
segments, although this may change in the future. You can filter for segments 
that "should be published" by filtering for `is_published = 1 AND 
is_overshadowed = 0`. Segments can briefly be both published and overshadowed 
if they were recently replaced, but have not been unpublished yet. See the 
[Architecture page](../design/architecture.md#segment-lifecycle) for more 
details.|
+|num_rows|LONG|Number of rows in this segment. This field is updated in the 
background and cached on the Broker. It may be null if the Broker has not 
gathered a row count for this segment yet. It may not match the result of 
`count(*)` queries on realtime data, because the cached value on the Broker may 
be out of date, and because different replicas of realtime segments may not be 
in sync with each other. Once a segment is published, its row count will settle 
and stop changing.|
+|is_active|LONG|Boolean represented as long type where 1 = true, 0 = false. 
True for segments that are either available and queryable, or _should be_ 
available and querayble. Equivalent to `(is_published = 1 AND is_overshadowed = 
0) OR is_realtime = 1`.|

Review comment:
       The context for "should be" is that everything related to 
ingestion and segment availability happens asynchronously in the background. 
Some segments should be available but aren't yet, and the system will work to 
make them available. Others are available but shouldn't be (because they were 
dropped or replaced), and the system will work to make them unavailable.
   
   I changed the wording to hopefully be clearer:
   
   > True for segments that represent the latest state of a datasource.
   >
   > Equivalent to `(is_published = 1 AND is_overshadowed = 0) OR is_realtime = 
1`. In steady state, when no ingestions or data management operations are 
happening, `is_active` will be equivalent to `is_available`. However, they may 
differ from each other when ingestions or data management operations have 
executed recently. In these cases, Druid will load and unload segments 
appropriately to bring actual availability in line with the expected state 
given by `is_active`.
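
   To make the equivalence concrete, here is a small Python sketch of the 
expression quoted above. The function `is_active` and the sample flag 
combinations are illustrative assumptions, not Druid code; the flags are 
modeled as 0/1 integers to mirror the LONG columns.

```python
def is_active(is_published: int, is_overshadowed: int, is_realtime: int) -> int:
    """Evaluate (is_published = 1 AND is_overshadowed = 0) OR is_realtime = 1."""
    active = (is_published == 1 and is_overshadowed == 0) or is_realtime == 1
    return int(active)


# Hypothetical flag combinations: (is_published, is_overshadowed, is_realtime).
examples = {
    "published, not overshadowed": (1, 0, 0),
    "published, overshadowed": (1, 1, 0),
    "realtime only": (0, 0, 1),
    "neither published nor realtime": (0, 0, 0),
}

for label, flags in examples.items():
    print(f"{label}: is_active = {is_active(*flags)}")
```

   A published segment that has been fully overshadowed evaluates to 0 here 
even if it is still being served, which is the case where `is_active` and 
`is_available` can briefly differ.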

##########
File path: docs/querying/sql.md
##########
@@ -1123,20 +1123,23 @@ Segments table provides details on all Druid segments, 
whether they are publishe
 |version|STRING|Version string (generally an ISO8601 timestamp corresponding 
to when the segment set was first started). Higher version means the more 
recently created segment. Version comparing is based on string comparison.|
 |partition_num|LONG|Partition number (an integer, unique within a 
datasource+interval+version; may not necessarily be contiguous)|
 |num_replicas|LONG|Number of replicas of this segment currently being served|
-|num_rows|LONG|Number of rows in current segment, this value could be null if 
unknown to Broker at query time|
-|is_published|LONG|Boolean is represented as long type where 1 = true, 0 = 
false. 1 represents this segment has been published to the metadata store with 
`used=1`. See the [Architecture 
page](../design/architecture.md#segment-lifecycle) for more details.|
-|is_available|LONG|Boolean is represented as long type where 1 = true, 0 = 
false. 1 if this segment is currently being served by any process(Historical or 
realtime). See the [Architecture 
page](../design/architecture.md#segment-lifecycle) for more details.|
-|is_realtime|LONG|Boolean is represented as long type where 1 = true, 0 = 
false. 1 if this segment is _only_ served by realtime tasks, and 0 if any 
historical process is serving this segment.|
-|is_overshadowed|LONG|Boolean is represented as long type where 1 = true, 0 = 
false. 1 if this segment is published and is _fully_ overshadowed by some other 
published segments. Currently, is_overshadowed is always false for unpublished 
segments, although this may change in the future. You can filter for segments 
that "should be published" by filtering for `is_published = 1 AND 
is_overshadowed = 0`. Segments can briefly be both published and overshadowed 
if they were recently replaced, but have not been unpublished yet. See the 
[Architecture page](../design/architecture.md#segment-lifecycle) for more 
details.|
+|num_rows|LONG|Number of rows in this segment. This field is updated in the 
background and cached on the Broker. It may be null if the Broker has not 
gathered a row count for this segment yet. It may not match the result of 
`count(*)` queries on realtime data, because the cached value on the Broker may 
be out of date, and because different replicas of realtime segments may not be 
in sync with each other. Once a segment is published, its row count will settle 
and stop changing.|
+|is_active|LONG|Boolean represented as long type where 1 = true, 0 = false. 
True for segments that are either available and queryable, or _should be_ 
available and querayble. Equivalent to `(is_published = 1 AND is_overshadowed = 
0) OR is_realtime = 1`.|
+|is_published|LONG|Boolean represented as long type where 1 = true, 0 = false. 
1 represents this segment has been published to the metadata store with 
`used=1`. See the [segment lifecycle 
documentation](../design/architecture.md#segment-lifecycle) for more details.|

Review comment:
       Yes.

##########
File path: docs/querying/sql.md
##########
@@ -1123,20 +1123,23 @@ Segments table provides details on all Druid segments, 
whether they are publishe
 |version|STRING|Version string (generally an ISO8601 timestamp corresponding 
to when the segment set was first started). Higher version means the more 
recently created segment. Version comparing is based on string comparison.|
 |partition_num|LONG|Partition number (an integer, unique within a 
datasource+interval+version; may not necessarily be contiguous)|
 |num_replicas|LONG|Number of replicas of this segment currently being served|
-|num_rows|LONG|Number of rows in current segment, this value could be null if 
unknown to Broker at query time|
-|is_published|LONG|Boolean is represented as long type where 1 = true, 0 = 
false. 1 represents this segment has been published to the metadata store with 
`used=1`. See the [Architecture 
page](../design/architecture.md#segment-lifecycle) for more details.|
-|is_available|LONG|Boolean is represented as long type where 1 = true, 0 = 
false. 1 if this segment is currently being served by any process(Historical or 
realtime). See the [Architecture 
page](../design/architecture.md#segment-lifecycle) for more details.|
-|is_realtime|LONG|Boolean is represented as long type where 1 = true, 0 = 
false. 1 if this segment is _only_ served by realtime tasks, and 0 if any 
historical process is serving this segment.|
-|is_overshadowed|LONG|Boolean is represented as long type where 1 = true, 0 = 
false. 1 if this segment is published and is _fully_ overshadowed by some other 
published segments. Currently, is_overshadowed is always false for unpublished 
segments, although this may change in the future. You can filter for segments 
that "should be published" by filtering for `is_published = 1 AND 
is_overshadowed = 0`. Segments can briefly be both published and overshadowed 
if they were recently replaced, but have not been unpublished yet. See the 
[Architecture page](../design/architecture.md#segment-lifecycle) for more 
details.|
+|num_rows|LONG|Number of rows in this segment. This field is updated in the 
background and cached on the Broker. It may be null if the Broker has not 
gathered a row count for this segment yet. It may not match the result of 
`count(*)` queries on realtime data, because the cached value on the Broker may 
be out of date, and because different replicas of realtime segments may not be 
in sync with each other. Once a segment is published, its row count will settle 
and stop changing.|
+|is_active|LONG|Boolean represented as long type where 1 = true, 0 = false. 
True for segments that are either available and queryable, or _should be_ 
available and querayble. Equivalent to `(is_published = 1 AND is_overshadowed = 
0) OR is_realtime = 1`.|
+|is_published|LONG|Boolean represented as long type where 1 = true, 0 = false. 
1 represents this segment has been published to the metadata store with 
`used=1`. See the [segment lifecycle 
documentation](../design/architecture.md#segment-lifecycle) for more details.|
+|is_available|LONG|Boolean represented as long type where 1 = true, 0 = false. 
1 if this segment is currently being served by any process(Historical or 
realtime). See the [segment lifecycle 
documentation](../design/architecture.md#segment-lifecycle) for more details.|
+|is_realtime|LONG|Boolean represented as long type where 1 = true, 0 = false. 
1 if this segment is _only_ served by realtime tasks, and 0 if any historical 
process is serving this segment.|
+|is_overshadowed|LONG|Boolean represented as long type where 1 = true, 0 = 
false. 1 if this segment is published and is _fully_ overshadowed by some other 
published segments. Currently, is_overshadowed is always 0 for unpublished 
segments, although this may change in the future. You can filter for segments 
that "should be published" by filtering for `is_published = 1 AND 
is_overshadowed = 0`. Segments can briefly be both published and overshadowed 
if they were recently replaced, but have not been unpublished yet. See the 
[segment lifecycle documentation](../design/architecture.md#segment-lifecycle) 
for more details.|

Review comment:
       Thanks, fixed.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


