This is an automated email from the ASF dual-hosted git repository.

gian pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/druid.git


The following commit(s) were added to refs/heads/master by this push:
     new 65a1375b67 SQL: Add is_active to sys.segments, update examples and 
docs. (#11550)
65a1375b67 is described below

commit 65a1375b67dfa470d0eeefd078b4252e4427d74e
Author: Gian Merlino <[email protected]>
AuthorDate: Thu May 19 14:23:28 2022 -0700

    SQL: Add is_active to sys.segments, update examples and docs. (#11550)
    
    * SQL: Add is_active to sys.segments, update examples and docs.
    
    is_active is short for:
    
      (is_published = 1 AND is_overshadowed = 0) OR is_realtime = 1
    
    It's important because this represents "all the segments that should
    be queryable, whether or not they actually are right now". Most of the
    time, this is the set of segments that people will want to look at.
    
    The web console already adds this filter to a lot of its queries,
    proving its usefulness.
    
    This patch also reworks the caveat at the bottom of the sys.segments
    section, so its information is mixed into the description of each result
    field. This should make it more likely for people to see the information.
    
    * Wording updates.
    
    * Adjustments for spellcheck.
    
    * Adjust IT.
---
 docs/querying/sql-metadata-tables.md               | 25 +++++++++++-----------
 .../results/auth_test_sys_schema_segments.json     |  1 +
 .../druid/sql/calcite/schema/SystemSchema.java     | 12 +++++++++--
 .../druid/sql/calcite/schema/SystemSchemaTest.java | 15 +++++++------
 website/.spelling                                  |  1 +
 5 files changed, 33 insertions(+), 21 deletions(-)

diff --git a/docs/querying/sql-metadata-tables.md 
b/docs/querying/sql-metadata-tables.md
index 2eb96aee34..b48b795e38 100644
--- a/docs/querying/sql-metadata-tables.md
+++ b/docs/querying/sql-metadata-tables.md
@@ -127,20 +127,22 @@ Segments table provides details on all Druid segments, 
whether they are publishe
 |version|STRING|Version string (generally an ISO8601 timestamp corresponding 
to when the segment set was first started). Higher version means the more 
recently created segment. Version comparing is based on string comparison.|
 |partition_num|LONG|Partition number (an integer, unique within a 
datasource+interval+version; may not necessarily be contiguous)|
 |num_replicas|LONG|Number of replicas of this segment currently being served|
-|num_rows|LONG|Number of rows in current segment, this value could be null if 
unknown to Broker at query time|
-|is_published|LONG|Boolean is represented as long type where 1 = true, 0 = 
false. 1 represents this segment has been published to the metadata store with 
`used=1`. See the [Architecture 
page](../design/architecture.md#segment-lifecycle) for more details.|
-|is_available|LONG|Boolean is represented as long type where 1 = true, 0 = 
false. 1 if this segment is currently being served by any process(Historical or 
realtime). See the [Architecture 
page](../design/architecture.md#segment-lifecycle) for more details.|
-|is_realtime|LONG|Boolean is represented as long type where 1 = true, 0 = 
false. 1 if this segment is _only_ served by realtime tasks, and 0 if any 
historical process is serving this segment.|
-|is_overshadowed|LONG|Boolean is represented as long type where 1 = true, 0 = 
false. 1 if this segment is published and is _fully_ overshadowed by some other 
published segments. Currently, is_overshadowed is always false for unpublished 
segments, although this may change in the future. You can filter for segments 
that "should be published" by filtering for `is_published = 1 AND 
is_overshadowed = 0`. Segments can briefly be both published and overshadowed 
if they were recently replaced, b [...]
-|shard_spec|STRING|JSON-serialized form of the segment `ShardSpec`|
+|num_rows|LONG|Number of rows in this segment, or zero if the number of rows 
is not known.<br /><br />This row count is gathered by the Broker in the 
background. It will be zero if the Broker has not gathered a row count for this 
segment yet. For segments ingested from streams, the reported row count may lag 
behind the result of a `count(*)` query because the cached `num_rows` on the 
Broker may be out of date. This will settle shortly after new rows stop being 
written to that particular  [...]
+|is_active|LONG|True for segments that represent the latest state of a 
datasource.<br /><br />Equivalent to `(is_published = 1 AND is_overshadowed = 
0) OR is_realtime = 1`. In steady state, when no ingestion or data management 
operations are happening, `is_active` will be equivalent to `is_available`. 
However, they may differ from each other when ingestion or data management 
operations have executed recently. In these cases, Druid will load and unload 
segments appropriately to bring actu [...]
+|is_published|LONG|Boolean represented as long type where 1 = true, 0 = false. 
1 if this segment has been published to the metadata store and is marked as 
used. See the [segment lifecycle 
documentation](../design/architecture.md#segment-lifecycle) for more details.|
+|is_available|LONG|Boolean represented as long type where 1 = true, 0 = false. 
1 if this segment is currently being served by any data serving process, like a 
Historical or a realtime ingestion task. See the [segment lifecycle 
documentation](../design/architecture.md#segment-lifecycle) for more details.|
+|is_realtime|LONG|Boolean represented as long type where 1 = true, 0 = false. 
1 if this segment is _only_ served by realtime tasks, and 0 if any Historical 
process is serving this segment.|
+|is_overshadowed|LONG|Boolean represented as long type where 1 = true, 0 = 
false. 1 if this segment is published and is _fully_ overshadowed by some other 
published segments. Currently, `is_overshadowed` is always 0 for unpublished 
segments, although this may change in the future. You can filter for segments 
that "should be published" by filtering for `is_published = 1 AND 
is_overshadowed = 0`. Segments can briefly be both published and overshadowed 
if they were recently replaced, but ha [...]
 |dimensions|STRING|JSON-serialized form of the segment dimensions|
 |metrics|STRING|JSON-serialized form of the segment metrics|
 |last_compaction_state|STRING|JSON-serialized form of the compaction task's 
config (compaction task which created this segment). May be null if segment was 
not created by compaction task.|
 
-For example to retrieve all segments for datasource "wikipedia", use the query:
+For example, to retrieve all currently active segments for datasource 
"wikipedia", use the query:
 
 ```sql
-SELECT * FROM sys.segments WHERE datasource = 'wikipedia'
+SELECT * FROM sys.segments
+WHERE datasource = 'wikipedia'
+AND is_active = 1
 ```
 
 Another example to retrieve segments total_size, avg_size, avg_num_rows and 
num_segments per datasource:
@@ -153,6 +155,7 @@ SELECT
     CASE WHEN SUM(num_rows) = 0 THEN 0 ELSE SUM("num_rows") / (COUNT(*) 
FILTER(WHERE num_rows > 0)) END AS avg_num_rows,
     COUNT(*) AS num_segments
 FROM sys.segments
+WHERE is_active = 1
 GROUP BY 1
 ORDER BY 2 DESC
 ```
@@ -180,17 +183,15 @@ ORDER BY 1
 If you want to retrieve segment that was compacted (ANY compaction):
 
 ```sql
-SELECT * FROM sys.segments WHERE last_compaction_state is not null
+SELECT * FROM sys.segments WHERE is_active = 1 AND last_compaction_state IS 
NOT NULL
 ```
 
 or if you want to retrieve segment that was compacted only by a particular 
compaction spec (such as that of the auto compaction):
 
 ```sql
-SELECT * FROM sys.segments WHERE last_compaction_state == 'SELECT * FROM 
sys.segments where last_compaction_state = 
'CompactionState{partitionsSpec=DynamicPartitionsSpec{maxRowsPerSegment=5000000,
 maxTotalRows=9223372036854775807}, indexSpec={bitmap={type=roaring, 
compressRunOnSerialization=true}, dimensionCompression=lz4, 
metricCompression=lz4, longEncoding=longs, segmentLoader=null}}'
+SELECT * FROM sys.segments WHERE is_active = 1 AND last_compaction_state = 
'CompactionState{partitionsSpec=DynamicPartitionsSpec{maxRowsPerSegment=5000000,
 maxTotalRows=9223372036854775807}, indexSpec={bitmap={type=roaring, 
compressRunOnSerialization=true}, dimensionCompression=lz4, 
metricCompression=lz4, longEncoding=longs, segmentLoader=null}}'
 ```
 
-*Caveat:* Note that a segment can be served by more than one stream ingestion 
tasks or Historical processes, in that case it would have multiple replicas. 
These replicas are weakly consistent with each other when served by multiple 
ingestion tasks, until a segment is eventually served by a Historical, at that 
point the segment is immutable. Broker prefers to query a segment from 
Historical over an ingestion task. But if a segment has multiple realtime 
replicas, for e.g.. Kafka index task [...]
-
 ### SERVERS table
 
 Servers table lists all discovered servers in the cluster.
diff --git 
a/integration-tests/src/test/resources/results/auth_test_sys_schema_segments.json
 
b/integration-tests/src/test/resources/results/auth_test_sys_schema_segments.json
index d064733ce7..1cb97572db 100644
--- 
a/integration-tests/src/test/resources/results/auth_test_sys_schema_segments.json
+++ 
b/integration-tests/src/test/resources/results/auth_test_sys_schema_segments.json
@@ -9,6 +9,7 @@
     "partition_num": 0,
     "num_replicas": 1,
     "num_rows": 4462111,
+    "is_active": 1,
     "is_published": 1,
     "is_available": 1,
     "is_realtime": 0,
diff --git 
a/sql/src/main/java/org/apache/druid/sql/calcite/schema/SystemSchema.java 
b/sql/src/main/java/org/apache/druid/sql/calcite/schema/SystemSchema.java
index f408edd28b..2b58169c37 100644
--- a/sql/src/main/java/org/apache/druid/sql/calcite/schema/SystemSchema.java
+++ b/sql/src/main/java/org/apache/druid/sql/calcite/schema/SystemSchema.java
@@ -123,6 +123,8 @@ public class SystemSchema extends AbstractSchema
    * where 1 = true and 0 = false to make it easy to count number of segments
    * which are published, available etc.
    */
+  private static final long IS_ACTIVE_FALSE = 0L;
+  private static final long IS_ACTIVE_TRUE = 1L;
   private static final long IS_PUBLISHED_FALSE = 0L;
   private static final long IS_PUBLISHED_TRUE = 1L;
   private static final long IS_AVAILABLE_TRUE = 1L;
@@ -140,6 +142,7 @@ public class SystemSchema extends AbstractSchema
       .add("partition_num", ColumnType.LONG)
       .add("num_replicas", ColumnType.LONG)
       .add("num_rows", ColumnType.LONG)
+      .add("is_active", ColumnType.LONG)
       .add("is_published", ColumnType.LONG)
       .add("is_available", ColumnType.LONG)
       .add("is_realtime", ColumnType.LONG)
@@ -313,7 +316,10 @@ public class SystemSchema extends AbstractSchema
                   (long) segment.getShardSpec().getPartitionNum(),
                   numReplicas,
                   numRows,
-                  IS_PUBLISHED_TRUE, //is_published is true for published 
segments
+                  //is_active is true for published segments that are not 
overshadowed
+                  val.isOvershadowed() ? IS_ACTIVE_FALSE : IS_ACTIVE_TRUE,
+                  //is_published is true for published segments
+                  IS_PUBLISHED_TRUE,
                   isAvailable,
                   isRealtime,
                   val.isOvershadowed() ? IS_OVERSHADOWED_TRUE : 
IS_OVERSHADOWED_FALSE,
@@ -350,8 +356,10 @@ public class SystemSchema extends AbstractSchema
                   (long) 
val.getValue().getSegment().getShardSpec().getPartitionNum(),
                   numReplicas,
                   val.getValue().getNumRows(),
-                  IS_PUBLISHED_FALSE,
+                  // is_active is true for unpublished segments iff they are 
realtime
+                  val.getValue().isRealtime() /* is_active */,
                   // is_published is false for unpublished segments
+                  IS_PUBLISHED_FALSE,
                   // is_available is assumed to be always true for segments 
announced by historicals or realtime tasks
                   IS_AVAILABLE_TRUE,
                   val.getValue().isRealtime(),
diff --git 
a/sql/src/test/java/org/apache/druid/sql/calcite/schema/SystemSchemaTest.java 
b/sql/src/test/java/org/apache/druid/sql/calcite/schema/SystemSchemaTest.java
index 8773034b7d..4b76fbf8e2 100644
--- 
a/sql/src/test/java/org/apache/druid/sql/calcite/schema/SystemSchemaTest.java
+++ 
b/sql/src/test/java/org/apache/druid/sql/calcite/schema/SystemSchemaTest.java
@@ -520,7 +520,7 @@ public class SystemSchemaTest extends CalciteTestBase
     final RelDataType rowType = segmentsTable.getRowType(new 
JavaTypeFactoryImpl());
     final List<RelDataTypeField> fields = rowType.getFieldList();
 
-    Assert.assertEquals(17, fields.size());
+    Assert.assertEquals(18, fields.size());
 
     final SystemSchema.TasksTable tasksTable = (SystemSchema.TasksTable) 
schema.getTableMap().get("tasks");
     final RelDataType sysRowType = tasksTable.getRowType(new 
JavaTypeFactoryImpl());
@@ -708,14 +708,15 @@ public class SystemSchemaTest extends CalciteTestBase
     Assert.assertEquals(partitionNum, row[6]);
     Assert.assertEquals(numReplicas, row[7]);
     Assert.assertEquals(numRows, row[8]);
-    Assert.assertEquals(isPublished, row[9]);
-    Assert.assertEquals(isAvailable, row[10]);
-    Assert.assertEquals(isRealtime, row[11]);
-    Assert.assertEquals(isOvershadowed, row[12]);
+    Assert.assertEquals((((isPublished == 1) && (isOvershadowed == 0)) || 
(isRealtime == 1)) ? 1L : 0L, row[9]);
+    Assert.assertEquals(isPublished, row[10]);
+    Assert.assertEquals(isAvailable, row[11]);
+    Assert.assertEquals(isRealtime, row[12]);
+    Assert.assertEquals(isOvershadowed, row[13]);
     if (compactionState == null) {
-      Assert.assertNull(row[16]);
+      Assert.assertNull(row[17]);
     } else {
-      Assert.assertEquals(mapper.writeValueAsString(compactionState), row[16]);
+      Assert.assertEquals(mapper.writeValueAsString(compactionState), row[17]);
     }
   }
 
diff --git a/website/.spelling b/website/.spelling
index f9bed1a729..f0b5b0370c 100644
--- a/website/.spelling
+++ b/website/.spelling
@@ -533,6 +533,7 @@ error_msg
 exprs
 group_id
 interval_expr
+is_active
 is_available
 is_leader
 is_overshadowed


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to