[GitHub] jihoonson commented on a change in pull request #6094: Introduce SystemSchema tables (#5989)

GitBox Thu, 27 Sep 2018 19:22:19 -0700

jihoonson commented on a change in pull request #6094: Introduce SystemSchema 
tables (#5989)
URL: https://github.com/apache/incubator-druid/pull/6094#discussion_r221121700


 ##########
 File path: docs/content/querying/sql.md
 ##########
 @@ -519,6 +524,101 @@ SELECT * FROM INFORMATION_SCHEMA.COLUMNS WHERE 
TABLE_SCHEMA = 'druid' AND TABLE_
 |COLLATION_NAME||
 |JDBC_TYPE|Type code from java.sql.Types (Druid extension)|
 
+## SYSTEM SCHEMA
+
+The sys schema provides visibility into Druid segments, servers and tasks.
+For example to retrieve all segments for datasource "wikipedia", use the query:
+```sql
+SELECT * FROM sys.segments WHERE datasource = 'wikipedia'
+```
+
+### SEGMENTS table
+Segments table provides details on all Druid segments, whether they are 
published yet or not.
+
+
+|Column|Notes|
+|------|-----|
+|segment_id|Unique segment identifier|
+|datasource|Name of datasource|
+|start|Interval start time (in ISO 8601 format)|
+|end|Interval end time (in ISO 8601 format)|
+|size|Size of segment in bytes|
+|version|Version number (generally an ISO8601 timestamp corresponding to when 
the segment set was first started)|
+|partition_num|Partition number (an integer, unique within a 
datasource+interval+version; may not necessarily be contiguous)|
+|num_replicas|Number of replicas of this segment currently being served|
+|num_rows|Number of rows in current segment, this value could be null if 
unkown to broker at query time|
+|is_published|True if this segment has been published to the metadata store|
+|is_available|True if this segment is currently being served by any server|
+|is_realtime|True if this segment is being served on a realtime server|
+|payload|JSON-serialized datasegment payload|
+
+### SERVERS table
+Servers table lists all data servers(any server that hosts a segment). It 
includes both historicals and peons.
+
+|Column|Notes|
+|------|-----|
+|server|Server name in the form host:port|
+|host|Hostname of the server|
+|plaintext_port|Unsecured port of the server, or -1 if plaintext traffic is 
disabled|
+|tls_port|TLS port of the server, or -1 if TLS is disabled|
+|server_type|Type of Druid service. Possible values include: historical, 
realtime and indexer_executor.|
+|tier|Distribution tier see 
[druid.server.tier](#../configuration/index.html#Historical-General-Configuration)|
+|current_size|Current size of segments in bytes on this server|
+|max_size|Max size in bytes this server recommends to assign to segments see 
[druid.server.maxSize](#../configuration/index.html#Historical-General-Configuration)|
+
+To retrieve information about all servers, use the query:
+```sql
+SELECT * FROM sys.servers;
+```
+
+### SEGMENT_SERVERS table
+
+SEGMENT_SERVERS is used to join segments with servers table
+
+|Column|Notes|
+|------|-----|
+|server|Server name in format host:port (Primary key of [servers 
table](#SERVERS-table))|
+|segment_id|Segment identifier (Primary key of [segments 
table](#SEGMENTS-table))|
+
+JOIN between "segments" and "servers" can be used to query the number of 
segments for a specific datasource, 
+grouped by server, example query:
+```sql
+SELECT count(segments.segment_id) as num_segments from sys.segments as 
segments 
+INNER JOIN sys.segment_servers as segment_servers 
+ON segments.segment_id  = segment_servers.segment_id 
+INNER JOIN sys.servers as servers 
+ON servers.server = segment_servers.server
+WHERE segments.datasource = 'wikipedia' 
+GROUP BY servers.server;
+```
+
+### TASKS table
+
+The tasks table provides information about active and recently-completed 
indexing tasks. For more information 
+check out [ingestion tasks](#../ingestion/tasks.html)
+
+|Column|Notes|
+|------|-----|
+|task_id|Unique task identifier|
+|type|Task type, this should be "index" for indexing tasks|
+|datasource|Datasource name being indexed|
+|created_time|Timestamp in ISO8601 format corresponding to when the ingestion 
task was created. Note that this value is populated for completed and waiting 
tasks. For running and pending tasks this value is set to DateTimes.EPOCH|
 
 Review comment:
   `DateTimes.EPOCH` is in Druid's source code, and I don't think every user 
checks Druid's internal code. Better to say `1970-01-01T00:00:00Z` instead.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] jihoonson commented on a change in pull request #6094: Introduce SystemSchema tables (#5989)

Reply via email to