jihoonson commented on a change in pull request #6094: Introduce SystemSchema tables (#5989) URL: https://github.com/apache/incubator-druid/pull/6094#discussion_r221121700
########## File path: docs/content/querying/sql.md ########## @@ -519,6 +524,101 @@ SELECT * FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_SCHEMA = 'druid' AND TABLE_ |COLLATION_NAME|| |JDBC_TYPE|Type code from java.sql.Types (Druid extension)| +## SYSTEM SCHEMA + +The sys schema provides visibility into Druid segments, servers and tasks. +For example to retrieve all segments for datasource "wikipedia", use the query: +```sql +SELECT * FROM sys.segments WHERE datasource = 'wikipedia' +``` + +### SEGMENTS table +Segments table provides details on all Druid segments, whether they are published yet or not. + + +|Column|Notes| +|------|-----| +|segment_id|Unique segment identifier| +|datasource|Name of datasource| +|start|Interval start time (in ISO 8601 format)| +|end|Interval end time (in ISO 8601 format)| +|size|Size of segment in bytes| +|version|Version number (generally an ISO8601 timestamp corresponding to when the segment set was first started)| +|partition_num|Partition number (an integer, unique within a datasource+interval+version; may not necessarily be contiguous)| +|num_replicas|Number of replicas of this segment currently being served| +|num_rows|Number of rows in current segment, this value could be null if unkown to broker at query time| +|is_published|True if this segment has been published to the metadata store| +|is_available|True if this segment is currently being served by any server| +|is_realtime|True if this segment is being served on a realtime server| +|payload|JSON-serialized datasegment payload| + +### SERVERS table +Servers table lists all data servers(any server that hosts a segment). It includes both historicals and peons. + +|Column|Notes| +|------|-----| +|server|Server name in the form host:port| +|host|Hostname of the server| +|plaintext_port|Unsecured port of the server, or -1 if plaintext traffic is disabled| +|tls_port|TLS port of the server, or -1 if TLS is disabled| +|server_type|Type of Druid service. Possible values include: historical, realtime and indexer_executor.| +|tier|Distribution tier see [druid.server.tier](#../configuration/index.html#Historical-General-Configuration)| +|current_size|Current size of segments in bytes on this server| +|max_size|Max size in bytes this server recommends to assign to segments see [druid.server.maxSize](#../configuration/index.html#Historical-General-Configuration)| + +To retrieve information about all servers, use the query: +```sql +SELECT * FROM sys.servers; +``` + +### SEGMENT_SERVERS table + +SEGMENT_SERVERS is used to join segments with servers table + +|Column|Notes| +|------|-----| +|server|Server name in format host:port (Primary key of [servers table](#SERVERS-table))| +|segment_id|Segment identifier (Primary key of [segments table](#SEGMENTS-table))| + +JOIN between "segments" and "servers" can be used to query the number of segments for a specific datasource, +grouped by server, example query: +```sql +SELECT count(segments.segment_id) as num_segments from sys.segments as segments +INNER JOIN sys.segment_servers as segment_servers +ON segments.segment_id = segment_servers.segment_id +INNER JOIN sys.servers as servers +ON servers.server = segment_servers.server +WHERE segments.datasource = 'wikipedia' +GROUP BY servers.server; +``` + +### TASKS table + +The tasks table provides information about active and recently-completed indexing tasks. For more information +check out [ingestion tasks](#../ingestion/tasks.html) + +|Column|Notes| +|------|-----| +|task_id|Unique task identifier| +|type|Task type, this should be "index" for indexing tasks| +|datasource|Datasource name being indexed| +|created_time|Timestamp in ISO8601 format corresponding to when the ingestion task was created. Note that this value is populated for completed and waiting tasks. For running and pending tasks this value is set to DateTimes.EPOCH| Review comment: `DateTimes.EPOCH` is in Druid's source code, and I don't think every user checks Druid's internal code. Better to say `1970-01-01T00:00:00Z` instead. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] With regards, Apache Git Services --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
