techdocsmith commented on code in PR #14529: URL: https://github.com/apache/druid/pull/14529#discussion_r1273897060
########## docs/development/extensions-core/kinesis-ingestion.md: ########## @@ -23,154 +23,259 @@ sidebar_label: "Amazon Kinesis" ~ under the License. --> -When you enable the Kinesis indexing service, you can configure *supervisors* on the Overlord to manage the creation and lifetime of Kinesis indexing tasks. These indexing tasks read events using Kinesis' own shard and sequence number mechanism to guarantee exactly-once ingestion. The supervisor oversees the state of the indexing tasks to: +When you enable the Kinesis indexing service, you can configure supervisors on the Overlord to manage the creation and lifetime of Kinesis indexing tasks. These indexing tasks read events using Kinesis' own shard and sequence number mechanism to guarantee exactly-once ingestion. The supervisor oversees the state of the indexing tasks to coordinate handoffs, manage failures, and ensure that scalability and replication requirements are maintained. -- coordinate handoffs -- manage failures -- ensure that scalability and replication requirements are maintained. +This topic contains configuration reference information for the Kinesis indexing service supervisor for Apache Druid. -To use the Kinesis indexing service, load the `druid-kinesis-indexing-service` core Apache Druid extension (see -[Including Extensions](../../configuration/extensions.md#loading-extensions)). +## Setup -> Before you deploy the Kinesis extension to production, read the [Kinesis known issues](#kinesis-known-issues). +To use the Kinesis indexing service, you must first load the `druid-kinesis-indexing-service` core extension on both the Overlord and the Middle Manager. See [Loading extensions](../../configuration/extensions.md#loading-extensions) for more information. +We recommend that you review the [Kinesis known issues](#kinesis-known-issues) before deploying the `druid-kinesis-indexing-service` extension to production. 
-## Submitting a Supervisor Spec +## Supervisor spec -To use the Kinesis indexing service, load the `druid-kinesis-indexing-service` extension on both the Overlord and the MiddleManagers. Druid starts a supervisor for a dataSource when you submit a supervisor spec. Submit your supervisor spec to the following endpoint: +The following table outlines the high-level configuration options for the Kinesis supervisor object. +See [Supervisor API](../../api-reference/supervisor-api.md) for more information. -`http://<OVERLORD_IP>:<OVERLORD_PORT>/druid/indexer/v1/supervisor` +|Property|Type|Description|Required| +|--------|----|-----------|--------| +|`type`|String|The supervisor type; this should always be `kinesis`.|Yes| +|`spec`|Object|The container object for the supervisor configuration.|Yes| +|`ioConfig`|Object|The [I/O configuration](#supervisor-io-configuration) object for configuring Kafka connection and I/O-related settings for the supervisor and indexing task.|Yes| +|`dataSchema`|Object|The schema used by the Kinesis indexing task during ingestion. See [`dataSchema`](../../ingestion/ingestion-spec.md#dataschema) for more information.|Yes| +|`tuningConfig`|Object|The [tuning configuration](#supervisor-tuning-configuration) object for configuring performance-related settings for the supervisor and indexing tasks.|No| -For example: +Druid starts a new supervisor when you define a supervisor spec. +To create a supervisor, send a `POST` request to the `/druid/indexer/v1/supervisor` endpoint. +Once created, the supervisor persists in the configured metadata database. There can only be a single supervisor per datasource, and submitting a second spec for the same datasource overwrites the previous one. 
-```sh -curl -X POST -H 'Content-Type: application/json' -d @supervisor-spec.json http://localhost:8090/druid/indexer/v1/supervisor -``` +When an Overlord gains leadership, either by being started or as a result of another Overlord failing, it spawns +a supervisor for each supervisor spec in the metadata database. The supervisor then discovers running Kinesis indexing +tasks and attempts to adopt them if they are compatible with the supervisor's configuration. If they are not +compatible because they have a different ingestion spec or shard allocation, the tasks are killed and the +supervisor creates a new set of tasks. In this way, the supervisors persist across Overlord restarts and failovers. -Where the file `supervisor-spec.json` contains a Kinesis supervisor spec: +The following example shows how to submit a supervisor spec for a stream with the name `KinesisStream`. +In this example, `http://SERVICE_IP:SERVICE_PORT` is a placeholder for the server address of deployment and the service port. -```json -{ +<!--DOCUSAURUS_CODE_TABS--> + +<!--cURL--> +```shell +curl -X POST "http://SERVICE_IP:SERVICE_PORT/druid/indexer/v1/supervisor" \ Review Comment: Normally we run all HTTP API calls through the Router. Retest it with this: `curl -X POST "http://SERVICE_IP:SERVICE_PORT/druid/router/v1/supervisor"` ########## docs/development/extensions-core/kinesis-ingestion.md: ########## @@ -23,154 +23,259 @@ sidebar_label: "Amazon Kinesis" ~ under the License. --> -When you enable the Kinesis indexing service, you can configure *supervisors* on the Overlord to manage the creation and lifetime of Kinesis indexing tasks. These indexing tasks read events using Kinesis' own shard and sequence number mechanism to guarantee exactly-once ingestion. The supervisor oversees the state of the indexing tasks to: +When you enable the Kinesis indexing service, you can configure supervisors on the Overlord to manage the creation and lifetime of Kinesis indexing tasks. 
These indexing tasks read events using Kinesis' own shard and sequence number mechanism to guarantee exactly-once ingestion. The supervisor oversees the state of the indexing tasks to coordinate handoffs, manage failures, and ensure that scalability and replication requirements are maintained. -- coordinate handoffs -- manage failures -- ensure that scalability and replication requirements are maintained. +This topic contains configuration reference information for the Kinesis indexing service supervisor for Apache Druid. -To use the Kinesis indexing service, load the `druid-kinesis-indexing-service` core Apache Druid extension (see -[Including Extensions](../../configuration/extensions.md#loading-extensions)). +## Setup -> Before you deploy the Kinesis extension to production, read the [Kinesis known issues](#kinesis-known-issues). +To use the Kinesis indexing service, you must first load the `druid-kinesis-indexing-service` core extension on both the Overlord and the Middle Manager. See [Loading extensions](../../configuration/extensions.md#loading-extensions) for more information. +We recommend that you review the [Kinesis known issues](#kinesis-known-issues) before deploying the `druid-kinesis-indexing-service` extension to production. -## Submitting a Supervisor Spec +## Supervisor spec -To use the Kinesis indexing service, load the `druid-kinesis-indexing-service` extension on both the Overlord and the MiddleManagers. Druid starts a supervisor for a dataSource when you submit a supervisor spec. Submit your supervisor spec to the following endpoint: +The following table outlines the high-level configuration options for the Kinesis supervisor object. +See [Supervisor API](../../api-reference/supervisor-api.md) for more information. 
-`http://<OVERLORD_IP>:<OVERLORD_PORT>/druid/indexer/v1/supervisor` +|Property|Type|Description|Required| +|--------|----|-----------|--------| +|`type`|String|The supervisor type; this should always be `kinesis`.|Yes| +|`spec`|Object|The container object for the supervisor configuration.|Yes| +|`ioConfig`|Object|The [I/O configuration](#supervisor-io-configuration) object for configuring Kafka connection and I/O-related settings for the supervisor and indexing task.|Yes| +|`dataSchema`|Object|The schema used by the Kinesis indexing task during ingestion. See [`dataSchema`](../../ingestion/ingestion-spec.md#dataschema) for more information.|Yes| +|`tuningConfig`|Object|The [tuning configuration](#supervisor-tuning-configuration) object for configuring performance-related settings for the supervisor and indexing tasks.|No| -For example: +Druid starts a new supervisor when you define a supervisor spec. +To create a supervisor, send a `POST` request to the `/druid/indexer/v1/supervisor` endpoint. +Once created, the supervisor persists in the configured metadata database. There can only be a single supervisor per datasource, and submitting a second spec for the same datasource overwrites the previous one. -```sh -curl -X POST -H 'Content-Type: application/json' -d @supervisor-spec.json http://localhost:8090/druid/indexer/v1/supervisor -``` +When an Overlord gains leadership, either by being started or as a result of another Overlord failing, it spawns +a supervisor for each supervisor spec in the metadata database. The supervisor then discovers running Kinesis indexing +tasks and attempts to adopt them if they are compatible with the supervisor's configuration. If they are not +compatible because they have a different ingestion spec or shard allocation, the tasks are killed and the +supervisor creates a new set of tasks. In this way, the supervisors persist across Overlord restarts and failovers. 
-Where the file `supervisor-spec.json` contains a Kinesis supervisor spec: +The following example shows how to submit a supervisor spec for a stream with the name `KinesisStream`. +In this example, `http://SERVICE_IP:SERVICE_PORT` is a placeholder for the server address of deployment and the service port. -```json -{ +<!--DOCUSAURUS_CODE_TABS--> + +<!--cURL--> +```shell +curl -X POST "http://SERVICE_IP:SERVICE_PORT/druid/indexer/v1/supervisor" \ +-H "Content-Type: application/json" \ +-d '{ "type": "kinesis", "spec": { + "ioConfig": { + "type": "kinesis", + "stream": "KinesisStream", + "inputFormat": { + "type": "json" + }, + "useEarliestSequenceNumber": true + }, + "tuningConfig": { + "type": "kinesis" + }, "dataSchema": { - "dataSource": "metrics-kinesis", + "dataSource": "KinesisStream", "timestampSpec": { "column": "timestamp", - "format": "auto" + "format": "iso" }, - "dimensionsSpec": { - "dimensions": [], - "dimensionExclusions": [ - "timestamp", - "value" - ] + "dimensionsSpec": { + "dimensions": [ + "isRobot", + "channel", + "flags", + "isUnpatrolled", + "page", + "diffUrl", + { + "type": "long", + "name": "added" + }, + "comment", + { + "type": "long", + "name": "commentLength" + }, + "isNew", + "isMinor", + { + "type": "long", + "name": "delta" + }, + "isAnonymous", + "user", + { + "type": "long", + "name": "deltaBucket" + }, + { + "type": "long", + "name": "deleted" + }, + "namespace", + "cityName", + "countryName", + "regionIsoCode", + "metroCode", + "countryIsoCode", + "regionName" + ] }, - "metricsSpec": [ - { - "name": "count", - "type": "count" - }, - { - "name": "value_sum", - "fieldName": "value", - "type": "doubleSum" - }, - { - "name": "value_min", - "fieldName": "value", - "type": "doubleMin" - }, - { - "name": "value_max", - "fieldName": "value", - "type": "doubleMax" - } - ], - "granularitySpec": { - "type": "uniform", - "segmentGranularity": "HOUR", - "queryGranularity": "NONE" - } - }, + "granularitySpec": { + "queryGranularity": "none", + 
"rollup": false, + "segmentGranularity": "hour" + } + } + } +}' +``` +<!--HTTP--> +```HTTP +POST /druid/indexer/v1/supervisor Review Comment: same comment as line 65 wrt/ router vs indexer -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]

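For retesting the router-vs-indexer point raised in both comments, here is a minimal Python sketch that builds the same `POST` request against two base URLs. It is only an illustration under stated assumptions: the helper `build_supervisor_request`, the `localhost` addresses, and the ports (8090 for the Overlord as in the removed example, 8888 as an assumed Router port) are hypothetical and not part of the PR, and it assumes the Router forwards the `/druid/indexer/v1/supervisor` path to the Overlord unchanged. The spec is trimmed down and would not be accepted as-is.

```python
import json
import urllib.request

# Path used by the examples in the PR under review.
SUPERVISOR_PATH = "/druid/indexer/v1/supervisor"


def build_supervisor_request(base_url, spec):
    """Build (but do not send) the POST request that submits a supervisor spec.

    Hypothetical helper for illustration; not part of Druid or the PR.
    """
    body = json.dumps(spec).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + SUPERVISOR_PATH,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Trimmed-down spec for illustration only; a real spec needs a complete dataSchema.
spec = {
    "type": "kinesis",
    "spec": {
        "ioConfig": {"type": "kinesis", "stream": "KinesisStream"},
        "tuningConfig": {"type": "kinesis"},
        "dataSchema": {"dataSource": "KinesisStream"},
    },
}

# Assumed addresses: Overlord directly vs. through the Router.
overlord_req = build_supervisor_request("http://localhost:8090", spec)
router_req = build_supervisor_request("http://localhost:8888", spec)

print(overlord_req.full_url)
print(router_req.full_url)

# To actually submit the spec, pass either request to urllib.request.urlopen().
```

Only the base URL differs between the two requests, so the docs example could keep a single `SERVICE_IP:SERVICE_PORT` placeholder and state which service it refers to.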