Re: [PR] Clean up Kinesis doc (druid)

via GitHub Tue, 25 Jul 2023 11:28:39 -0700


techdocsmith commented on code in PR #14529:
URL: https://github.com/apache/druid/pull/14529#discussion_r1273897060



##########
docs/development/extensions-core/kinesis-ingestion.md:
##########
@@ -23,154 +23,259 @@ sidebar_label: "Amazon Kinesis"
   ~ under the License.
   -->
 
-When you enable the Kinesis indexing service, you can configure *supervisors* 
on the Overlord to manage the creation and lifetime of Kinesis indexing tasks. 
These indexing tasks read events using Kinesis' own shard and sequence number 
mechanism to guarantee exactly-once ingestion. The supervisor oversees the 
state of the indexing tasks to:
+When you enable the Kinesis indexing service, you can configure supervisors on 
the Overlord to manage the creation and lifetime of Kinesis indexing tasks. 
These indexing tasks read events using Kinesis' own shard and sequence number 
mechanism to guarantee exactly-once ingestion. The supervisor oversees the 
state of the indexing tasks to coordinate handoffs, manage failures, and ensure 
that scalability and replication requirements are maintained.
 
-- coordinate handoffs
-- manage failures
-- ensure that scalability and replication requirements are maintained.
+This topic contains configuration reference information for the Kinesis 
indexing service supervisor for Apache Druid.
 
-To use the Kinesis indexing service, load the `druid-kinesis-indexing-service` 
core Apache Druid extension (see
-[Including Extensions](../../configuration/extensions.md#loading-extensions)).
+## Setup
 
-> Before you deploy the Kinesis extension to production, read the [Kinesis 
known issues](#kinesis-known-issues).
+To use the Kinesis indexing service, you must first load the 
`druid-kinesis-indexing-service` core extension on both the Overlord and the 
Middle Manager. See [Loading 
extensions](../../configuration/extensions.md#loading-extensions) for more 
information.
+We recommend that you review the [Kinesis known issues](#kinesis-known-issues) 
before deploying the `druid-kinesis-indexing-service` extension to production.
 
-## Submitting a Supervisor Spec
+## Supervisor spec
 
-To use the Kinesis indexing service, load the `druid-kinesis-indexing-service` 
extension on both the Overlord and the MiddleManagers. Druid starts a 
supervisor for a dataSource when you submit a supervisor spec. Submit your 
supervisor spec to the following endpoint:
+The following table outlines the high-level configuration options for the 
Kinesis supervisor object. 
+See [Supervisor API](../../api-reference/supervisor-api.md) for more 
information.
 
-`http://<OVERLORD_IP>:<OVERLORD_PORT>/druid/indexer/v1/supervisor`
+|Property|Type|Description|Required|
+|--------|----|-----------|--------|
+|`type`|String|The supervisor type; this should always be `kinesis`.|Yes|
+|`spec`|Object|The container object for the supervisor configuration.|Yes|
+|`ioConfig`|Object|The [I/O configuration](#supervisor-io-configuration) 
object for configuring Kinesis connection and I/O-related settings for the 
supervisor and indexing task.|Yes|
+|`dataSchema`|Object|The schema used by the Kinesis indexing task during 
ingestion. See [`dataSchema`](../../ingestion/ingestion-spec.md#dataschema) for 
more information.|Yes|
+|`tuningConfig`|Object|The [tuning 
configuration](#supervisor-tuning-configuration) object for configuring 
performance-related settings for the supervisor and indexing tasks.|No|
 
-For example:
+Druid starts a new supervisor when you define a supervisor spec.
+To create a supervisor, send a `POST` request to the 
`/druid/indexer/v1/supervisor` endpoint.
+Once created, the supervisor persists in the configured metadata database. 
There can only be a single supervisor per datasource, and submitting a second 
spec for the same datasource overwrites the previous one.
 
-```sh
-curl -X POST -H 'Content-Type: application/json' -d @supervisor-spec.json 
http://localhost:8090/druid/indexer/v1/supervisor
-```
+When an Overlord gains leadership, either by being started or as a result of 
another Overlord failing, it spawns
+a supervisor for each supervisor spec in the metadata database. The supervisor 
then discovers running Kinesis indexing
+tasks and attempts to adopt them if they are compatible with the supervisor's 
configuration. If they are not
+compatible because they have a different ingestion spec or shard allocation, 
the tasks are killed and the
+supervisor creates a new set of tasks. In this way, the supervisors persist 
across Overlord restarts and failovers.
 
-Where the file `supervisor-spec.json` contains a Kinesis supervisor spec:
+The following example shows how to submit a supervisor spec for a stream with 
the name `KinesisStream`.
+In this example, `http://SERVICE_IP:SERVICE_PORT` is a placeholder for the 
server address of deployment and the service port.
 
-```json
-{
+<!--DOCUSAURUS_CODE_TABS-->
+
+<!--cURL-->
+```shell
+curl -X POST "http://SERVICE_IP:SERVICE_PORT/druid/indexer/v1/supervisor" \

Review Comment:
   Normally we run all HTTP API calls through the Router. Retest it with this:
   `curl -X POST "http://SERVICE_IP:SERVICE_PORT/druid/router/v1/supervisor"`
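One way to run that retest is sketched below. It is a minimal sketch, not a verified procedure: the helper names `supervisor_url` and `post_spec` are hypothetical, `SERVICE_IP:SERVICE_PORT` is the placeholder used in the doc, and the `router` path segment is the one suggested in this comment and has not been confirmed here.

```shell
# Hypothetical helpers for comparing the two endpoint paths under discussion.

# Build the supervisor endpoint URL.
# $1: base address, e.g. http://SERVICE_IP:SERVICE_PORT (placeholder from the doc)
# $2: path segment to try: "indexer" (as in the doc) or "router" (as suggested above)
supervisor_url() {
  printf '%s/druid/%s/v1/supervisor\n' "$1" "$2"
}

# POST a supervisor spec file and print only the HTTP status code.
# $1: base address, $2: path segment, $3: path to a supervisor spec JSON file
post_spec() {
  curl -s -o /dev/null -w '%{http_code}\n' \
    -X POST "$(supervisor_url "$1" "$2")" \
    -H 'Content-Type: application/json' \
    -d @"$3"
}
```

For example, running `post_spec http://SERVICE_IP:SERVICE_PORT router supervisor-spec.json` and then the same command with `indexer` prints one status code per path, so the two can be compared before the doc is updated.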



##########
docs/development/extensions-core/kinesis-ingestion.md:
##########
@@ -23,154 +23,259 @@ sidebar_label: "Amazon Kinesis"
   ~ under the License.
   -->
 
-When you enable the Kinesis indexing service, you can configure *supervisors* 
on the Overlord to manage the creation and lifetime of Kinesis indexing tasks. 
These indexing tasks read events using Kinesis' own shard and sequence number 
mechanism to guarantee exactly-once ingestion. The supervisor oversees the 
state of the indexing tasks to:
+When you enable the Kinesis indexing service, you can configure supervisors on 
the Overlord to manage the creation and lifetime of Kinesis indexing tasks. 
These indexing tasks read events using Kinesis' own shard and sequence number 
mechanism to guarantee exactly-once ingestion. The supervisor oversees the 
state of the indexing tasks to coordinate handoffs, manage failures, and ensure 
that scalability and replication requirements are maintained.
 
-- coordinate handoffs
-- manage failures
-- ensure that scalability and replication requirements are maintained.
+This topic contains configuration reference information for the Kinesis 
indexing service supervisor for Apache Druid.
 
-To use the Kinesis indexing service, load the `druid-kinesis-indexing-service` 
core Apache Druid extension (see
-[Including Extensions](../../configuration/extensions.md#loading-extensions)).
+## Setup
 
-> Before you deploy the Kinesis extension to production, read the [Kinesis 
known issues](#kinesis-known-issues).
+To use the Kinesis indexing service, you must first load the 
`druid-kinesis-indexing-service` core extension on both the Overlord and the 
Middle Manager. See [Loading 
extensions](../../configuration/extensions.md#loading-extensions) for more 
information.
+We recommend that you review the [Kinesis known issues](#kinesis-known-issues) 
before deploying the `druid-kinesis-indexing-service` extension to production.
 
-## Submitting a Supervisor Spec
+## Supervisor spec
 
-To use the Kinesis indexing service, load the `druid-kinesis-indexing-service` 
extension on both the Overlord and the MiddleManagers. Druid starts a 
supervisor for a dataSource when you submit a supervisor spec. Submit your 
supervisor spec to the following endpoint:
+The following table outlines the high-level configuration options for the 
Kinesis supervisor object. 
+See [Supervisor API](../../api-reference/supervisor-api.md) for more 
information.
 
-`http://<OVERLORD_IP>:<OVERLORD_PORT>/druid/indexer/v1/supervisor`
+|Property|Type|Description|Required|
+|--------|----|-----------|--------|
+|`type`|String|The supervisor type; this should always be `kinesis`.|Yes|
+|`spec`|Object|The container object for the supervisor configuration.|Yes|
+|`ioConfig`|Object|The [I/O configuration](#supervisor-io-configuration) 
object for configuring Kinesis connection and I/O-related settings for the 
supervisor and indexing task.|Yes|
+|`dataSchema`|Object|The schema used by the Kinesis indexing task during 
ingestion. See [`dataSchema`](../../ingestion/ingestion-spec.md#dataschema) for 
more information.|Yes|
+|`tuningConfig`|Object|The [tuning 
configuration](#supervisor-tuning-configuration) object for configuring 
performance-related settings for the supervisor and indexing tasks.|No|
 
-For example:
+Druid starts a new supervisor when you define a supervisor spec.
+To create a supervisor, send a `POST` request to the 
`/druid/indexer/v1/supervisor` endpoint.
+Once created, the supervisor persists in the configured metadata database. 
There can only be a single supervisor per datasource, and submitting a second 
spec for the same datasource overwrites the previous one.
 
-```sh
-curl -X POST -H 'Content-Type: application/json' -d @supervisor-spec.json 
http://localhost:8090/druid/indexer/v1/supervisor
-```
+When an Overlord gains leadership, either by being started or as a result of 
another Overlord failing, it spawns
+a supervisor for each supervisor spec in the metadata database. The supervisor 
then discovers running Kinesis indexing
+tasks and attempts to adopt them if they are compatible with the supervisor's 
configuration. If they are not
+compatible because they have a different ingestion spec or shard allocation, 
the tasks are killed and the
+supervisor creates a new set of tasks. In this way, the supervisors persist 
across Overlord restarts and failovers.
 
-Where the file `supervisor-spec.json` contains a Kinesis supervisor spec:
+The following example shows how to submit a supervisor spec for a stream with 
the name `KinesisStream`.
+In this example, `http://SERVICE_IP:SERVICE_PORT` is a placeholder for the 
server address of deployment and the service port.
 
-```json
-{
+<!--DOCUSAURUS_CODE_TABS-->
+
+<!--cURL-->
+```shell
+curl -X POST "http://SERVICE_IP:SERVICE_PORT/druid/indexer/v1/supervisor" \
+-H "Content-Type: application/json" \
+-d '{
   "type": "kinesis",
   "spec": {
+    "ioConfig": {
+      "type": "kinesis",
+      "stream": "KinesisStream",
+      "inputFormat": {
+        "type": "json"
+      },
+      "useEarliestSequenceNumber": true
+    },
+    "tuningConfig": {
+      "type": "kinesis"
+    },
     "dataSchema": {
-      "dataSource": "metrics-kinesis",
+      "dataSource": "KinesisStream",
       "timestampSpec": {
         "column": "timestamp",
-        "format": "auto"
+        "format": "iso"
       },
-     "dimensionsSpec": {
-        "dimensions": [],
-        "dimensionExclusions": [
-         "timestamp",
-         "value"
-       ]
+      "dimensionsSpec": {
+        "dimensions": [
+          "isRobot",
+          "channel",
+          "flags",
+          "isUnpatrolled",
+          "page",
+          "diffUrl",
+          {
+            "type": "long",
+            "name": "added"
+          },
+          "comment",
+          {
+            "type": "long",
+            "name": "commentLength"
+          },
+          "isNew",
+          "isMinor",
+          {
+            "type": "long",
+            "name": "delta"
+          },
+          "isAnonymous",
+          "user",
+          {
+            "type": "long",
+            "name": "deltaBucket"
+          },
+          {
+            "type": "long",
+            "name": "deleted"
+          },
+          "namespace",
+          "cityName",
+          "countryName",
+          "regionIsoCode",
+          "metroCode",
+          "countryIsoCode",
+          "regionName"
+        ]
       },
-     "metricsSpec": [
-        {
-         "name": "count",
-          "type": "count"
-       },
-       {
-          "name": "value_sum",
-          "fieldName": "value",
-          "type": "doubleSum"
-        },
-       {
-         "name": "value_min",
-         "fieldName": "value",
-         "type": "doubleMin"
-       },
-        {
-          "name": "value_max",
-         "fieldName": "value",
-         "type": "doubleMax"
-       }
-     ],
-     "granularitySpec": {
-        "type": "uniform",
-        "segmentGranularity": "HOUR",
-        "queryGranularity": "NONE"
-     }
-   },
+      "granularitySpec": {
+        "queryGranularity": "none",
+        "rollup": false,
+        "segmentGranularity": "hour"
+      }
+    }
+  }
+}'
+```
+<!--HTTP-->
+```HTTP
+POST /druid/indexer/v1/supervisor

Review Comment:
   Same comment as line 65 regarding router vs. indexer.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

