Re: [PR] Clean up Kinesis doc (druid)

via GitHub Tue, 25 Jul 2023 10:55:41 -0700


techdocsmith commented on code in PR #14529:
URL: https://github.com/apache/druid/pull/14529#discussion_r1273897060



##########
docs/development/extensions-core/kinesis-ingestion.md:
##########
@@ -23,154 +23,259 @@ sidebar_label: "Amazon Kinesis"
   ~ under the License.
   -->
 
-When you enable the Kinesis indexing service, you can configure *supervisors* 
on the Overlord to manage the creation and lifetime of Kinesis indexing tasks. 
These indexing tasks read events using Kinesis' own shard and sequence number 
mechanism to guarantee exactly-once ingestion. The supervisor oversees the 
state of the indexing tasks to:
+When you enable the Kinesis indexing service, you can configure supervisors on 
the Overlord to manage the creation and lifetime of Kinesis indexing tasks. 
These indexing tasks read events using Kinesis' own shard and sequence number 
mechanism to guarantee exactly-once ingestion. The supervisor oversees the 
state of the indexing tasks to coordinate handoffs, manage failures, and ensure 
that scalability and replication requirements are maintained.
 
-- coordinate handoffs
-- manage failures
-- ensure that scalability and replication requirements are maintained.
+This topic contains configuration reference information for the Kinesis 
indexing service supervisor for Apache Druid.
 
-To use the Kinesis indexing service, load the `druid-kinesis-indexing-service` 
core Apache Druid extension (see
-[Including Extensions](../../configuration/extensions.md#loading-extensions)).
+## Setup
 
-> Before you deploy the Kinesis extension to production, read the [Kinesis 
known issues](#kinesis-known-issues).
+To use the Kinesis indexing service, you must first load the 
`druid-kinesis-indexing-service` core extension on both the Overlord and the 
Middle Manager. See [Loading 
extensions](../../configuration/extensions.md#loading-extensions) for more 
information.
+We recommend that you review the [Kinesis known issues](#kinesis-known-issues) 
before deploying the `druid-kinesis-indexing-service` extension to production.
 
-## Submitting a Supervisor Spec
+## Supervisor spec
 
-To use the Kinesis indexing service, load the `druid-kinesis-indexing-service` 
extension on both the Overlord and the MiddleManagers. Druid starts a 
supervisor for a dataSource when you submit a supervisor spec. Submit your 
supervisor spec to the following endpoint:
+The following table outlines the high-level configuration options for the 
Kinesis supervisor object. 
+See [Supervisor API](../../api-reference/supervisor-api.md) for more 
information.
 
-`http://<OVERLORD_IP>:<OVERLORD_PORT>/druid/indexer/v1/supervisor`
+|Property|Type|Description|Required|
+|--------|----|-----------|--------|
+|`type`|String|The supervisor type; this should always be `kinesis`.|Yes|
+|`spec`|Object|The container object for the supervisor configuration.|Yes|
+|`ioConfig`|Object|The [I/O configuration](#supervisor-io-configuration) 
object for configuring Kafka connection and I/O-related settings for the 
supervisor and indexing task.|Yes|
+|`dataSchema`|Object|The schema used by the Kinesis indexing task during 
ingestion. See [`dataSchema`](../../ingestion/ingestion-spec.md#dataschema) for 
more information.|Yes|
+|`tuningConfig`|Object|The [tuning 
configuration](#supervisor-tuning-configuration) object for configuring 
performance-related settings for the supervisor and indexing tasks.|No|
 
-For example:
+Druid starts a new supervisor when you define a supervisor spec.
+To create a supervisor, send a `POST` request to the 
`/druid/indexer/v1/supervisor` endpoint.
+Once created, the supervisor persists in the configured metadata database. 
There can only be a single supervisor per datasource, and submitting a second 
spec for the same datasource overwrites the previous one.
 
-```sh
-curl -X POST -H 'Content-Type: application/json' -d @supervisor-spec.json 
http://localhost:8090/druid/indexer/v1/supervisor
-```
+When an Overlord gains leadership, either by being started or as a result of 
another Overlord failing, it spawns
+a supervisor for each supervisor spec in the metadata database. The supervisor 
then discovers running Kinesis indexing
+tasks and attempts to adopt them if they are compatible with the supervisor's 
configuration. If they are not
+compatible because they have a different ingestion spec or shard allocation, 
the tasks are killed and the
+supervisor creates a new set of tasks. In this way, the supervisors persist 
across Overlord restarts and failovers.
 
-Where the file `supervisor-spec.json` contains a Kinesis supervisor spec:
+The following example shows how to submit a supervisor spec for a stream with 
the name `KinesisStream`.
+In this example, `http://SERVICE_IP:SERVICE_PORT` is a placeholder for the 
server address of deployment and the service port.
 
-```json
-{
+<!--DOCUSAURUS_CODE_TABS-->
+
+<!--cURL-->
+```shell
+curl -X POST "http://SERVICE_IP:SERVICE_PORT/druid/indexer/v1/supervisor" \

Review Comment:
   We normally run all HTTP API calls through the Router. Please retest with:
   `curl -X POST "http://SERVICE_IP:SERVICE_PORT/druid/router/v1/supervisor"`
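
   One way to set up the retest is to print both candidate requests side by side before running either by hand. This is only a sketch: `supervisor_request` is a hypothetical helper, `supervisor-spec.json` is the file name from the removed lines of the diff, and `SERVICE_IP:SERVICE_PORT` is still the placeholder from the PR. The script sends nothing over the network, and it makes no claim about which path a given deployment accepts.

```shell
#!/bin/sh
# Sketch only: prints the two candidate requests so they can be compared
# and then run manually. Nothing is sent over the network.

# Placeholder from the PR; substitute a real address before running the output.
BASE_URL="http://SERVICE_IP:SERVICE_PORT"

# Hypothetical helper (not part of Druid): build one curl command line.
#   $1 = API path, $2 = file holding the supervisor spec JSON
supervisor_request() {
  printf 'curl -X POST "%s%s" -H "Content-Type: application/json" -d @%s\n' \
    "$BASE_URL" "$1" "$2"
}

# Path currently used in the PR.
supervisor_request "/druid/indexer/v1/supervisor" "supervisor-spec.json"
# Path suggested in this review comment.
supervisor_request "/druid/router/v1/supervisor" "supervisor-spec.json"
```

   Running each printed command against a test cluster and comparing the HTTP status codes would show which path the docs should use.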



##########
docs/development/extensions-core/kinesis-ingestion.md:
##########
@@ -23,154 +23,259 @@ sidebar_label: "Amazon Kinesis"
   ~ under the License.
   -->
 
-When you enable the Kinesis indexing service, you can configure *supervisors* 
on the Overlord to manage the creation and lifetime of Kinesis indexing tasks. 
These indexing tasks read events using Kinesis' own shard and sequence number 
mechanism to guarantee exactly-once ingestion. The supervisor oversees the 
state of the indexing tasks to:
+When you enable the Kinesis indexing service, you can configure supervisors on 
the Overlord to manage the creation and lifetime of Kinesis indexing tasks. 
These indexing tasks read events using Kinesis' own shard and sequence number 
mechanism to guarantee exactly-once ingestion. The supervisor oversees the 
state of the indexing tasks to coordinate handoffs, manage failures, and ensure 
that scalability and replication requirements are maintained.
 
-- coordinate handoffs
-- manage failures
-- ensure that scalability and replication requirements are maintained.
+This topic contains configuration reference information for the Kinesis 
indexing service supervisor for Apache Druid.
 
-To use the Kinesis indexing service, load the `druid-kinesis-indexing-service` 
core Apache Druid extension (see
-[Including Extensions](../../configuration/extensions.md#loading-extensions)).
+## Setup
 
-> Before you deploy the Kinesis extension to production, read the [Kinesis 
known issues](#kinesis-known-issues).
+To use the Kinesis indexing service, you must first load the 
`druid-kinesis-indexing-service` core extension on both the Overlord and the 
Middle Manager. See [Loading 
extensions](../../configuration/extensions.md#loading-extensions) for more 
information.
+We recommend that you review the [Kinesis known issues](#kinesis-known-issues) 
before deploying the `druid-kinesis-indexing-service` extension to production.

Review Comment:
   ```suggestion
   Review the [Kinesis known issues](#kinesis-known-issues) before deploying 
the `druid-kinesis-indexing-service` extension to production.
   ```



##########
docs/development/extensions-core/kinesis-ingestion.md:
##########
@@ -23,154 +23,259 @@ sidebar_label: "Amazon Kinesis"
   ~ under the License.
   -->
 
-When you enable the Kinesis indexing service, you can configure *supervisors* 
on the Overlord to manage the creation and lifetime of Kinesis indexing tasks. 
These indexing tasks read events using Kinesis' own shard and sequence number 
mechanism to guarantee exactly-once ingestion. The supervisor oversees the 
state of the indexing tasks to:
+When you enable the Kinesis indexing service, you can configure supervisors on 
the Overlord to manage the creation and lifetime of Kinesis indexing tasks. 
These indexing tasks read events using Kinesis' own shard and sequence number 
mechanism to guarantee exactly-once ingestion. The supervisor oversees the 
state of the indexing tasks to coordinate handoffs, manage failures, and ensure 
that scalability and replication requirements are maintained.
 
-- coordinate handoffs
-- manage failures
-- ensure that scalability and replication requirements are maintained.
+This topic contains configuration reference information for the Kinesis 
indexing service supervisor for Apache Druid.
 
-To use the Kinesis indexing service, load the `druid-kinesis-indexing-service` 
core Apache Druid extension (see
-[Including Extensions](../../configuration/extensions.md#loading-extensions)).
+## Setup
 
-> Before you deploy the Kinesis extension to production, read the [Kinesis 
known issues](#kinesis-known-issues).
+To use the Kinesis indexing service, you must first load the 
`druid-kinesis-indexing-service` core extension on both the Overlord and the 
Middle Manager. See [Loading 
extensions](../../configuration/extensions.md#loading-extensions) for more 
information.
+We recommend that you review the [Kinesis known issues](#kinesis-known-issues) 
before deploying the `druid-kinesis-indexing-service` extension to production.
 
-## Submitting a Supervisor Spec
+## Supervisor spec
 
-To use the Kinesis indexing service, load the `druid-kinesis-indexing-service` 
extension on both the Overlord and the MiddleManagers. Druid starts a 
supervisor for a dataSource when you submit a supervisor spec. Submit your 
supervisor spec to the following endpoint:
+The following table outlines the high-level configuration options for the 
Kinesis supervisor object. 
+See [Supervisor API](../../api-reference/supervisor-api.md) for more 
information.
 
-`http://<OVERLORD_IP>:<OVERLORD_PORT>/druid/indexer/v1/supervisor`
+|Property|Type|Description|Required|
+|--------|----|-----------|--------|
+|`type`|String|The supervisor type; this should always be `kinesis`.|Yes|
+|`spec`|Object|The container object for the supervisor configuration.|Yes|
+|`ioConfig`|Object|The [I/O configuration](#supervisor-io-configuration) 
object for configuring Kafka connection and I/O-related settings for the 
supervisor and indexing task.|Yes|
+|`dataSchema`|Object|The schema used by the Kinesis indexing task during 
ingestion. See [`dataSchema`](../../ingestion/ingestion-spec.md#dataschema) for 
more information.|Yes|
+|`tuningConfig`|Object|The [tuning 
configuration](#supervisor-tuning-configuration) object for configuring 
performance-related settings for the supervisor and indexing tasks.|No|
 
-For example:
+Druid starts a new supervisor when you define a supervisor spec.
+To create a supervisor, send a `POST` request to the 
`/druid/indexer/v1/supervisor` endpoint.
+Once created, the supervisor persists in the configured metadata database. 
There can only be a single supervisor per datasource, and submitting a second 
spec for the same datasource overwrites the previous one.
 
-```sh
-curl -X POST -H 'Content-Type: application/json' -d @supervisor-spec.json 
http://localhost:8090/druid/indexer/v1/supervisor
-```
+When an Overlord gains leadership, either by being started or as a result of 
another Overlord failing, it spawns
+a supervisor for each supervisor spec in the metadata database. The supervisor 
then discovers running Kinesis indexing
+tasks and attempts to adopt them if they are compatible with the supervisor's 
configuration. If they are not
+compatible because they have a different ingestion spec or shard allocation, 
the tasks are killed and the
+supervisor creates a new set of tasks. In this way, the supervisors persist 
across Overlord restarts and failovers.
 
-Where the file `supervisor-spec.json` contains a Kinesis supervisor spec:
+The following example shows how to submit a supervisor spec for a stream with 
the name `KinesisStream`.
+In this example, `http://SERVICE_IP:SERVICE_PORT` is a placeholder for the 
server address of deployment and the service port.
 
-```json
-{
+<!--DOCUSAURUS_CODE_TABS-->
+
+<!--cURL-->
+```shell
+curl -X POST "http://SERVICE_IP:SERVICE_PORT/druid/indexer/v1/supervisor" \
+-H "Content-Type: application/json" \
+-d '{
   "type": "kinesis",
   "spec": {
+    "ioConfig": {
+      "type": "kinesis",
+      "stream": "KinesisStream",
+      "inputFormat": {
+        "type": "json"
+      },
+      "useEarliestSequenceNumber": true
+    },
+    "tuningConfig": {
+      "type": "kinesis"
+    },
     "dataSchema": {
-      "dataSource": "metrics-kinesis",
+      "dataSource": "KinesisStream",
       "timestampSpec": {
         "column": "timestamp",
-        "format": "auto"
+        "format": "iso"
       },
-     "dimensionsSpec": {
-        "dimensions": [],
-        "dimensionExclusions": [
-         "timestamp",
-         "value"
-       ]
+      "dimensionsSpec": {
+        "dimensions": [
+          "isRobot",
+          "channel",
+          "flags",
+          "isUnpatrolled",
+          "page",
+          "diffUrl",
+          {
+            "type": "long",
+            "name": "added"
+          },
+          "comment",
+          {
+            "type": "long",
+            "name": "commentLength"
+          },
+          "isNew",
+          "isMinor",
+          {
+            "type": "long",
+            "name": "delta"
+          },
+          "isAnonymous",
+          "user",
+          {
+            "type": "long",
+            "name": "deltaBucket"
+          },
+          {
+            "type": "long",
+            "name": "deleted"
+          },
+          "namespace",
+          "cityName",
+          "countryName",
+          "regionIsoCode",
+          "metroCode",
+          "countryIsoCode",
+          "regionName"
+        ]
       },
-     "metricsSpec": [
-        {
-         "name": "count",
-          "type": "count"
-       },
-       {
-          "name": "value_sum",
-          "fieldName": "value",
-          "type": "doubleSum"
-        },
-       {
-         "name": "value_min",
-         "fieldName": "value",
-         "type": "doubleMin"
-       },
-        {
-          "name": "value_max",
-         "fieldName": "value",
-         "type": "doubleMax"
-       }
-     ],
-     "granularitySpec": {
-        "type": "uniform",
-        "segmentGranularity": "HOUR",
-        "queryGranularity": "NONE"
-     }
-   },
+      "granularitySpec": {
+        "queryGranularity": "none",
+        "rollup": false,
+        "segmentGranularity": "hour"
+      }
+    }
+  }
+}'
+```
+<!--HTTP-->
+```HTTP
+POST /druid/indexer/v1/supervisor

Review Comment:
   Same comment as on line 65 regarding Router vs. indexer.



##########
docs/development/extensions-core/kinesis-ingestion.md:
##########
@@ -23,154 +23,259 @@ sidebar_label: "Amazon Kinesis"
   ~ under the License.
   -->
 
-When you enable the Kinesis indexing service, you can configure *supervisors* 
on the Overlord to manage the creation and lifetime of Kinesis indexing tasks. 
These indexing tasks read events using Kinesis' own shard and sequence number 
mechanism to guarantee exactly-once ingestion. The supervisor oversees the 
state of the indexing tasks to:
+When you enable the Kinesis indexing service, you can configure supervisors on 
the Overlord to manage the creation and lifetime of Kinesis indexing tasks. 
These indexing tasks read events using Kinesis' own shard and sequence number 
mechanism to guarantee exactly-once ingestion. The supervisor oversees the 
state of the indexing tasks to coordinate handoffs, manage failures, and ensure 
that scalability and replication requirements are maintained.
 
-- coordinate handoffs
-- manage failures
-- ensure that scalability and replication requirements are maintained.
+This topic contains configuration reference information for the Kinesis 
indexing service supervisor for Apache Druid.
 
-To use the Kinesis indexing service, load the `druid-kinesis-indexing-service` 
core Apache Druid extension (see
-[Including Extensions](../../configuration/extensions.md#loading-extensions)).
+## Setup
 
-> Before you deploy the Kinesis extension to production, read the [Kinesis 
known issues](#kinesis-known-issues).
+To use the Kinesis indexing service, you must first load the 
`druid-kinesis-indexing-service` core extension on both the Overlord and the 
Middle Manager. See [Loading 
extensions](../../configuration/extensions.md#loading-extensions) for more 
information.
+We recommend that you review the [Kinesis known issues](#kinesis-known-issues) 
before deploying the `druid-kinesis-indexing-service` extension to production.
 
-## Submitting a Supervisor Spec
+## Supervisor spec
 
-To use the Kinesis indexing service, load the `druid-kinesis-indexing-service` 
extension on both the Overlord and the MiddleManagers. Druid starts a 
supervisor for a dataSource when you submit a supervisor spec. Submit your 
supervisor spec to the following endpoint:
+The following table outlines the high-level configuration options for the 
Kinesis supervisor object. 
+See [Supervisor API](../../api-reference/supervisor-api.md) for more 
information.
 
-`http://<OVERLORD_IP>:<OVERLORD_PORT>/druid/indexer/v1/supervisor`
+|Property|Type|Description|Required|
+|--------|----|-----------|--------|
+|`type`|String|The supervisor type; this should always be `kinesis`.|Yes|
+|`spec`|Object|The container object for the supervisor configuration.|Yes|
+|`ioConfig`|Object|The [I/O configuration](#supervisor-io-configuration) 
object for configuring Kafka connection and I/O-related settings for the 
supervisor and indexing task.|Yes|
+|`dataSchema`|Object|The schema used by the Kinesis indexing task during 
ingestion. See [`dataSchema`](../../ingestion/ingestion-spec.md#dataschema) for 
more information.|Yes|
+|`tuningConfig`|Object|The [tuning 
configuration](#supervisor-tuning-configuration) object for configuring 
performance-related settings for the supervisor and indexing tasks.|No|
 
-For example:
+Druid starts a new supervisor when you define a supervisor spec.
+To create a supervisor, send a `POST` request to the 
`/druid/indexer/v1/supervisor` endpoint.
+Once created, the supervisor persists in the configured metadata database. 
There can only be a single supervisor per datasource, and submitting a second 
spec for the same datasource overwrites the previous one.
 
-```sh
-curl -X POST -H 'Content-Type: application/json' -d @supervisor-spec.json 
http://localhost:8090/druid/indexer/v1/supervisor
-```
+When an Overlord gains leadership, either by being started or as a result of 
another Overlord failing, it spawns
+a supervisor for each supervisor spec in the metadata database. The supervisor 
then discovers running Kinesis indexing
+tasks and attempts to adopt them if they are compatible with the supervisor's 
configuration. If they are not
+compatible because they have a different ingestion spec or shard allocation, 
the tasks are killed and the
+supervisor creates a new set of tasks. In this way, the supervisors persist 
across Overlord restarts and failovers.
 
-Where the file `supervisor-spec.json` contains a Kinesis supervisor spec:
+The following example shows how to submit a supervisor spec for a stream with 
the name `KinesisStream`.
+In this example, `http://SERVICE_IP:SERVICE_PORT` is a placeholder for the 
server address of deployment and the service port.
 
-```json
-{
+<!--DOCUSAURUS_CODE_TABS-->
+
+<!--cURL-->
+```shell
+curl -X POST "http://SERVICE_IP:SERVICE_PORT/druid/indexer/v1/supervisor" \
+-H "Content-Type: application/json" \
+-d '{
   "type": "kinesis",
   "spec": {
+    "ioConfig": {
+      "type": "kinesis",
+      "stream": "KinesisStream",
+      "inputFormat": {
+        "type": "json"
+      },
+      "useEarliestSequenceNumber": true
+    },
+    "tuningConfig": {
+      "type": "kinesis"
+    },
     "dataSchema": {
-      "dataSource": "metrics-kinesis",
+      "dataSource": "KinesisStream",
       "timestampSpec": {
         "column": "timestamp",
-        "format": "auto"
+        "format": "iso"
       },
-     "dimensionsSpec": {
-        "dimensions": [],
-        "dimensionExclusions": [
-         "timestamp",
-         "value"
-       ]
+      "dimensionsSpec": {
+        "dimensions": [
+          "isRobot",
+          "channel",
+          "flags",
+          "isUnpatrolled",
+          "page",
+          "diffUrl",
+          {
+            "type": "long",
+            "name": "added"
+          },
+          "comment",
+          {
+            "type": "long",
+            "name": "commentLength"
+          },
+          "isNew",
+          "isMinor",
+          {
+            "type": "long",
+            "name": "delta"
+          },
+          "isAnonymous",
+          "user",
+          {
+            "type": "long",
+            "name": "deltaBucket"
+          },
+          {
+            "type": "long",
+            "name": "deleted"
+          },
+          "namespace",
+          "cityName",
+          "countryName",
+          "regionIsoCode",
+          "metroCode",
+          "countryIsoCode",
+          "regionName"
+        ]
       },
-     "metricsSpec": [
-        {
-         "name": "count",
-          "type": "count"
-       },
-       {
-          "name": "value_sum",
-          "fieldName": "value",
-          "type": "doubleSum"
-        },
-       {
-         "name": "value_min",
-         "fieldName": "value",
-         "type": "doubleMin"
-       },
-        {
-          "name": "value_max",
-         "fieldName": "value",
-         "type": "doubleMax"
-       }
-     ],
-     "granularitySpec": {
-        "type": "uniform",
-        "segmentGranularity": "HOUR",
-        "queryGranularity": "NONE"
-     }
-   },
+      "granularitySpec": {
+        "queryGranularity": "none",
+        "rollup": false,
+        "segmentGranularity": "hour"
+      }
+    }
+  }
+}'
+```
+<!--HTTP-->
+```HTTP
+POST /druid/indexer/v1/supervisor
+HTTP/1.1
+Host: http://SERVICE_IP:SERVICE_PORT
+Content-Type: application/json
+
+{
+  "type": "kinesis",
+  "spec": {
     "ioConfig": {
-     "stream": "metrics",
-     "inputFormat": {
-       "type": "json"
+      "type": "kinesis",
+      "stream": "KinesisStream",
+      "inputFormat": {
+        "type": "json"
+      },
+      "useEarliestSequenceNumber": true
+    },
+    "tuningConfig": {
+      "type": "kinesis"
+    },
+    "dataSchema": {
+      "dataSource": "KinesisStream",
+      "timestampSpec": {
+        "column": "timestamp",
+        "format": "iso"
+      },
+      "dimensionsSpec": {
+        "dimensions": [
+          "isRobot",
+          "channel",
+          "flags",
+          "isUnpatrolled",
+          "page",
+          "diffUrl",
+          {
+            "type": "long",
+            "name": "added"
+          },
+          "comment",
+          {
+            "type": "long",
+            "name": "commentLength"
+          },
+          "isNew",
+          "isMinor",
+          {
+            "type": "long",
+            "name": "delta"
+          },
+          "isAnonymous",
+          "user",
+          {
+            "type": "long",
+            "name": "deltaBucket"
+          },
+          {
+            "type": "long",
+            "name": "deleted"
+          },
+          "namespace",
+          "cityName",
+          "countryName",
+          "regionIsoCode",
+          "metroCode",
+          "countryIsoCode",
+          "regionName"
+        ]
       },
-      "endpoint": "kinesis.us-east-1.amazonaws.com",
-      "taskCount": 1,
-      "replicas": 1,
-      "taskDuration": "PT1H"
-   },
-   "tuningConfig": {
-     "type": "kinesis",
-     "maxRowsPerSegment": 5000000
-   }
+      "granularitySpec": {
+        "queryGranularity": "none",
+        "rollup": false,
+        "segmentGranularity": "hour"
+      }
+    }
   }
 }
 ```
-
-## Supervisor Spec
-
-|Field|Description|Required|
-|--------|-----------|---------|
-|`type`|The supervisor type; this should always be `kinesis`.|yes|
-|`spec`|Container object for the supervisor configuration.|yes|
-|`dataSchema`|The schema that will be used by the Kinesis indexing task during 
ingestion. See 
[`dataSchema`](../../ingestion/ingestion-spec.md#dataschema).|yes|
-|`ioConfig`|An [`ioConfig`](#ioconfig) object for configuring Kafka connection 
and I/O-related settings for the supervisor and indexing task.|yes|
-|`tuningConfig`|A [`tuningConfig`](#tuningconfig) object for configuring 
performance-related settings for the supervisor and indexing tasks.|no|
-
-### `ioConfig`
-
-|Field|Type|Description|Required|
-|-----|----|-----------|--------|
-|`stream`|String|The Kinesis stream to read.|yes|
-|`inputFormat`|Object|[`inputFormat`](../../ingestion/data-formats.md#input-format)
 to specify how to parse input data. See [Specifying data 
format](#specifying-data-format) for details about specifying the input 
format.|yes|
-|`endpoint`|String|The AWS Kinesis stream endpoint for a region. You can find 
a list of endpoints 
[here](http://docs.aws.amazon.com/general/latest/gr/rande.html#ak_region).|no 
(default == kinesis.us-east-1.amazonaws.com)|
-|`replicas`|Integer|The number of replica sets, where 1 means a single set of 
tasks (no replication). Replica tasks will always be assigned to different 
workers to provide resiliency against process failure.|no (default == 1)|
-|`taskCount`|Integer|The maximum number of *reading* tasks in a *replica set*. 
This means that the maximum number of reading tasks will be `taskCount * 
replicas` and the total number of tasks (*reading* + *publishing*) will be 
higher than this. See [Capacity Planning](#capacity-planning) below for more 
details. The number of reading tasks will be less than `taskCount` if 
`taskCount > {numKinesisShards}`.|no (default == 1)|
-|`taskDuration`|ISO8601 Period|The length of time before tasks stop reading 
and begin publishing their segment.|no (default == PT1H)|
-|`startDelay`|ISO8601 Period|The period to wait before the supervisor starts 
managing tasks.|no (default == PT5S)|
-|`period`|ISO8601 Period|How often the supervisor will execute its management 
logic. Note that the supervisor will also run in response to certain events 
(such as tasks succeeding, failing, and reaching their taskDuration) so this 
value specifies the maximum time between iterations.|no (default == PT30S)|
-|`useEarliestSequenceNumber`|Boolean|If a supervisor is managing a dataSource 
for the first time, it will obtain a set of starting sequence numbers from 
Kinesis. This flag determines whether it retrieves the earliest or latest 
sequence numbers in Kinesis. Under normal circumstances, subsequent tasks will 
start from where the previous segments ended so this flag will only be used on 
first run.|no (default == false)|
-|`completionTimeout`|ISO8601 Period|The length of time to wait before 
declaring a publishing task as failed and terminating it. If this is set too 
low, your tasks may never publish. The publishing clock for a task begins 
roughly after `taskDuration` elapses.|no (default == PT6H)|
-|`lateMessageRejectionPeriod`|ISO8601 Period|Configure tasks to reject 
messages with timestamps earlier than this period before the task was created; 
for example if this is set to `PT1H` and the supervisor creates a task at 
*2016-01-01T12:00Z*, messages with timestamps earlier than *2016-01-01T11:00Z* 
will be dropped. This may help prevent concurrency issues if your data stream 
has late messages and you have multiple pipelines that need to operate on the 
same segments (e.g. a streaming and a nightly batch ingestion pipeline).|no 
(default == none)|
-|`earlyMessageRejectionPeriod`|ISO8601 Period|Configure tasks to reject 
messages with timestamps later than this period after the task reached its 
taskDuration; for example if this is set to `PT1H`, the taskDuration is set to 
`PT1H` and the supervisor creates a task at *2016-01-01T12:00Z*. Messages with 
timestamps later than *2016-01-01T14:00Z* will be dropped. **Note:** Tasks 
sometimes run past their task duration, for example, in cases of supervisor 
failover. Setting `earlyMessageRejectionPeriod` too low may cause messages to 
be dropped unexpectedly whenever a task runs past its originally configured 
task duration.|no (default == none)|
-|`recordsPerFetch`|Integer|The number of records to request per call to fetch 
records from Kinesis. See [Determining fetch 
settings](#determining-fetch-settings).|no (see [Determining fetch 
settings](#determining-fetch-settings) for defaults)|
-|`fetchDelayMillis`|Integer|Time in milliseconds to wait between subsequent 
calls to fetch records from Kinesis. See [Determining fetch 
settings](#determining-fetch-settings).|no (default == 0)|
-|`awsAssumedRoleArn`|String|The AWS assumed role to use for additional 
permissions.|no|
-|`awsExternalId`|String|The AWS external id to use for additional 
permissions.|no|
-|`deaggregate`|Boolean|Whether to use the de-aggregate function of the KCL. 
See below for details.|no|
-|`autoScalerConfig`|Object|Defines auto scaling behavior for Kinesis ingest 
tasks. See [Tasks Autoscaler Properties](#task-autoscaler-properties).|no 
(default == null)|
-
-#### Task Autoscaler Properties
-
-| Property | Description | Required |
-| ------------- | ------------- | ------------- |
-| `enableTaskAutoScaler` | Enable or disable the auto scaler. When false or 
absent, Druid disables the `autoScaler` even when `autoScalerConfig` is not 
null.| no (default == false) |
-| `taskCountMax` | Maximum number of Kinesis ingestion tasks. Must be greater 
than or equal to `taskCountMin`. If greater than `{numKinesisShards}`, the 
maximum number of reading tasks is `{numKinesisShards}` and `taskCountMax` is 
ignored.  | yes |
-| `taskCountMin` | Minimum number of Kinesis ingestion tasks. When you enable 
the auto scaler, Druid ignores the value of taskCount in `IOConfig` and 
uses`taskCountMin` for the initial number of tasks to launch.| yes |
-| `minTriggerScaleActionFrequencyMillis` | Minimum time interval between two 
scale actions | no (default == 600000) |
-| `autoScalerStrategy` | The algorithm of `autoScaler`. ONLY `lagBased` is 
supported for now. See [Lag Based AutoScaler Strategy Related 
Properties](#lag-based-autoscaler-strategy-related-properties) for details.| no 
(default == `lagBased`) |
-
-##### Lag Based AutoScaler Strategy Related Properties
-
-The Kinesis indexing service reports lag metrics measured in time milliseconds 
rather than message count which is used by Kafka.
-
-| Property | Description | Required |
-| ------------- | ------------- | ------------- |
-| `lagCollectionIntervalMillis` | Period of lag points collection.  | no 
(default == 30000) |
-| `lagCollectionRangeMillis` | The total time window of lag collection, Use 
with `lagCollectionIntervalMillis`，it means that in the recent 
`lagCollectionRangeMillis`, collect lag metric points every 
`lagCollectionIntervalMillis`. | no (default == 600000) |
-| `scaleOutThreshold` | The Threshold of scale out action | no (default == 
6000000) |
-| `triggerScaleOutFractionThreshold` | If `triggerScaleOutFractionThreshold` 
percent of lag points are higher than `scaleOutThreshold`, then do scale out 
action. | no (default == 0.3) |
-| `scaleInThreshold` | The Threshold of scale in action | no (default == 
1000000) |
-| `triggerScaleInFractionThreshold` | If `triggerScaleInFractionThreshold` 
percent of lag points are lower than `scaleOutThreshold`, then do scale in 
action. | no (default == 0.9) |
-| `scaleActionStartDelayMillis` | Number of milliseconds to delay after the 
supervisor starts before the first scale logic check. | no (default == 300000) |
-| `scaleActionPeriodMillis` | Frequency in milliseconds to check if a scale 
action is triggered | no (default == 60000) |
-| `scaleInStep` | Number of tasks to reduce at a time when scaling down | no 
(default == 1) |
-| `scaleOutStep` | Number of tasks to add at a time when scaling out | no 
(default == 2) |
-
-The following example demonstrates a supervisor spec with `lagBased` 
autoScaler enabled:
+<!--END_DOCUSAURUS_CODE_TABS-->
+
+## Supervisor I/O configuration
+
+The following table outlines the configuration options for `ioConfig`:
+
+|Property|Type|Description|Required|Default|
+|--------|----|-----------|--------|-------|
+|`stream`|String|The Kinesis stream to read.|Yes||
+|`inputFormat`|Object|The [input 
format](../../ingestion/data-formats.md#input-format) to specify how to parse 
input data. See [Specify data format](#specify-data-format) for more 
information.|Yes||
+|`endpoint`|String|The AWS Kinesis stream endpoint for a region. You can find 
a list of endpoints in the [AWS service 
endpoints](http://docs.aws.amazon.com/general/latest/gr/rande.html#ak_region) 
document.|No|`kinesis.us-east-1.amazonaws.com`|
+|`replicas`|Integer|The number of replica sets, where 1 is a single set of 
tasks (no replication). Druid always assigns replica tasks to different 
workers to provide resiliency against process failure.|No|1|
+|`taskCount`|Integer|The maximum number of reading tasks in a replica set. 
Multiply `taskCount` and `replicas` to calculate the maximum number of reading 
tasks. <br />The total number of tasks (reading and publishing) is higher than 
the maximum number of reading tasks. See [Capacity 
planning](#capacity-planning) for more details. When `taskCount > 
{numKinesisShards}`, the actual number of reading tasks is less than the 
`taskCount` value.|No|1|
+|`taskDuration`|ISO 8601 period|The length of time before tasks stop reading 
and begin publishing their segments.|No|PT1H|
+|`startDelay`|ISO 8601 period|The period to wait before the supervisor starts 
managing tasks.|No|PT5S|
+|`period`|ISO 8601 period|Determines how often the supervisor executes its 
management logic. Note that the supervisor also runs in response to certain 
events, such as tasks succeeding, failing, and reaching their task duration, so 
this value specifies the maximum time between iterations.|No|PT30S|
+|`useEarliestSequenceNumber`|Boolean|If a supervisor is managing a datasource 
for the first time, it obtains a set of starting sequence numbers from Kinesis. 
This flag determines whether a supervisor retrieves the earliest or latest 
sequence numbers in Kinesis. Under normal circumstances, subsequent tasks start 
from where the previous segments ended so this flag is only used on the first 
run.|No|`false`|
+|`completionTimeout`|ISO 8601 period|The length of time to wait before Druid 
declares a publishing task has failed and terminates it. If this is set too 
low, your tasks may never publish. The publishing clock for a task begins 
roughly after `taskDuration` elapses.|No|PT6H|
+|`lateMessageRejectionPeriod`|ISO 8601 period|Configure tasks to reject 
messages with timestamps earlier than this period before the task is created. 
For example, if `lateMessageRejectionPeriod` is set to `PT1H` and the 
supervisor creates a task at `2016-01-01T12:00Z`, messages with timestamps 
earlier than `2016-01-01T11:00Z` are dropped. This may help prevent concurrency 
issues if your data stream has late messages and you have multiple pipelines 
that need to operate on the same segments, such as a streaming and a nightly 
batch ingestion pipeline.|No||
+|`earlyMessageRejectionPeriod`|ISO 8601 period|Configure tasks to reject 
messages with timestamps later than this period after the task reached its 
`taskDuration`. For example, if `earlyMessageRejectionPeriod` is set to `PT1H`, 
the `taskDuration` is set to `PT1H` and the supervisor creates a task at 
`2016-01-01T12:00Z`, messages with timestamps later than `2016-01-01T14:00Z` 
are dropped. **Note:** Tasks sometimes run past their task duration, for 
example, in cases of supervisor failover. Setting `earlyMessageRejectionPeriod` 
too low may cause messages to be dropped unexpectedly whenever a task runs past 
its originally configured task duration.|No||
+|`recordsPerFetch`|Integer|The number of records to request per call to fetch 
records from Kinesis.|No|See [Determine fetch 
settings](#determine-fetch-settings) for defaults.|
+|`fetchDelayMillis`|Integer|Time in milliseconds to wait between subsequent 
calls to fetch records from Kinesis. See [Determine fetch 
settings](#determine-fetch-settings).|No|0|
+|`awsAssumedRoleArn`|String|The AWS assumed role to use for additional 
permissions.|No||
+|`awsExternalId`|String|The AWS external ID to use for additional 
permissions.|No||
+|`deaggregate`|Boolean|Whether to use the deaggregate function of the Kinesis 
Client Library (KCL).|No||
+|`autoScalerConfig`|Object|Defines autoscaling behavior for Kinesis ingest 
tasks. See [Task autoscaler properties](#task-autoscaler-properties) for more 
information.|No|null|
+
+### Task autoscaler properties
+
+The following table outlines the configuration options for `autoScalerConfig`:
+
+|Property|Description|Required|Default|
+|--------|-----------|--------|-------|
+|`enableTaskAutoScaler`|Enables the auto scaler. If not specified, Druid 
disables the auto scaler even when `autoScalerConfig` is not null.|No|`false`|
+|`taskCountMax`|Maximum number of Kinesis ingestion tasks. Must be greater 
than or equal to `taskCountMin`. If greater than `{numKinesisShards}`, Druid 
sets the maximum number of reading tasks to `{numKinesisShards}` and ignores 
`taskCountMax`.|Yes||
+|`taskCountMin`|Minimum number of Kinesis ingestion tasks. When you enable the 
auto scaler, Druid ignores the value of `taskCount` in `ioConfig` and uses 
`taskCountMin` for the initial number of tasks to launch.|Yes||
+|`minTriggerScaleActionFrequencyMillis`|Minimum time interval between two 
scale actions.|No|600000|
+|`autoScalerStrategy`|The algorithm of `autoScaler`. Druid only supports the 
`lagBased` strategy. See [Lag based autoscaler strategy related 
properties](#lag-based-autoscaler-strategy-related-properties) for more 
information.|No|Defaults to `lagBased`.|
+
+### Lag based autoscaler strategy related properties
+
+Unlike the Kafka
+indexing service, Kinesis reports lag metrics measured in time difference in 
milliseconds between the current sequence number and latest sequence number, 
rather than message count.

Review Comment:
   ```suggestion
   Unlike the Kafka indexing service, Kinesis reports lag metrics measured in 
time difference in milliseconds between the current sequence number and latest 
sequence number, rather than message count.
   ```
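The worked examples in the `lateMessageRejectionPeriod` and `earlyMessageRejectionPeriod` rows can be checked with a little date arithmetic. The sketch below is only an illustration of that arithmetic, not Druid code: the function name `accepted_window` and its parameters are hypothetical, and it assumes the late cutoff is measured back from task creation and the early cutoff forward from creation time plus `taskDuration`, as the table rows describe.

```python
from datetime import datetime, timedelta, timezone


def accepted_window(task_created, task_duration,
                    late_rejection=None, early_rejection=None):
    """Return (earliest, latest) accepted message timestamps.

    Hypothetical helper: None on either side means that bound is not set.
    """
    # Messages older than (creation time - lateMessageRejectionPeriod) are dropped.
    earliest = task_created - late_rejection if late_rejection is not None else None
    # Messages newer than (creation time + taskDuration + earlyMessageRejectionPeriod) are dropped.
    latest = (task_created + task_duration + early_rejection
              if early_rejection is not None else None)
    return earliest, latest


# Values taken from the examples in the table: all periods are PT1H.
created = datetime(2016, 1, 1, 12, 0, tzinfo=timezone.utc)
earliest, latest = accepted_window(
    created,
    task_duration=timedelta(hours=1),
    late_rejection=timedelta(hours=1),
    early_rejection=timedelta(hours=1),
)
print(earliest.isoformat())  # 2016-01-01T11:00:00+00:00
print(latest.isoformat())    # 2016-01-01T14:00:00+00:00
```

Both results match the cutoffs quoted in the table (`2016-01-01T11:00Z` and `2016-01-01T14:00Z`).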

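The `taskCount` row might also be easier to follow with the numbers spelled out. This is a rough sketch under one assumption that the doc only implies: when `taskCount` exceeds the shard count, the number of reading tasks per replica set is capped at the number of shards. The function name `max_reading_tasks` is hypothetical, and the result excludes publishing tasks, which the doc covers under capacity planning.

```python
def max_reading_tasks(task_count, replicas, num_kinesis_shards):
    """Estimate the maximum number of reading tasks across all replica sets.

    Assumption: each replica set runs at most one reading task per shard.
    """
    # Cap the per-replica-set task count at the number of shards.
    per_replica_set = min(task_count, num_kinesis_shards)
    # Each replica set runs its own copy of the reading tasks.
    return per_replica_set * replicas


print(max_reading_tasks(task_count=2, replicas=2, num_kinesis_shards=8))  # 4
print(max_reading_tasks(task_count=4, replicas=2, num_kinesis_shards=3))  # 6
```

In the second call, `taskCount` is 4 but only 3 shards exist, so each replica set is limited to 3 reading tasks under the stated assumption.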


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

