jsun98 commented on a change in pull request #6431: Add Kinesis Indexing Service to core Druid
URL: https://github.com/apache/incubator-druid/pull/6431#discussion_r239613887
 
 

 ##########
 File path: docs/content/development/extensions-core/kinesis-ingestion.md
 ##########
 @@ -0,0 +1,176 @@
+# Kinesis Indexing Service
+
+Pull request [Link](https://github.com/apache/incubator-druid/pull/6431)
+
+Similar to the [Kafka indexing service](http://druid.io/docs/0.10.0/development/extensions-core/kafka-ingestion.html),
+the Kinesis indexing service uses supervisors which run on the overlord and manage the creation and lifetime of Kinesis
+indexing tasks. This indexing service can handle non-recent events and provides exactly-once ingestion semantics.
+
+The Kinesis indexing service is provided as the `druid-kinesis-indexing-service` core extension (see
+[Including Extensions](http://druid.io/docs/0.10.0/operations/including-extensions.html)). Please note that this is
+currently designated as an *experimental feature* and is subject to the usual
+[experimental caveats](http://druid.io/docs/0.10.0/development/experimental.html).
+
+## Submitting a Supervisor Spec
+
+The Kinesis indexing service requires that the `druid-kinesis-indexing-service` extension be loaded on both the overlord
+and the middle managers. A supervisor for a dataSource is started by submitting a supervisor spec via HTTP POST to
+`http://<OVERLORD_IP>:<OVERLORD_PORT>/druid/indexer/v1/supervisor`, for example:
+
+```
+curl -X POST -H 'Content-Type: application/json' -d @supervisor-spec.json http://localhost:8090/druid/indexer/v1/supervisor
+```
+
+A sample supervisor spec is shown below:
+
+```json
+{
+  "type": "kinesis",
+  "dataSchema": {
+    "dataSource": "metrics-kinesis",
+    "parser": {
+      "type": "string",
+      "parseSpec": {
+        "format": "json",
+        "timestampSpec": {
+          "column": "timestamp",
+          "format": "auto"
+        },
+        "dimensionsSpec": {
+          "dimensions": [],
+          "dimensionExclusions": [
+            "timestamp",
+            "value"
+          ]
+        }
+      }
+    },
+    "metricsSpec": [
+      {
+        "name": "count",
+        "type": "count"
+      },
+      {
+        "name": "value_sum",
+        "fieldName": "value",
+        "type": "doubleSum"
+      },
+      {
+        "name": "value_min",
+        "fieldName": "value",
+        "type": "doubleMin"
+      },
+      {
+        "name": "value_max",
+        "fieldName": "value",
+        "type": "doubleMax"
+      }
+    ],
+    "granularitySpec": {
+      "type": "uniform",
+      "segmentGranularity": "HOUR",
+      "queryGranularity": "NONE"
+    }
+  },
+  "tuningConfig": {
+    "type": "kinesis",
+    "maxRowsPerSegment": 5000000
+  },
+  "ioConfig": {
+    "stream": "metrics",
+    "endpoint": "kinesis.us-east-1.amazonaws.com",
+    "taskCount": 1,
+    "replicas": 1,
+    "taskDuration": "PT1H",
+    "recordsPerFetch": 2000,
+    "fetchDelayMillis": 1000
+  }
+}
+```
+
+## Supervisor Configuration
 
 Review comment:
   added
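
The submission step described in the quoted doc (an HTTP POST of the spec to `/druid/indexer/v1/supervisor`) can also be sketched in Python. This is a minimal illustration, not part of the pull request: the helper names `build_supervisor_request` and `submit_supervisor` are hypothetical, the default overlord address is taken from the `curl` example, and the code assumes the overlord replies with a JSON body.

```python
import json
import urllib.request


def build_supervisor_request(spec, overlord="http://localhost:8090"):
    """Build (but do not send) the POST request that submits a supervisor spec.

    `overlord` defaults to the address used in the curl example; adjust as needed.
    """
    body = json.dumps(spec).encode("utf-8")
    return urllib.request.Request(
        url=f"{overlord}/druid/indexer/v1/supervisor",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit_supervisor(spec, overlord="http://localhost:8090"):
    """Send the request and return the parsed response (assumed to be JSON)."""
    request = build_supervisor_request(spec, overlord)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Loading `supervisor-spec.json` with `json.load` and passing the result to `submit_supervisor` would mirror the `curl` invocation. Keeping request construction separate from sending makes the URL and headers easy to inspect without a running overlord.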

