jsun98 commented on a change in pull request #6431: Add Kinesis Indexing Service to core Druid
URL: https://github.com/apache/incubator-druid/pull/6431#discussion_r239613887
##########
File path: docs/content/development/extensions-core/kinesis-ingestion.md
##########
@@ -0,0 +1,176 @@
+# Kinesis Indexing Service
+
+Pull request [Link](https://github.com/apache/incubator-druid/pull/6431)
+
+Similar to the [Kafka indexing service](http://druid.io/docs/0.10.0/development/extensions-core/kafka-ingestion.html),
+the Kinesis indexing service uses supervisors which run on the overlord and manage the creation and lifetime of Kinesis
+indexing tasks. This indexing service can handle non-recent events and provides exactly-once ingestion semantics.
+
+The Kinesis indexing service is provided as the `druid-kinesis-indexing-service` core extension (see
+[Including Extensions](http://druid.io/docs/0.10.0/operations/including-extensions.html)). Please note that this is
+currently designated as an *experimental feature* and is subject to the usual
+[experimental caveats](http://druid.io/docs/0.10.0/development/experimental.html).
+
+## Submitting a Supervisor Spec
+
+The Kinesis indexing service requires that the `druid-kinesis-indexing-service` extension be loaded on both the overlord
+and the middle managers. A supervisor for a dataSource is started by submitting a supervisor spec via HTTP POST to
+`http://<OVERLORD_IP>:<OVERLORD_PORT>/druid/indexer/v1/supervisor`, for example:
+
+```
+curl -X POST -H 'Content-Type: application/json' -d @supervisor-spec.json http://localhost:8090/druid/indexer/v1/supervisor
+```
+
+A sample supervisor spec is shown below:
+
+```json
+{
+  "type": "kinesis",
+  "dataSchema": {
+    "dataSource": "metrics-kinesis",
+    "parser": {
+      "type": "string",
+      "parseSpec": {
+        "format": "json",
+        "timestampSpec": {
+          "column": "timestamp",
+          "format": "auto"
+        },
+        "dimensionsSpec": {
+          "dimensions": [],
+          "dimensionExclusions": [
+            "timestamp",
+            "value"
+          ]
+        }
+      }
+    },
+    "metricsSpec": [
+      {
+        "name": "count",
+        "type": "count"
+      },
+      {
+        "name": "value_sum",
+        "fieldName": "value",
+        "type": "doubleSum"
+      },
+      {
+        "name": "value_min",
+        "fieldName": "value",
+        "type": "doubleMin"
+      },
+      {
+        "name": "value_max",
+        "fieldName": "value",
+        "type": "doubleMax"
+      }
+    ],
+    "granularitySpec": {
+      "type": "uniform",
+      "segmentGranularity": "HOUR",
+      "queryGranularity": "NONE"
+    }
+  },
+  "tuningConfig": {
+    "type": "kinesis",
+    "maxRowsPerSegment": 5000000
+  },
+  "ioConfig": {
+    "stream": "metrics",
+    "endpoint": "kinesis.us-east-1.amazonaws.com",
+    "taskCount": 1,
+    "replicas": 1,
+    "taskDuration": "PT1H",
+    "recordsPerFetch": 2000,
+    "fetchDelayMillis": 1000
+  }
+}
+```
+
+## Supervisor Configuration

Review comment:
added
