jon-wei commented on a change in pull request #6126: New quickstart and tutorials
URL: https://github.com/apache/incubator-druid/pull/6126#discussion_r208831257
##########
File path: docs/content/tutorials/tutorial-ingestion-spec.md
##########
@@ -0,0 +1,641 @@
+---
+layout: doc_page
+---
+
+# Tutorial: Writing an ingestion spec
+
+This tutorial will guide the reader through the process of defining an ingestion spec, pointing out key considerations and guidelines.
+
+For this tutorial, we'll assume you've already downloaded Druid as described in
+the [single-machine quickstart](index.html) and have it running on your local machine.
+
+It will also be helpful to have finished [Tutorial: Loading a file](/docs/VERSION/tutorials/tutorial-batch.html), [Tutorial: Querying data](/docs/VERSION/tutorials/tutorial-query.html), and [Tutorial: Rollup](/docs/VERSION/tutorials/tutorial-rollup.html).
+
+## Example data
+
+Suppose we have the following network flow data:
+
+* `srcIP`: IP address of sender
+* `srcPort`: Port of sender
+* `dstIP`: IP address of receiver
+* `dstPort`: Port of receiver
+* `protocol`: IP protocol number
+* `packets`: number of packets transmitted
+* `bytes`: number of bytes transmitted
+* `cost`: the cost of sending the traffic
+
+```
+{"ts":"2018-01-01T01:01:35Z","srcIP":"1.1.1.1", "dstIP":"2.2.2.2", "srcPort":2000, "dstPort":3000, "protocol": 6, "packets":10, "bytes":1000, "cost": 1.4}
+{"ts":"2018-01-01T01:01:51Z","srcIP":"1.1.1.1", "dstIP":"2.2.2.2", "srcPort":2000, "dstPort":3000, "protocol": 6, "packets":20, "bytes":2000, "cost": 3.1}
+{"ts":"2018-01-01T01:01:59Z","srcIP":"1.1.1.1", "dstIP":"2.2.2.2", "srcPort":2000, "dstPort":3000, "protocol": 6, "packets":30, "bytes":3000, "cost": 0.4}
+{"ts":"2018-01-01T01:02:14Z","srcIP":"1.1.1.1", "dstIP":"2.2.2.2", "srcPort":5000, "dstPort":7000, "protocol": 6, "packets":40, "bytes":4000, "cost": 7.9}
+{"ts":"2018-01-01T01:02:29Z","srcIP":"1.1.1.1", "dstIP":"2.2.2.2", "srcPort":5000, "dstPort":7000, "protocol": 6, "packets":50, "bytes":5000, "cost": 10.2}
+{"ts":"2018-01-01T01:03:29Z","srcIP":"1.1.1.1", "dstIP":"2.2.2.2", "srcPort":5000, "dstPort":7000, "protocol": 6, "packets":60, "bytes":6000, "cost": 4.3}
+{"ts":"2018-01-01T02:33:14Z","srcIP":"7.7.7.7", "dstIP":"8.8.8.8", "srcPort":4000, "dstPort":5000, "protocol": 17, "packets":100, "bytes":10000, "cost": 22.4}
+{"ts":"2018-01-01T02:33:45Z","srcIP":"7.7.7.7", "dstIP":"8.8.8.8", "srcPort":4000, "dstPort":5000, "protocol": 17, "packets":200, "bytes":20000, "cost": 34.5}
+{"ts":"2018-01-01T02:35:45Z","srcIP":"7.7.7.7", "dstIP":"8.8.8.8", "srcPort":4000, "dstPort":5000, "protocol": 17, "packets":300, "bytes":30000, "cost": 46.3}
+```
+
+Save the JSON contents above into a file called `ingestion-tutorial-data.json`.
+
+Let's walk through the process of defining an ingestion spec that can load this data.
+
+For this tutorial, we will be using the native batch indexing task. When using other task types, some aspects of the ingestion spec will differ, and this tutorial will point out such areas.
+
+## Defining the schema
+
+The core element of a Druid ingestion spec is the `dataSchema`. The `dataSchema` defines how to parse input data into a set of columns that will be stored in Druid.
+
+Let's start with an empty `dataSchema` and add fields to it as we progress through the tutorial.
+
+Create a new file called `ingestion-tutorial-index.json` with the following contents:
+
+```json
+"dataSchema" : {}
+```
+
+We will be making successive edits to this ingestion spec as we progress through the tutorial.
+
+### Datasource name
+
+The datasource name is specified by the `dataSource` parameter in the `dataSchema.

Review comment:
Fixed

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

With regards,
Apache Git Services