jon-wei commented on a change in pull request #6126: New quickstart and 
tutorials
URL: https://github.com/apache/incubator-druid/pull/6126#discussion_r208831257
 
 

 ##########
 File path: docs/content/tutorials/tutorial-ingestion-spec.md
 ##########
 @@ -0,0 +1,641 @@
+---
+layout: doc_page
+---
+
+# Tutorial: Writing an ingestion spec
+
+This tutorial will guide the reader through the process of defining an 
ingestion spec, pointing out key considerations and guidelines.
+
+For this tutorial, we'll assume you've already downloaded Druid as described 
in 
+the [single-machine quickstart](index.html) and have it running on your local 
machine. 
+
+It will also be helpful to have finished [Tutorial: Loading a 
file](/docs/VERSION/tutorials/tutorial-batch.html), [Tutorial: Querying 
data](/docs/VERSION/tutorials/tutorial-query.html), and [Tutorial: 
Rollup](/docs/VERSION/tutorials/tutorial-rollup.html).
+
+## Example data
+
+Suppose we have the following network flow data:
+
+* `srcIP`: IP address of sender
+* `srcPort`: Port of sender
+* `dstIP`: IP address of receiver
+* `dstPort`: Port of receiver
+* `protocol`: IP protocol number
+* `packets`: number of packets transmitted
+* `bytes`: number of bytes transmitted
+* `cost`: the cost of sending the traffic
+
+```
+{"ts":"2018-01-01T01:01:35Z","srcIP":"1.1.1.1", "dstIP":"2.2.2.2", 
"srcPort":2000, "dstPort":3000, "protocol": 6, "packets":10, "bytes":1000, 
"cost": 1.4}
+{"ts":"2018-01-01T01:01:51Z","srcIP":"1.1.1.1", "dstIP":"2.2.2.2", 
"srcPort":2000, "dstPort":3000, "protocol": 6, "packets":20, "bytes":2000, 
"cost": 3.1}
+{"ts":"2018-01-01T01:01:59Z","srcIP":"1.1.1.1", "dstIP":"2.2.2.2", 
"srcPort":2000, "dstPort":3000, "protocol": 6, "packets":30, "bytes":3000, 
"cost": 0.4}
+{"ts":"2018-01-01T01:02:14Z","srcIP":"1.1.1.1", "dstIP":"2.2.2.2", 
"srcPort":5000, "dstPort":7000, "protocol": 6, "packets":40, "bytes":4000, 
"cost": 7.9}
+{"ts":"2018-01-01T01:02:29Z","srcIP":"1.1.1.1", "dstIP":"2.2.2.2", 
"srcPort":5000, "dstPort":7000, "protocol": 6, "packets":50, "bytes":5000, 
"cost": 10.2}
+{"ts":"2018-01-01T01:03:29Z","srcIP":"1.1.1.1", "dstIP":"2.2.2.2", 
"srcPort":5000, "dstPort":7000, "protocol": 6, "packets":60, "bytes":6000, 
"cost": 4.3}
+{"ts":"2018-01-01T02:33:14Z","srcIP":"7.7.7.7", "dstIP":"8.8.8.8", 
"srcPort":4000, "dstPort":5000, "protocol": 17, "packets":100, "bytes":10000, 
"cost": 22.4}
+{"ts":"2018-01-01T02:33:45Z","srcIP":"7.7.7.7", "dstIP":"8.8.8.8", 
"srcPort":4000, "dstPort":5000, "protocol": 17, "packets":200, "bytes":20000, 
"cost": 34.5}
+{"ts":"2018-01-01T02:35:45Z","srcIP":"7.7.7.7", "dstIP":"8.8.8.8", 
"srcPort":4000, "dstPort":5000, "protocol": 17, "packets":300, "bytes":30000, 
"cost": 46.3}
+```
+
+Save the JSON contents above into a file called `ingestion-tutorial-data.json`.
+
+Let's walk through the process of defining an ingestion spec that can load 
this data. 
+
+For this tutorial, we will be using the native batch indexing task. When using 
other task types, some aspects of the ingestion spec will differ, and this 
tutorial will point out such areas.
+
+## Defining the schema
+
+The core element of a Druid ingestion spec is the `dataSchema`. The 
`dataSchema` defines how to parse input data into a set of columns that will be 
stored in Druid.
+
+Let's start with an empty `dataSchema` and add fields to it as we progress 
through the tutorial.
+
+Create a new file called `ingestion-tutorial-index.json` with the following 
contents:
+
+```json
+"dataSchema" : {}
+```
+
+We will be making successive edits to this ingestion spec as we progress 
through the tutorial.
+
+### Datasource name
+
+The datasource name is specified by the `dataSource` parameter in the 
`dataSchema`.
 
 Review comment:
   Fixed
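
   For anyone following the quoted tutorial text, here is a minimal sketch (not part of the PR) for sanity-checking `ingestion-tutorial-data.json` before writing the spec. It assumes the file holds one JSON object per line; the quoted diff wraps each record across several lines only because of email line wrapping. The helper name `check_records` and the embedded two-record sample are illustrative assumptions, not anything defined by Druid.

```python
import json

# Two of the sample records from the tutorial, one JSON object per line.
SAMPLE = """\
{"ts":"2018-01-01T01:01:35Z","srcIP":"1.1.1.1", "dstIP":"2.2.2.2", "srcPort":2000, "dstPort":3000, "protocol": 6, "packets":10, "bytes":1000, "cost": 1.4}
{"ts":"2018-01-01T02:33:14Z","srcIP":"7.7.7.7", "dstIP":"8.8.8.8", "srcPort":4000, "dstPort":5000, "protocol": 17, "packets":100, "bytes":10000, "cost": 22.4}
"""

# Field names listed in the tutorial's "Example data" section, plus the "ts" timestamp.
EXPECTED_FIELDS = {
    "ts", "srcIP", "srcPort", "dstIP", "dstPort",
    "protocol", "packets", "bytes", "cost",
}


def check_records(text):
    """Parse newline-delimited JSON and return the records.

    Raises ValueError if a record's field names differ from EXPECTED_FIELDS.
    (Hypothetical helper for illustration only.)
    """
    records = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        if set(record) != EXPECTED_FIELDS:
            raise ValueError(f"line {line_no}: unexpected fields {sorted(record)}")
        records.append(record)
    return records


records = check_records(SAMPLE)
print(len(records), "records parsed")
```

   To check the real file, pass its contents instead of `SAMPLE`, for example `check_records(open("ingestion-tutorial-data.json").read())`.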

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
