fjy commented on a change in pull request #8544: Update Kafka loading docs to use the streaming data loader
URL: https://github.com/apache/incubator-druid/pull/8544#discussion_r324764184
##########
File path: docs/tutorials/tutorial-kafka.md
##########
@@ -54,17 +54,126 @@ Run this command to create a Kafka topic called *wikipedia*, to which we'll send
```bash
./bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic wikipedia
+```
+
+## Load data into Kafka
+
+Let's launch a producer for our topic and send some data!
+
+In your Druid directory, run the following command:
+
+```bash
+cd quickstart/tutorial
+gunzip -c wikiticker-2015-09-12-sampled.json.gz > wikiticker-2015-09-12-sampled.json
```
-## Start Druid Kafka ingestion
+In your Kafka directory, run the following command, where {PATH_TO_DRUID} is replaced by the path to the Druid directory:
+
+```bash
+export KAFKA_OPTS="-Dfile.encoding=UTF-8"
+./bin/kafka-console-producer.sh --broker-list localhost:9092 --topic wikipedia < {PATH_TO_DRUID}/quickstart/tutorial/wikiticker-2015-09-12-sampled.json
+```
+
+The previous command posted sample events to the *wikipedia* Kafka topic.
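+Each line of the sample file is a single JSON-encoded event. As a rough illustration (the field set shown here is illustrative, not the full schema of the sample data), a record looks something like:
+
+```json
+{
+  "time": "2015-09-12T00:47:00.496Z",
+  "channel": "#en.wikipedia",
+  "user": "GELongstreet",
+  "added": 36
+}
+```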
+Now we will use Druid's Kafka indexing service to ingest messages from our newly created topic.
+
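+Under the hood, the Kafka indexing service is driven by a supervisor spec. The data loader builds the full spec for you; as a minimal, abbreviated sketch, it looks roughly like:
+
+```json
+{
+  "type": "kafka",
+  "spec": {
+    "ioConfig": {
+      "type": "kafka",
+      "consumerProperties": { "bootstrap.servers": "localhost:9092" },
+      "topic": "wikipedia"
+    },
+    "dataSchema": { "dataSource": "wikipedia", "...": "..." },
+    "tuningConfig": { "type": "kafka" }
+  }
+}
+```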
+## Loading data with the data loader
+
+Navigate to [localhost:8888](http://localhost:8888) and click `Load data` in the console header.
+
+
+
+Select `Apache Kafka` and click `Connect data`.
+
+
+
+Enter `localhost:9092` as the bootstrap server and `wikipedia` as the topic.
+
+Click `Preview` and make sure that the data you are seeing is correct.
+
+Once the data is located, you can click "Next: Parse data" to go to the next step.
+
+
+
+The data loader will try to automatically determine the correct parser for the data.
+In this case it will successfully determine `json`.
+Feel free to play around with different parser options to get a preview of how Druid will parse your data.
+
+With the `json` parser selected, click `Next: Parse time` to get to the step centered around determining your primary timestamp column.
+
+
+
+Druid's architecture requires a primary timestamp column (internally stored in a column called `__time`).
+If you do not have a timestamp in your data, select `Constant value`.
+In our example, the data loader will determine that the `time` column in our raw data is the only candidate that can be used as the primary time column.
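+In the generated ingestion spec, this choice becomes a `timestampSpec`. Assuming the `time` column holds ISO 8601 timestamps (as the sample data does), the relevant fragment would look roughly like:
+
+```json
+"timestampSpec": {
+  "column": "time",
+  "format": "iso"
+}
+```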
+
+Click `Next: ...` twice to go past the `Transform` and `Filter` steps.
+You do not need to enter anything in these steps, as applying ingestion-time transforms and filters is out of scope for this tutorial.
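+For reference only (not needed for this tutorial), ingestion-time transforms and filters live in the spec's `transformSpec`. A hypothetical example that derives an uppercased channel column and keeps only English-channel events might look like:
+
+```json
+"transformSpec": {
+  "transforms": [
+    { "type": "expression", "name": "channelUpper", "expression": "upper(channel)" }
+  ],
+  "filter": { "type": "selector", "dimension": "channel", "value": "#en.wikipedia" }
+}
+```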
+
+
+
+In the `Configure schema` step, you can configure which dimensions (and metrics) will be ingested into Druid.
Review comment:
I would provide a link explaining what dimensions/metrics mean and how to fine-tune your data schema.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]