writer-jill commented on code in PR #13261: URL: https://github.com/apache/druid/pull/13261#discussion_r1020142676
##########
docs/tutorials/tutorial-kafka.md:
##########
@@ -24,260 +24,267 @@ sidebar_label: "Load from Apache Kafka"
 -->
-## Getting started
+This tutorial shows you how to load data into Apache Druid from a Kafka stream, using Druid's Kafka indexing service.
 
-This tutorial demonstrates how to load data into Apache Druid from a Kafka stream, using Druid's Kafka indexing service.
+The tutorial guides you through the steps to load sample nested clickstream data from the [Koalas to the Max](https://www.koalastothemax.com/) game into a Kafka topic, then ingest the data into Druid.
 
-For this tutorial, we'll assume you've already downloaded Druid as described in
-the [quickstart](index.md) using the `micro-quickstart` single-machine configuration and have it
-running on your local machine. You don't need to have loaded any data yet.
+## Prerequisites
+
+Before you follow the steps in this tutorial, download Druid as described in the [quickstart](index.md) using the [micro-quickstart](../operations/single-server.md#micro-quickstart-4-cpu-16gib-ram) single-machine configuration and have it running on your local machine. You don't need to have loaded any data.
 
 ## Download and start Kafka
 
-[Apache Kafka](http://kafka.apache.org/) is a high throughput message bus that works well with
-Druid. For this tutorial, we will use Kafka 2.7.0. To download Kafka, issue the following
-commands in your terminal:
+[Apache Kafka](http://kafka.apache.org/) is a high-throughput message bus that works well with Druid. For this tutorial, use Kafka 2.7.0.
+
+1. To download Kafka, run the following commands in your terminal:
 
-```bash
-curl -O https://archive.apache.org/dist/kafka/2.7.0/kafka_2.13-2.7.0.tgz
-tar -xzf kafka_2.13-2.7.0.tgz
-cd kafka_2.13-2.7.0
-```
-Start zookeeper first with the following command:
+   ```bash
+   curl -O https://archive.apache.org/dist/kafka/2.7.0/kafka_2.13-2.7.0.tgz
+   tar -xzf kafka_2.13-2.7.0.tgz
+   cd kafka_2.13-2.7.0
+   ```
+2. If you're already running Kafka on the machine you're using for this tutorial, delete or rename the `kafka-logs` directory in `/tmp`.
+
+   > Druid and Kafka both rely on [Apache ZooKeeper](https://zookeeper.apache.org/) to coordinate and manage services. Because Druid is already running, Kafka attaches to the Druid ZooKeeper instance when it starts up.<br>
+   In a production environment where you're running Druid and Kafka on different machines, [start the Kafka ZooKeeper](https://kafka.apache.org/quickstart) before you start the Kafka broker.
 
-```bash
-./bin/zookeeper-server-start.sh config/zookeeper.properties
-```
+3. In the Kafka root directory, run this command to start a Kafka broker:
 
-Start a Kafka broker by running the following command in a new terminal:
+   ```bash
+   ./bin/kafka-server-start.sh config/server.properties
+   ```
 
-```bash
-./bin/kafka-server-start.sh config/server.properties
-```
+4. In a new terminal window, navigate to the Kafka root directory and run the following command to create a Kafka topic called `kttm`:
 
-Run this command to create a Kafka topic called *wikipedia*, to which we'll send data:
+   ```bash
+   ./bin/kafka-topics.sh --create --topic kttm --bootstrap-server localhost:9092
+   ```
 
-```bash
-./bin/kafka-topics.sh --create --topic wikipedia --bootstrap-server localhost:9092
-```
+
+   Kafka returns a message when it successfully adds the topic: `Created topic kttm`.
 
 ## Load data into Kafka
 
-Let's launch a producer for our topic and send some data!
+In this section, you download sample data to the tutorial's directory and send the data to your Kafka topic.
 
-In your Druid directory, run the following command:
+1. Run the following commands from your Druid root directory to download and extract the sample data:
 
-```bash
-cd quickstart/tutorial
-gunzip -c wikiticker-2015-09-12-sampled.json.gz > wikiticker-2015-09-12-sampled.json
-```
+   ```bash
+   curl -O https://druid.apache.org/docs/latest/assets/files/kttm-nested-data.json.tgz
+   tar -xzf kttm-nested-data.json.tgz
+   ```
 
-In your Kafka directory, run the following command, where {PATH_TO_DRUID} is replaced by the path to the Druid directory:
+2. In your Kafka root directory, run the following commands to post sample events to the `kttm` Kafka topic. Replace `{PATH_TO_DRUID}` with the path to your Druid root directory:
 
-```bash
-export KAFKA_OPTS="-Dfile.encoding=UTF-8"
-./bin/kafka-console-producer.sh --broker-list localhost:9092 --topic wikipedia < {PATH_TO_DRUID}/quickstart/tutorial/wikiticker-2015-09-12-sampled.json
-```
+   ```bash
+   export KAFKA_OPTS="-Dfile.encoding=UTF-8"
+   ./bin/kafka-console-producer.sh --broker-list localhost:9092 --topic kttm < {PATH_TO_DRUID}/kttm-nested-data.json

Review Comment:
   Fixed.
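A note for readers following the producer step in the diff above: `kafka-console-producer.sh` sends one Kafka message per line of its input, so every line of the data file must be a complete, standalone JSON event. Below is a minimal, hypothetical sanity check under that assumption; the stand-in file path and the two sample events are placeholders invented for illustration, not real `kttm` records.

```shell
# Build a tiny stand-in for the real data file. The field names here are
# placeholders, not the actual kttm schema.
printf '%s\n' \
  '{"timestamp":"2019-08-25T00:00:00Z","session":"S1"}' \
  '{"timestamp":"2019-08-25T00:00:01Z","session":"S2"}' > /tmp/sample-events.json

# Verify each line parses as standalone JSON, since the console producer
# emits exactly one message per input line.
while IFS= read -r line; do
  printf '%s' "$line" | python3 -c 'import json,sys; json.load(sys.stdin)' || exit 1
done < /tmp/sample-events.json

# The line count equals the number of messages the producer would send.
wc -l < /tmp/sample-events.json
```

Running the same kind of check against the real `kttm-nested-data.json` before posting it catches a truncated download before it turns into ingestion errors downstream.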
