sergioferragut commented on code in PR #14742:
URL: https://github.com/apache/druid/pull/14742#discussion_r1289195932
##########
examples/quickstart/jupyter-notebooks/notebooks/02-ingestion/01-streaming-from-kafka.ipynb:
##########
@@ -126,81 +105,70 @@
"metadata": {},
"outputs": [],
"source": [
- "from kafka import KafkaProducer\n",
- "from kafka import KafkaConsumer\n",
+ "# Use the kafka_host variable when connecting to Kafka\n",
+ "if 'KAFKA_HOST' not in os.environ.keys():\n",
+ "    kafka_host=\"localhost:9092\"\n",
+ "else:\n",
+ "    kafka_host=f\"{os.environ['KAFKA_HOST']}:9092\"\n",
"\n",
- "# Kafka runs on kafka:9092 in multi-container tutorial application\n",
- "producer = KafkaProducer(bootstrap_servers='kafka:9092')\n",
+ "# The Kafka topic we will work with:\n",
"topic_name = \"social_media\""
]
},
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "Create the `social_media` topic and send a sample event. The `send()` command returns a metadata descriptor for the record."
- ]
- },
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
- "event = {\n",
- " \"__time\": \"2023-01-03T16:40:21.501\",\n",
- " \"username\": \"willow\",\n",
- " \"post_title\": \"This title is required\",\n",
- " \"views\": 15284,\n",
- " \"upvotes\": 124,\n",
- " \"comments\": 21,\n",
- " \"edited\": \"True\"\n",
- "}\n",
"\n",
- "producer.send(topic_name, json.dumps(event).encode('utf-8'))"
+ "import json\n",
+ "\n",
+ "# Shortcuts for the display and SQL APIs\n",
+ "display = druid.display\n",
+ "sql_client = druid.sql\n",
+ "\n",
+ "\n",
+ "# client for Data Generator API\n",
+ "datagen = druidapi.rest.DruidRestClient(\"http://datagen:9999\")\n",
+ "\n",
+ "# client for Druid API\n",
+ "rest_client = druid.rest\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
- "To verify that the Kafka topic stored the event, create a consumer client to read records from the Kafka cluster, and get the next (only) message:"
+ "## Publish generated data directly to Kafka topic"
]
},
{
- "cell_type": "code",
- "execution_count": null,
+ "cell_type": "markdown",
"metadata": {},
- "outputs": [],
"source": [
- "consumer = KafkaConsumer(topic_name, bootstrap_servers=['kafka:9092'], auto_offset_reset='earliest',\n",
- " enable_auto_commit=True)\n",
- "\n",
- "print(next(consumer).value.decode('utf-8'))"
+ "We use the data generator included in this cluster to produce a stream of messages. The data generator creates the topic if it does not exist. This configuration outputs messages to the `social_media` topic."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
- "## Load data into Kafka topic"
+ "### Generate data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
- "Instead of manually creating events to send to the Kafka topic, use a data generator to simulate a continuous data stream. This tutorial makes use of Druid Data Driver to simulate a continuous data stream into the `social_media` Kafka topic. To learn more about the Druid Data Driver, see the Druid Summit talk, [Generating Time centric Data for Apache Druid](https://www.youtube.com/watch?v=3zAOeLe3iAo).\n",
- "\n",
- "In this notebook, you use a background process to continuously load data into the Kafka topic.\n",
- "This allows you to keep executing commands in this notebook while data is constantly being streamed into the topic."
+ "Instead of manually creating events to send to the Kafka topic, use a data generator to simulate a continuous data stream. This tutorial uses the Druid Data Generator to stream data continuously into the `social_media` Kafka topic. To learn more about the Druid Data Generator, see the [project repository](https://github.com/implydata/druid-datagenerator) or the [data generation notebook](../01-introduction/02-datagen-intro.ipynb)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
- "Run the following cells to load sample data into the `social_media` Kafka topic:"
+ "Run the following cells to load sample data into the `social_media` Kafka topic. The generator is configured to produce events until it reaches 50,000 messages:"
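
The `KAFKA_HOST` fallback logic in the hunk above can be captured in a small helper. A minimal sketch (the function name is illustrative, not part of the notebook; note that Kafka bootstrap addresses are plain `host:port` pairs with no URL scheme):

```python
import os

def resolve_kafka_host(env=None):
    """Return the Kafka bootstrap address, preferring the KAFKA_HOST
    environment variable (set in the multi-container tutorial setup)
    and falling back to localhost otherwise."""
    if env is None:
        env = os.environ
    host = env.get("KAFKA_HOST", "localhost")
    # Bootstrap servers are host:port, without an http:// prefix.
    return f"{host}:9092"

print(resolve_kafka_host({}))                       # localhost:9092
print(resolve_kafka_host({"KAFKA_HOST": "kafka"}))  # kafka:9092
```
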
Review Comment:
done
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]