vtlim commented on code in PR #14742:
URL: https://github.com/apache/druid/pull/14742#discussion_r1289345447


##########
examples/quickstart/jupyter-notebooks/notebooks/01-introduction/02-datagen-intro.ipynb:
##########
@@ -0,0 +1,624 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "9e07b3f5-d919-4179-91a1-0f6b66c42757",
+   "metadata": {},
+   "source": [
+    "# Data Generator Server\n",
+    "The default Docker Compose deployment includes a data generation service 
created from the published Docker image at `imply/datagen:latest`. \n",
+    "This image is built by the project 
https://github.com/implydata/druid-datagenerator. \n",
+    "\n",
+    "This notebook shows you how to use the data generation service 
included in the Docker Compose deployment. It explains how to use predefined 
data generator configurations as well as how to build a custom data generator. 
You will also learn how to create sample data files for batch ingestion and how 
to generate live streaming data for streaming ingestion.\n",
+    "\n",
+    "## Table of contents\n",
+    "\n",
+    "* [Initialization](#Initialization)\n",
+    "* [List available configurations](#List-available-configurations)\n",
+    "* [Generate a data file for backfilling 
history](#Generate-a-data-file-for-backfilling-history)\n",
+    "* [Batch ingestion of generated 
files](#Batch-Ingestion-of-Generated-Files)\n",

Review Comment:
   ```suggestion
       "* [Batch ingestion of generated 
files](#Batch-ingestion-of-generated-files)\n",
   ```
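A side note on why the anchor casing in this suggestion matters: classic Jupyter derives heading anchors case-sensitively, roughly as in the sketch below (`jupyter_anchor` is a hypothetical helper for illustration; real slugification also handles punctuation):

```python
def jupyter_anchor(heading: str) -> str:
    # Approximation of how Jupyter derives an anchor from a markdown heading:
    # drop the leading '#' markers, trim whitespace, replace spaces with hyphens.
    # Case is preserved, which is why link text and target casing must agree.
    return heading.lstrip("#").strip().replace(" ", "-")

# The TOC entry's link target has to equal the heading's derived anchor:
print(jupyter_anchor("## Batch ingestion of generated files"))
# Batch-ingestion-of-generated-files
```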



##########
examples/quickstart/jupyter-notebooks/notebooks/02-ingestion/01-streaming-from-kafka.ipynb:
##########
@@ -126,81 +107,82 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "from kafka import KafkaProducer\n",
-    "from kafka import KafkaConsumer\n",
+    "# Use kafka_host variable when connecting to kafka \n",
+    "if 'KAFKA_HOST' not in os.environ.keys():\n",
+    "   kafka_host=f\"http://localhost:9092\"\n",
+    "else:\n",
+    "    kafka_host=f\"{os.environ['KAFKA_HOST']}:9092\"\n",
     "\n",
-    "# Kafka runs on kafka:9092 in multi-container tutorial application\n",
-    "producer = KafkaProducer(bootstrap_servers='kafka:9092')\n",
+    "# this is the kafka topic we will be working with:\n",
     "topic_name = \"social_media\""
    ]
   },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Create the `social_media` topic and send a sample event. The `send()` 
command returns a metadata descriptor for the record."
-   ]
-  },
   {
    "cell_type": "code",
    "execution_count": null,
    "metadata": {},
    "outputs": [],
    "source": [
-    "event = {\n",
-    "    \"__time\": \"2023-01-03T16:40:21.501\",\n",
-    "    \"username\": \"willow\",\n",
-    "    \"post_title\": \"This title is required\",\n",
-    "    \"views\": 15284,\n",
-    "    \"upvotes\": 124,\n",
-    "    \"comments\": 21,\n",
-    "    \"edited\": \"True\"\n",
-    "}\n",
+    "import json\n",
     "\n",
-    "producer.send(topic_name, json.dumps(event).encode('utf-8'))"
+    "# shortcuts for display and sql api's\n",
+    "display = druid.display\n",
+    "sql_client = druid.sql\n",
+    "\n",
+    "# client for Data Generator API\n",
+    "datagen = druidapi.rest.DruidRestClient(\"http://datagen:9999\")\n",
+    "\n",
+    "# client for Druid API\n",
+    "rest_client = druid.rest"
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "To verify that the Kafka topic stored the event, create a consumer client 
to read records from the Kafka cluster, and get the next (only) message:"
+    "## Publish generated data directly to Kafka topic"
    ]
   },
   {
-   "cell_type": "code",
-   "execution_count": null,
+   "cell_type": "markdown",
    "metadata": {},
-   "outputs": [],
    "source": [
-    "consumer = KafkaConsumer(topic_name, bootstrap_servers=['kafka:9092'], 
auto_offset_reset='earliest',\n",
-    "     enable_auto_commit=True)\n",
-    "\n",
-    "print(next(consumer).value.decode('utf-8'))"
+    "In this section, you use the data generator included as part of the 
Docker application to generate a stream of messages. The data generator creates 
and sends messages to a Kafka topic named `social_media`. To learn more about 
the Druid Data Generator, see the 
[project](https://github.com/implydata/druid-datagenerator) and/or the [data 
generation notebook](../01-introduction/02-datagen-intro.ipynb)\""

Review Comment:
   ```suggestion
       "In this section, you use the data generator included as part of the 
Docker application to generate a stream of messages. The data generator creates 
and sends messages to a Kafka topic named `social_media`. To learn more about 
the Druid Data Generator, see the 
[project](https://github.com/implydata/druid-datagenerator) and the [data 
generation notebook](../01-introduction/02-datagen-intro.ipynb)."
   ```
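Unrelated to the wording fix above, the host-resolution pattern in the quoted cells has two fragile spots worth noting; a hedged sketch of a safer variant follows (the default host names are assumptions taken from the Docker Compose setup):

```python
import os

# Sketch of env-based host resolution. Two caveats with the quoted cells:
# a Kafka bootstrap address is plain host:port (no http:// scheme), and
# os.environ['DRUID_HOST'] raises KeyError when the variable is unset,
# so os.environ.get with a default is safer.
kafka_host = f"{os.environ.get('KAFKA_HOST', 'localhost')}:9092"
druid_host = f"http://{os.environ.get('DRUID_HOST', 'router')}:8888"
print(kafka_host, druid_host)
```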



##########
examples/quickstart/jupyter-notebooks/notebooks/01-introduction/02-datagen-intro.ipynb:
##########
@@ -0,0 +1,618 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "9e07b3f5-d919-4179-91a1-0f6b66c42757",
+   "metadata": {},
+   "source": [
+    "# Data Generator Server\n",
+    "The default Docker Compose deployment includes a data generation service 
created from the published Docker image at `imply/datagen:latest`. \n",
+    "This image is built by the project 
https://github.com/implydata/druid-datagenerator. \n",
+    "\n",
+    "This notebook shows you how to use the data generation service 
included in the Docker Compose deployment. It explains how to use pre-defined 
data generator configurations as well as how to build a custom data generator. 
You will also learn how to create sample data files for batch ingestion and how 
to generate live streaming data for streaming ingestion.\n",
+    "\n",
+    "## Table of contents\n",
+    "\n",
+    "* [Initialization](#Initialization)\n",
+    "* [List Available Configurations](#List-available-configurations)\n",
+    "* [Generate a data file for backfilling 
history](#Generate-a-data-file-for-backfilling-history)\n",
+    "* [Batch Ingestion of Generated 
Files](#Batch-Ingestion-of-Generated-Files)\n",
+    "* [Generate custom data](#Generate-custom-data)\n",
+    "* [Stream generated data](#Stream-generated-data)\n",
+    "* [Ingest data from a stream](#Ingest-data-from-a-stream)\n",
+    "* [Cleanup](#Cleanup)\n",
+    "\n",
+    "\n",
+    "## Initialization\n",
+    "\n",
+    "To interact with the data generation service, use the REST client 
provided in the [`druidapi` Python 
package](https://druid.apache.org/docs/latest/tutorials/tutorial-jupyter-index.html#python-api-for-druid)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "f84766c7-c6a5-4496-91a3-abdb8ddd2375",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import druidapi\n",
+    "import os\n",
+    "\n",
+    "# Datagen client \n",
+    "datagen = druidapi.rest.DruidRestClient(\"http://datagen:9999\")\n",
+    "\n",
+    "if (os.environ['DRUID_HOST'] == None):\n",
+    "    druid_host=f\"http://router:8888\"\n",
+    "else:\n",
+    "    druid_host=f\"http://{os.environ['DRUID_HOST']}:8888\"\n",
+    "\n",
+    "# Druid client\n",
+    "druid = druidapi.jupyter_client(druid_host)\n",
+    "\n",
+    "\n",
+    "\n",
+    "# these imports and constants are used by multiple cells\n",
+    "from datetime import datetime, timedelta\n",
+    "import json\n",
+    "\n",
+    "headers = {\n",
+    "  'Content-Type': 'application/json'\n",
+    "}\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c54af617-0998-4010-90c3-9b5a38a09a5f",
+   "metadata": {},
+   "source": [
+    "### List available configurations\n",
+    "Use the `/list` API endpoint to get the data generator's available 
configuration values with pre-defined data generator schemas."

Review Comment:
Would be great for a follow-up addition.



##########
examples/quickstart/jupyter-notebooks/notebooks/01-introduction/02-datagen-intro.ipynb:
##########
@@ -0,0 +1,624 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "9e07b3f5-d919-4179-91a1-0f6b66c42757",
+   "metadata": {},
+   "source": [
+    "# Data Generator Server\n",
+    "The default Docker Compose deployment includes a data generation service 
created from the published Docker image at `imply/datagen:latest`. \n",
+    "This image is built by the project 
https://github.com/implydata/druid-datagenerator. \n",
+    "\n",
+    "This notebook shows you how to use the data generation service 
included in the Docker Compose deployment. It explains how to use predefined 
data generator configurations as well as how to build a custom data generator. 
You will also learn how to create sample data files for batch ingestion and how 
to generate live streaming data for streaming ingestion.\n",
+    "\n",
+    "## Table of contents\n",
+    "\n",
+    "* [Initialization](#Initialization)\n",
+    "* [List available configurations](#List-available-configurations)\n",
+    "* [Generate a data file for backfilling 
history](#Generate-a-data-file-for-backfilling-history)\n",
+    "* [Batch ingestion of generated 
files](#Batch-Ingestion-of-Generated-Files)\n",
+    "* [Generate custom data](#Generate-custom-data)\n",
+    "* [Stream generated data](#Stream-generated-data)\n",
+    "* [Ingest data from a stream](#Ingest-data-from-a-stream)\n",
+    "* [Cleanup](#Cleanup)\n",
+    "\n",
+    "\n",
+    "## Initialization\n",
+    "\n",
+    "To interact with the data generation service, use the REST client 
provided in the [`druidapi` Python 
package](https://druid.apache.org/docs/latest/tutorials/tutorial-jupyter-index.html#python-api-for-druid)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "f84766c7-c6a5-4496-91a3-abdb8ddd2375",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import druidapi\n",
+    "import os\n",
+    "import time\n",
+    "\n",
+    "# Datagen client \n",
+    "datagen = druidapi.rest.DruidRestClient(\"http://datagen:9999\")\n",
+    "\n",
+    "if (os.environ['DRUID_HOST'] == None):\n",
+    "    druid_host=f\"http://router:8888\"\n",
+    "else:\n",
+    "    druid_host=f\"http://{os.environ['DRUID_HOST']}:8888\"\n",
+    "\n",
+    "# Druid client\n",
+    "druid = druidapi.jupyter_client(druid_host)\n",
+    "\n",
+    "\n",
+    "\n",
+    "# these imports and constants are used by multiple cells\n",
+    "from datetime import datetime, timedelta\n",
+    "import json\n",
+    "\n",
+    "headers = {\n",
+    "  'Content-Type': 'application/json'\n",
+    "}"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c54af617-0998-4010-90c3-9b5a38a09a5f",
+   "metadata": {},
+   "source": [
+    "### List available configurations\n",
+    "Use the `/list` API endpoint to get the data generator's available 
configuration values with predefined data generator schemas."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "1ba6a80a-c49b-4abf-943b-9dad82f2ae13",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "display(datagen.get(f\"/list\", require_ok=False).json())"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ae88a3b7-60da-405d-bcf4-fb4affcfe973",
+   "metadata": {},
+   "source": [
+    "### Generate a data file for backfilling history\n",
+    "When generating a file for backfill purposes, you can select the start 
time and the duration of the simulation.\n",
+    "\n",
+    "Configure the data generator request as follows:\n",
+    "* `name`: an arbitrary name you assign to the job. Refer to the job name 
to get the job status or to stop the job.\n",
+    "* `target.type`: \"file\" to generate a data file\n",
+    "* `target.path`: identifies the name of the file to generate, it will 
ignore any path specified.\n",

Review Comment:
   ```suggestion
       "* `target.path`: identifies the name of the file to generate. The data 
generator ignores any path specified and creates the file in the current 
working directory.\n",
   ```
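To make the suggested `target.path` wording concrete, a minimal `/start` payload for a file target might look like this (field values are illustrative, following the payload shown later in this notebook diff):

```python
import json

# Illustrative datagen /start request with a file target. Per the suggested
# wording, only the file name in target.path is used; any directory prefix
# is ignored and the file is created in the generator's working directory.
datagen_request = {
    "name": "gen_clickstream_example",  # job name for later /status and /stop calls
    "target": {"type": "file", "path": "clicks.json"},
    "config_file": "clickstream/clickstream.json",
    "time": "1h",        # simulate one hour of events
    "concurrency": 100,  # up to 100 simulated entities at once
}
print(json.dumps(datagen_request, indent=2))
```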



##########
examples/quickstart/jupyter-notebooks/notebooks/01-introduction/02-datagen-intro.ipynb:
##########
@@ -0,0 +1,618 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "9e07b3f5-d919-4179-91a1-0f6b66c42757",
+   "metadata": {},
+   "source": [
+    "# Data Generator Server\n",
+    "The default Docker Compose deployment includes a data generation service 
created from the published Docker image at `imply/datagen:latest`. \n",
+    "This image is built by the project 
https://github.com/implydata/druid-datagenerator. \n",
+    "\n",
+    "This notebook shows you how to use the data generation service 
included in the Docker Compose deployment. It explains how to use pre-defined 
data generator configurations as well as how to build a custom data generator. 
You will also learn how to create sample data files for batch ingestion and how 
to generate live streaming data for streaming ingestion.\n",
+    "\n",
+    "## Table of contents\n",
+    "\n",
+    "* [Initialization](#Initialization)\n",
+    "* [List Available Configurations](#List-available-configurations)\n",
+    "* [Generate a data file for backfilling 
history](#Generate-a-data-file-for-backfilling-history)\n",
+    "* [Batch Ingestion of Generated 
Files](#Batch-Ingestion-of-Generated-Files)\n",
+    "* [Generate custom data](#Generate-custom-data)\n",
+    "* [Stream generated data](#Stream-generated-data)\n",
+    "* [Ingest data from a stream](#Ingest-data-from-a-stream)\n",
+    "* [Cleanup](#Cleanup)\n",
+    "\n",
+    "\n",
+    "## Initialization\n",
+    "\n",
+    "To interact with the data generation service, use the REST client 
provided in the [`druidapi` Python 
package](https://druid.apache.org/docs/latest/tutorials/tutorial-jupyter-index.html#python-api-for-druid)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "f84766c7-c6a5-4496-91a3-abdb8ddd2375",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import druidapi\n",
+    "import os\n",
+    "\n",
+    "# Datagen client \n",
+    "datagen = druidapi.rest.DruidRestClient(\"http://datagen:9999\")\n",
+    "\n",
+    "if (os.environ['DRUID_HOST'] == None):\n",
+    "    druid_host=f\"http://router:8888\"\n",
+    "else:\n",
+    "    druid_host=f\"http://{os.environ['DRUID_HOST']}:8888\"\n",
+    "\n",
+    "# Druid client\n",
+    "druid = druidapi.jupyter_client(druid_host)\n",
+    "\n",
+    "\n",
+    "\n",
+    "# these imports and constants are used by multiple cells\n",
+    "from datetime import datetime, timedelta\n",
+    "import json\n",
+    "\n",
+    "headers = {\n",
+    "  'Content-Type': 'application/json'\n",
+    "}\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c54af617-0998-4010-90c3-9b5a38a09a5f",
+   "metadata": {},
+   "source": [
+    "### List available configurations\n",
+    "Use the `/list` API endpoint to get the data generator's available 
configuration values with pre-defined data generator schemas."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "1ba6a80a-c49b-4abf-943b-9dad82f2ae13",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "display(datagen.get(f\"/list\", require_ok=False).json())"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ae88a3b7-60da-405d-bcf4-fb4affcfe973",
+   "metadata": {},
+   "source": [
+    "### Generate a data file for backfilling history\n",
+    "When generating a file for backfill purposes, you can select the start 
time and the duration of the simulation.\n",
+    "This example shows how to configure the data generator request:\n",
+    "* `name`: an arbitrary name you assign to the job. Refer to the job name 
to get the job status or to stop the job.\n",
+    "* `target.type`: \"file\" to generate a data file\n",
+    "* `target.path`: identifies the name of the file to generate, it will 
ignore any path specified.\n",
+    "* `time_type`,`time`: The data generator simulates the time range you 
specify with a start timestamp in the `\"time_type\"` property and a duration 
in the `\"time\"` property with `h` suffix for hours, `m` for minutes or `s` 
for seconds.\n",
+    "- `\"concurrency\"` indicates the maximum number of entities used 
concurrently to generate events. Each entity is a separate state machine that 
simulates things like user sessions, IoT devices, or other concurrent sources 
of event data. "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "811ff58f-75af-4092-a08d-5e07a51592ff",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Configure the start time to one hour prior to the current time. \n",
+    "startDateTime = (datetime.now() - timedelta(hours = 
1)).strftime('%Y-%m-%dT%H:%M:%S.001')\n",
+    "print(f\"Starting to generate history at {startDateTime}.\")\n",
+    "\n",
+    "# give the datagen job a name for use in subsequent API calls\n",
+    "job_name=\"gen_clickstream1\"\n",
+    "\n",
+    "\n",
+    "# Generate a data file on the datagen server\n",
+    "datagen_request = {\n",
+    "    \"name\": job_name,\n",
+    "    \"target\": { \"type\": \"file\", \"path\":\"clicks.json\"},\n",
+    "    \"config_file\": \"clickstream/clickstream.json\", \n",
+    "    \"time_type\": startDateTime,\n",
+    "    \"time\": \"1h\",\n",
+    "    \"concurrency\":100\n",
+    "}\n",
+    "response = datagen.post(\"/start\", json.dumps(datagen_request), 
headers=headers, require_ok=False)\n",
+    "response.json()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "d407d1d9-3f01-4128-a014-6a5f371c25a5",
+   "metadata": {},
+   "source": [
+    "#### Display jobs\n",
+    "Use the `/jobs` API endpoint to get the current jobs and job statuses."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "3de698c5-bcf4-40c7-b295-728fb54d1f0a",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "display(datagen.get(f\"/jobs\").json())"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "972ebed0-34a1-4ad2-909d-69b8b27c3046",
+   "metadata": {},
+   "source": [
+    "#### Get status of a job\n",
+    "Use the `/status/JOB_NAME` API endpoint to get the current jobs and their 
status."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "debce4f8-9c16-476c-9593-21ec984985d2",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "display(datagen.get(f\"/status/{job_name}\", require_ok=False).json())"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ef818d78-6aa6-4d38-8a43-83416aede96f",
+   "metadata": {},
+   "source": [
+    "#### Stop a job\n",
+    "Use the `/stop/JOB_NAME` API endpoint to stop a job."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "7631b8b8-d3d6-4803-9162-587f440d2ef2",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "display(datagen.post(f\"/stop/{job_name}\", '').json())"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "0a8dc7d3-64e5-41e3-8c28-c5f19c0536f5",
+   "metadata": {},
+   "source": [
+    "#### List files created on datagen server\n",
+    "Use the `/files` API endpoint to list files available on the server."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "06ee36bd-2d2b-4904-9987-10636cf52aac",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "display(datagen.get(f\"/files\", '').json())"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "83ef9edb-98e2-45b4-88e8-578703faedc1",
+   "metadata": {},
+   "source": [
+    "### Batch Ingestion of Generated Files\n",
+    "Use a [Druid HTTP input 
source](https://druid.apache.org/docs/latest/ingestion/native-batch-input-sources.html#http-input-source)
 in the [EXTERN 
function](https://druid.apache.org/docs/latest/multi-stage-query/reference.html#extern-function)
 of a [SQL-based 
ingestion](https://druid.apache.org/docs/latest/multi-stage-query/index.html) 
to load generated files.\n",
+    "You can access files by name using the URL 
`http://datagen:9999/file/FILE_NAME`. Alternatively, if you are running Druid 
outside of Docker but on the same machine, use 
`http://localhost:9999/file/FILE_NAME`.\n",
+    "The following example assumes that both Druid and the data generator 
server are running in Docker Compose."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "0d72b015-f8ec-4713-b6f2-fe7a15afff59",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "sql = '''\n",
+    "REPLACE INTO \"clicks\" OVERWRITE ALL\n",
+    "WITH \"ext\" AS (SELECT *\n",
+    "FROM TABLE(\n",
+    "  EXTERN(\n",
+    "    
'{\"type\":\"http\",\"uris\":[\"http://datagen:9999/file/clicks.json\"]}',\n",
+    "    '{\"type\":\"json\"}'\n",
+    "  )\n",
+    ") EXTEND (\"time\" VARCHAR, \"user_id\" VARCHAR, \"event_type\" VARCHAR, 
\"client_ip\" VARCHAR, \"client_device\" VARCHAR, \"client_lang\" VARCHAR, 
\"client_country\" VARCHAR, \"referrer\" VARCHAR, \"keyword\" VARCHAR, 
\"product\" VARCHAR))\n",
+    "SELECT\n",
+    "  TIME_PARSE(\"time\") AS \"__time\",\n",
+    "  \"user_id\",\n",
+    "  \"event_type\",\n",
+    "  \"client_ip\",\n",
+    "  \"client_device\",\n",
+    "  \"client_lang\",\n",
+    "  \"client_country\",\n",
+    "  \"referrer\",\n",
+    "  \"keyword\",\n",
+    "  \"product\"\n",
+    "FROM \"ext\"\n",
+    "PARTITIONED BY DAY\n",
+    "'''  \n",
+    "\n",
+    "druid.display.run_task(sql)\n",
+    "print(\"Waiting for segment availability ...\")\n",
+    "druid.sql.wait_until_ready('clicks')\n",
+    "print(\"Data is available for query.\")"

Review Comment:
   I ran the shutdown cells but maybe I started the second notebook too quickly 
after doing so. Good to know.



##########
examples/quickstart/jupyter-notebooks/notebooks/02-ingestion/01-streaming-from-kafka.ipynb:
##########
@@ -287,14 +295,26 @@
     "  'Content-Type': 'application/json'\n",
     "}\n",
     "\n",
-    "rest_client.post(\"/druid/indexer/v1/supervisor\", kafka_ingestion_spec, 
headers=headers)"
+    "supervisor = rest_client.post(\"/druid/indexer/v1/supervisor\", 
json.dumps(kafka_ingestion_spec), headers=headers)\n",
+    "print(supervisor.status_code)"
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "A `200` response indicates that the request was successful. You can view 
the running ingestion task and the new datasource in the web console at 
http://localhost:8888/unified-console.html."
+    "A `200` response indicates that the request was successful. You can view 
the running ingestion task and the new datasource in the web console's 
[ingestion view](http://localhost:8888/unified-console.html#ingestion).\n",
+    "\n",
+    "The following cell will wait until the ingestion has started and the 
datasource is available for querying:"

Review Comment:
   ```suggestion
       "The following cell pauses further execution until the ingestion has 
started and the datasource is available for querying:"
   ```
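The pause-until-ready behavior the suggested wording describes is, in effect, a polling loop. The notebook relies on `druid.sql.wait_until_ready`; the sketch below only illustrates the idea with hypothetical names, not the druidapi API:

```python
import time

def wait_until_ready(check, timeout_s=120, interval_s=2):
    """Poll check() until it returns True or the timeout elapses.

    Generic sketch of a readiness wait; check() would wrap a Druid
    metadata query confirming the datasource is queryable.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if check():
            return True
        time.sleep(interval_s)
    raise TimeoutError("datasource did not become queryable in time")

print(wait_until_ready(lambda: True))
```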



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

