This is an automated email from the ASF dual-hosted git repository.
vogievetsky pushed a commit to branch 26.0.0
in repository https://gitbox.apache.org/repos/asf/druid.git
The following commit(s) were added to refs/heads/26.0.0 by this push:
new 855e576e87 Docs: Tutorial for streaming ingestion using Kafka + Docker file to use with Jupyter tutorials (#13984) (#14289)
855e576e87 is described below
commit 855e576e87096e08c4c85a7fc53584a7801e402b
Author: Victoria Lim <[email protected]>
AuthorDate: Mon May 22 14:29:37 2023 -0700
Docs: Tutorial for streaming ingestion using Kafka + Docker file to use with Jupyter tutorials (#13984) (#14289)
---
.gitignore | 3 +-
docs/tutorials/tutorial-jupyter-docker.md | 201 ++++++
docs/tutorials/tutorial-jupyter-index.md | 67 +-
.../jupyter-notebooks/0-START-HERE.ipynb | 25 +-
examples/quickstart/jupyter-notebooks/Dockerfile | 65 ++
examples/quickstart/jupyter-notebooks/README.md | 74 +-
.../jupyter-notebooks/docker-jupyter/README.md | 60 ++
.../docker-jupyter/docker-compose-local.yaml | 172 +++++
.../docker-jupyter/docker-compose.yaml | 170 +++++
.../jupyter-notebooks/docker-jupyter/environment | 56 ++
.../docker-jupyter/kafka_docker_config.json | 90 +++
.../docker-jupyter/tutorial-jupyter-docker.zip | Bin 0 -> 2939 bytes
.../jupyter-notebooks/kafka-tutorial.ipynb | 782 +++++++++++++++++++++
website/sidebars.json | 1 +
14 files changed, 1635 insertions(+), 131 deletions(-)
diff --git a/.gitignore b/.gitignore
index 31b2f9dd1e..a60eb68173 100644
--- a/.gitignore
+++ b/.gitignore
@@ -33,9 +33,10 @@ integration-tests/gen-scripts/
**/.ipython/
**/.jupyter/
**/.local/
+**/druidapi.egg-info/
+examples/quickstart/jupyter-notebooks/docker-jupyter/notebooks
# ignore NetBeans IDE specific files
nbproject
nbactions.xml
nb-configuration.xml
-
diff --git a/docs/tutorials/tutorial-jupyter-docker.md b/docs/tutorials/tutorial-jupyter-docker.md
new file mode 100644
index 0000000000..b5aa939db8
--- /dev/null
+++ b/docs/tutorials/tutorial-jupyter-docker.md
@@ -0,0 +1,201 @@
+---
+id: tutorial-jupyter-docker
+title: "Docker for Jupyter Notebook tutorials"
+sidebar_label: "Docker for tutorials"
+---
+
+<!--
+ ~ Licensed to the Apache Software Foundation (ASF) under one
+ ~ or more contributor license agreements. See the NOTICE file
+ ~ distributed with this work for additional information
+ ~ regarding copyright ownership. The ASF licenses this file
+ ~ to you under the Apache License, Version 2.0 (the
+ ~ "License"); you may not use this file except in compliance
+ ~ with the License. You may obtain a copy of the License at
+ ~
+ ~ http://www.apache.org/licenses/LICENSE-2.0
+ ~
+ ~ Unless required by applicable law or agreed to in writing,
+ ~ software distributed under the License is distributed on an
+ ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ ~ KIND, either express or implied. See the License for the
+ ~ specific language governing permissions and limitations
+ ~ under the License.
+ -->
+
+
+Apache Druid provides a custom Jupyter container that contains the prerequisites
+for all Jupyter-based Druid tutorials, as well as all of the tutorials themselves.
+You can run the Jupyter container, as well as containers for Druid and Apache Kafka,
+using the Docker Compose file provided in the Druid GitHub repository.
+
+You can run the following combination of applications:
+* [Jupyter only](#start-only-the-jupyter-container)
+* [Jupyter and Druid](#start-jupyter-and-druid)
+* [Jupyter, Druid, and Kafka](#start-jupyter-druid-and-kafka)
+
+## Prerequisites
+
+Jupyter in Docker requires that you have **Docker** and **Docker Compose**.
+We recommend installing these through [Docker Desktop](https://docs.docker.com/desktop/).
+
+## Launch the Docker containers
+
+You run Docker Compose to launch Jupyter and optionally Druid or Kafka.
+Docker Compose references the configuration in `docker-compose.yaml`.
+Running Druid in Docker also requires the `environment` file, which
+sets the configuration properties for the Druid services.
+To get started, download both `docker-compose.yaml` and `environment` from
+[`tutorial-jupyter-docker.zip`](https://github.com/apache/druid/blob/master/examples/quickstart/jupyter-notebooks/docker-jupyter/tutorial-jupyter-docker.zip).
+
+Alternatively, you can clone the [Apache Druid repo](https://github.com/apache/druid) and
+access the files in `druid/examples/quickstart/jupyter-notebooks/docker-jupyter`.
+
+### Start only the Jupyter container
+
+If you already have Druid running locally, you can run only the Jupyter container to complete the tutorials.
+In the same directory as `docker-compose.yaml`, start the application:
+
+```bash
+docker compose --profile jupyter up -d
+```
+
+The Docker Compose file assigns `8889` for the Jupyter port.
+You can override the port number by setting the `JUPYTER_PORT` environment variable before starting the Docker application.
+
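Editor's note: the default of `8889` comes from the `"${JUPYTER_PORT:-8889}:8888"` port mapping on the `jupyter` service in the Compose file, which uses shell-style default expansion. A minimal sketch of that behavior in a POSIX shell (the port values here are just examples):

```shell
# If JUPYTER_PORT is unset, the host port falls back to the default 8889.
unset JUPYTER_PORT
echo "${JUPYTER_PORT:-8889}"   # prints 8889

# Setting the variable before starting the application overrides the default.
JUPYTER_PORT=9999
echo "${JUPYTER_PORT:-8889}"   # prints 9999
```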
+### Start Jupyter and Druid
+
+Running Druid in Docker requires the `environment` file as well as an environment variable named `DRUID_VERSION`,
+which determines the version of Druid to use. The Druid version references the Docker tag to pull from the
+[Apache Druid Docker Hub](https://hub.docker.com/r/apache/druid/tags).
+
+In the same directory as `docker-compose.yaml` and `environment`, start the application:
+
+```bash
+DRUID_VERSION={{DRUIDVERSION}} docker compose --profile druid-jupyter up -d
+```
+
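Editor's note: the `DRUID_VERSION=... docker compose ...` form relies on the shell's one-off environment assignment, so the variable applies only to that single invocation. A quick sketch (the version number is illustrative):

```shell
# A leading VAR=value assignment is exported only to the command that follows it.
unset DRUID_VERSION
DRUID_VERSION=25.0.0 sh -c 'echo "image tag: apache/druid:$DRUID_VERSION"'

# The variable does not persist in the calling shell afterwards.
echo "${DRUID_VERSION:-unset}"   # prints unset
```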
+### Start Jupyter, Druid, and Kafka
+
+Running Druid in Docker requires the `environment` file as well as the `DRUID_VERSION` environment variable.
+
+In the same directory as `docker-compose.yaml` and `environment`, start the application:
+
+```bash
+DRUID_VERSION={{DRUIDVERSION}} docker compose --profile all-services up -d
+```
+
+### Update image from Docker Hub
+
+If you already have a local cache of the Jupyter image, you can update the image before running the application using the following command:
+
+```bash
+docker compose pull jupyter
+```
+
+### Use locally built image
+
+The default Docker Compose file pulls the custom Jupyter Notebook image from a third-party repository on Docker Hub.
+If you prefer to build the image locally from the official source, do the following:
+1. Clone the Apache Druid repository.
+2. Navigate to `examples/quickstart/jupyter-notebooks/docker-jupyter`.
+3. Start the services using `-f docker-compose-local.yaml` in the `docker compose` command. For example:
+
+```bash
+DRUID_VERSION={{DRUIDVERSION}} docker compose --profile all-services -f docker-compose-local.yaml up -d
+```
+
+## Access Jupyter-based tutorials
+
+The following steps show you how to access the Jupyter notebook tutorials from the Docker container.
+At startup, Docker creates and mounts a volume to persist data from the container to your local machine.
+This way, you can save the work you complete within the Docker container.
+
+1. Navigate to the notebooks at http://localhost:8889.
+   > If you set `JUPYTER_PORT` to another port number, replace `8889` with the value of the Jupyter port.
+
+2. Select a tutorial. If you don't plan to save your changes, you can use the notebook directly as is. Otherwise, continue to the next step.
+
+3. Optional: To save a local copy of your tutorial work,
+select **File > Save as...** from the navigation menu. Then enter `work/<notebook name>.ipynb`.
+If the notebook still displays as read-only, you may need to refresh the page in your browser.
+Access the saved files in the `notebooks` folder in your local working directory.
+
+## View the Druid web console
+
+To access the Druid web console in Docker, go to http://localhost:8888/unified-console.html.
+Use the web console to view datasources and ingestion tasks that you create in the tutorials.
+
+## Stop Docker containers
+
+Shut down the Docker application using the following command:
+
+```bash
+docker compose down -v
+```
+
+## Tutorial setup without using Docker
+
+To use the Jupyter Notebook-based tutorials without using Docker, do the following:
+
+1. Clone the Apache Druid repo, or download the [tutorials](tutorial-jupyter-index.md#tutorials)
+as well as the [Python client for Druid](tutorial-jupyter-index.md#python-api-for-druid).
+
+2. Install the prerequisite Python packages with the following commands:
+
+ ```bash
+ # Install requests
+ pip install requests
+ ```
+
+ ```bash
+ # Install JupyterLab
+ pip install jupyterlab
+
+ # Install Jupyter Notebook
+ pip install notebook
+ ```
+
+   Individual notebooks may list additional packages you need to install to complete the tutorial.
+
+3. In your Druid source repo, install `druidapi` with the following commands:
+
+ ```bash
+ cd examples/quickstart/jupyter-notebooks/druidapi
+ pip install .
+ ```
+
+4. Start Jupyter, in the same directory as the tutorials, using either JupyterLab or Jupyter Notebook:
+ ```bash
+ # Start JupyterLab on port 3001
+ jupyter lab --port 3001
+
+ # Start Jupyter Notebook on port 3001
+ jupyter notebook --port 3001
+ ```
+
+5. Start Druid. You can use the [Quickstart (local)](./index.md) instance. The tutorials
+   assume that you are using the quickstart, so no authentication or authorization
+   is expected unless explicitly mentioned.
+
+   If you contribute to Druid and work with Druid integration tests, you can use a test cluster.
+   Assume you have an environment variable, `DRUID_DEV`, which identifies your Druid source repo.
+
+ ```bash
+ cd $DRUID_DEV
+ ./it.sh build
+ ./it.sh image
+ ./it.sh up <category>
+ ```
+
+   Replace `<category>` with one of the available integration test categories. See the integration
+   test `README.md` for details.
+
+You should now be able to access and complete the tutorials.
+
+## Learn more
+
+See the following topics for more information:
+* [Jupyter Notebook tutorials](tutorial-jupyter-index.md) for the available Jupyter Notebook-based tutorials for Druid
+* [Tutorial: Run with Docker](docker.md) for running Druid from a Docker container
+
diff --git a/docs/tutorials/tutorial-jupyter-index.md b/docs/tutorials/tutorial-jupyter-index.md
index d77e0d42b3..d7f401cae5 100644
--- a/docs/tutorials/tutorial-jupyter-index.md
+++ b/docs/tutorials/tutorial-jupyter-index.md
@@ -32,67 +32,34 @@ the Druid API to complete the tutorial.
## Prerequisites
-Make sure you meet the following requirements before starting the Jupyter-based tutorials:
+The simplest way to get started is to use Docker. In this case, you only need to set up Docker Desktop.
+For more information, see [Docker for Jupyter Notebook tutorials](tutorial-jupyter-docker.md).
-- Python 3.7 or later
-
-- The `requests` package for Python. For example, you can install it with the following command:
-
- ```bash
- pip3 install requests
- ```
-
-- JupyterLab (recommended) or Jupyter Notebook running on a non-default port. By default, Druid
-  and Jupyter both try to use port `8888`, so start Jupyter on a different port.
-
-
- - Install JupyterLab or Notebook:
+Otherwise, you can install the prerequisites on your own. Here's what you need:
- ```bash
- # Install JupyterLab
- pip3 install jupyterlab
- # Install Jupyter Notebook
- pip3 install notebook
- ```
- - Start Jupyter using either JupyterLab
- ```bash
- # Start JupyterLab on port 3001
- jupyter lab --port 3001
- ```
-
- Or using Jupyter Notebook
- ```bash
- # Start Jupyter Notebook on port 3001
- jupyter notebook --port 3001
- ```
-
-- An available Druid instance. You can use the [Quickstart (local)](./index.md) instance. The tutorials
-  assume that you are using the quickstart, so no authentication or authorization
-  is expected unless explicitly mentioned.
-
-  If you contribute to Druid, and work with Druid integration tests, can use a test cluster.
-  Assume you have an environment variable, `DRUID_DEV`, which identifies your Druid source repo.
-
- ```bash
- cd $DRUID_DEV
- ./it.sh build
- ./it.sh image
- ./it.sh up <category>
- ```
+- An available Druid instance.
+- Python 3.7 or later
+- JupyterLab (recommended) or Jupyter Notebook running on a non-default port.
+By default, Druid and Jupyter both try to use port `8888`, so start Jupyter on a different port.
+- The `requests` Python package
+- The `druidapi` Python package
-  Replace `<category>` with one of the available integration test categories. See the integration
-  test `README.md` for details.
+For setup instructions, see [Tutorial setup without using Docker](tutorial-jupyter-docker.md#tutorial-setup-without-using-docker).
+Individual tutorials may require additional Python packages, such as for visualization or streaming ingestion.
-## Simple Druid API
+## Python API for Druid
+The `druidapi` package is a Python client for the Druid REST API.
One of the notebooks shows how to use the Druid REST API. The others focus on other
topics and use a simple set of Python wrappers around the underlying REST API. The
wrappers reside in the `druidapi` package within the notebooks directory. While the package
can be used in any Python program, the key purpose, at present, is to support these
-notebooks. See the [Introduction to the Druid Python API](https://github.com/apache/druid/tree/master/examples/quickstart/jupyter-notebooks/python-api-tutorial.ipynb)
+notebooks. See
+[Introduction to the Druid Python API](https://github.com/apache/druid/tree/master/examples/quickstart/jupyter-notebooks/python-api-tutorial.ipynb)
for an overview of the Python API.
+The `druidapi` package is already installed in the custom Jupyter Docker container for Druid tutorials.
+
## Tutorials
The notebooks are located in the [apache/druid repo](https://github.com/apache/druid/tree/master/examples/quickstart/jupyter-notebooks/).
You can either clone the repo or download the notebooks you want individually.
diff --git a/examples/quickstart/jupyter-notebooks/0-START-HERE.ipynb b/examples/quickstart/jupyter-notebooks/0-START-HERE.ipynb
index fe4a30a551..5e74fa71c1 100644
--- a/examples/quickstart/jupyter-notebooks/0-START-HERE.ipynb
+++ b/examples/quickstart/jupyter-notebooks/0-START-HERE.ipynb
@@ -41,24 +41,27 @@
"source": [
"## Prerequisites\n",
"\n",
-    "To get this far, you've installed Python 3 and Jupyter Notebook. Make sure you meet the following requirements before starting the Jupyter-based tutorials:\n",
- "\n",
-    "- The `requests` package for Python. For example, you can install it with the following command:\n",
- "\n",
- " ```bash\n",
- " pip install requests\n",
- " ````\n",
- "\n",
-    "- JupyterLab (recommended) or Jupyter Notebook running on a non-default port. By default, Druid\n",
-    "  and Jupyter both try to use port `8888`, so start Jupyter on a different port.\n",
+    "Before starting the Jupyter-based tutorials, make sure you meet the requirements listed in this section.\n",
+    "The simplest way to get started is to use Docker. In this case, you only need to set up Docker Desktop.\n",
+    "For more information, see [Docker for Jupyter Notebook tutorials](https://druid.apache.org/docs/latest/tutorials/tutorial-jupyter-docker.html).\n",
"\n",
+ "Otherwise, you need the following:\n",
"- An available Druid instance. You can use the local quickstart
configuration\n",
" described in
[Quickstart](https://druid.apache.org/docs/latest/tutorials/index.html).\n",
" The tutorials assume that you are using the quickstart, so no
authentication or authorization\n",
" is expected unless explicitly mentioned.\n",
+ "- Python 3.7 or later\n",
+    "- JupyterLab (recommended) or Jupyter Notebook running on a non-default port. By default, Druid\n",
+    "  and Jupyter both try to use port `8888`, so start Jupyter on a different port.\n",
+ "- The `requests` Python package\n",
+ "- The `druidapi` Python package\n",
+ "\n",
+    "For setup instructions, see [Tutorial setup without using Docker](https://druid.apache.org/docs/latest/tutorials/tutorial-jupyter-docker.html#tutorial-setup-without-using-docker).\n",
+    "Individual tutorials may require additional Python packages, such as for visualization or streaming ingestion.\n",
"\n",
"## Simple Druid API\n",
"\n",
+    "The `druidapi` package is a Python client for the Druid REST API.\n",
    "One of the notebooks shows how to use the Druid REST API. The others focus on other\n",
    "topics and use a simple set of Python wrappers around the underlying REST API. The\n",
    "wrappers reside in the `druidapi` package within this directory. While the package\n",
@@ -148,7 +151,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
- "version": "3.9.6"
+ "version": "3.9.5"
}
},
"nbformat": 4,
diff --git a/examples/quickstart/jupyter-notebooks/Dockerfile b/examples/quickstart/jupyter-notebooks/Dockerfile
new file mode 100644
index 0000000000..492a4da9c1
--- /dev/null
+++ b/examples/quickstart/jupyter-notebooks/Dockerfile
@@ -0,0 +1,65 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+# -------------------------------------------------------------
+# This Dockerfile creates a custom Docker image for Jupyter
+# to use with the Apache Druid Jupyter notebook tutorials.
+# Build using `docker build -t imply/druid-notebook:latest .`
+# -------------------------------------------------------------
+
+# Use the Jupyter base notebook as the base image
+# Copyright (c) Project Jupyter Contributors.
+# Distributed under the terms of the 3-Clause BSD License.
+FROM jupyter/base-notebook
+
+# Set the container working directory
+WORKDIR /home/jovyan
+
+# Install required Python packages
+RUN pip install requests
+RUN pip install pandas
+RUN pip install numpy
+RUN pip install seaborn
+RUN pip install bokeh
+RUN pip install kafka-python
+RUN pip install sortedcontainers
+
+# Install druidapi client from apache/druid
+# Local install requires sudo privileges
+USER root
+ADD druidapi /home/jovyan/druidapi
+WORKDIR /home/jovyan/druidapi
+RUN pip install .
+WORKDIR /home/jovyan
+
+# Import data generator and configuration file
+# Change permissions to allow import (requires sudo privileges)
+# WIP -- change to apache repo
+ADD https://raw.githubusercontent.com/shallada/druid/data-generator/examples/quickstart/jupyter-notebooks/data-generator/DruidDataDriver.py .
+ADD docker-jupyter/kafka_docker_config.json .
+RUN chmod 664 DruidDataDriver.py
+RUN chmod 664 kafka_docker_config.json
+USER jovyan
+
+# Copy the Jupyter notebook tutorials from the
+# build directory to the image working directory
+COPY ./*ipynb .
+
+# Add location of the data generator to PYTHONPATH
+ENV PYTHONPATH "${PYTHONPATH}:/home/jovyan"
+
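Editor's note: the final `ENV PYTHONPATH` line above is what lets notebooks in the container import `DruidDataDriver` from `/home/jovyan`. The effect can be sketched outside the container with a throwaway module (the module name and value here are hypothetical):

```shell
# PYTHONPATH extends Python's module search path, so a file dropped into a
# listed directory becomes importable by name.
dir=$(mktemp -d)
echo 'VALUE = 42' > "$dir/mymod.py"
PYTHONPATH="$dir" python3 -c 'import mymod; print(mymod.VALUE)'   # prints 42
```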
diff --git a/examples/quickstart/jupyter-notebooks/README.md b/examples/quickstart/jupyter-notebooks/README.md
index 826ae5df34..361908c131 100644
--- a/examples/quickstart/jupyter-notebooks/README.md
+++ b/examples/quickstart/jupyter-notebooks/README.md
@@ -1,12 +1,5 @@
# Jupyter Notebook tutorials for Druid
-If you are reading this in Jupyter, switch over to the [0-START-HERE](0-START-HERE.ipynb)
-notebook instead.
-
-<!-- This README, the "0-START-HERE" notebook, and the tutorial-jupyter-index.md file in
-docs/tutorials share a lot of the same content. If you make a change in one place, update
-the other too. -->
-
<!--
~ Licensed to the Apache Software Foundation (ASF) under one
~ or more contributor license agreements. See the NOTICE file
@@ -26,70 +19,13 @@ the other too. -->
~ under the License.
-->
+If you are reading this in Jupyter, switch over to the [0-START-HERE](0-START-HERE.ipynb)
+notebook instead.
+
You can try out the Druid APIs using the Jupyter Notebook-based tutorials.
These
tutorials provide snippets of Python code that you can use to run calls against
the Druid API to complete the tutorial.
-## Prerequisites
-
-Make sure you meet the following requirements before starting the Jupyter-based tutorials:
-
-- Python 3
-
-- The `requests` package for Python. For example, you can install it with the following command:
-
- ```bash
- pip install requests
- ```
-
-- JupyterLab (recommended) or Jupyter Notebook running on a non-default port. By default, Druid
-  and Jupyter both try to use port `8888`, so start Jupyter on a different port.
-
- - Install JupyterLab or Notebook:
-
- ```bash
- # Install JupyterLab
- pip install jupyterlab
- # Install Jupyter Notebook
- pip install notebook
- ```
- - Start Jupyter using either JupyterLab
- ```bash
- # Start JupyterLab on port 3001
- jupyter lab --port 3001
- ```
-
- Or using Jupyter Notebook
- ```bash
- # Start Jupyter Notebook on port 3001
- jupyter notebook --port 3001
- ```
-
-- The Python API client for Druid. Clone the Druid repo if you haven't already.
-Go to your Druid source repo and install `druidapi` with the following commands:
-
- ```bash
- cd examples/quickstart/jupyter-notebooks/druidapi
- pip install .
- ```
-
-- An available Druid instance. You can use the [quickstart deployment](https://druid.apache.org/docs/latest/tutorials/index.html).
-  The tutorials assume that you are using the quickstart, so no authentication or authorization
-  is expected unless explicitly mentioned.
-
-  If you contribute to Druid, and work with Druid integration tests, can use a test cluster.
-  Assume you have an environment variable, `DRUID_DEV`, which identifies your Druid source repo.
-
- ```bash
- cd $DRUID_DEV
- ./it.sh build
- ./it.sh image
- ./it.sh up <category>
- ```
-
-  Replace `<catagory>` with one of the available integration test categories. See the integration
-  test `README.md` for details.
-
-## Continue in Jupyter
+For information on prerequisites and getting started with the Jupyter-based tutorials,
+see [Jupyter Notebook tutorials](../../../docs/tutorials/tutorial-jupyter-index.md).
-Start Jupyter (see above) and navigate to the "0-START-HERE" notebook for more information.
diff --git a/examples/quickstart/jupyter-notebooks/docker-jupyter/README.md b/examples/quickstart/jupyter-notebooks/docker-jupyter/README.md
new file mode 100644
index 0000000000..028eb1f9b2
--- /dev/null
+++ b/examples/quickstart/jupyter-notebooks/docker-jupyter/README.md
@@ -0,0 +1,60 @@
+<!--
+ ~ Licensed to the Apache Software Foundation (ASF) under one
+ ~ or more contributor license agreements. See the NOTICE file
+ ~ distributed with this work for additional information
+ ~ regarding copyright ownership. The ASF licenses this file
+ ~ to you under the Apache License, Version 2.0 (the
+ ~ "License"); you may not use this file except in compliance
+ ~ with the License. You may obtain a copy of the License at
+ ~
+ ~ http://www.apache.org/licenses/LICENSE-2.0
+ ~
+ ~ Unless required by applicable law or agreed to in writing,
+ ~ software distributed under the License is distributed on an
+ ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ ~ KIND, either express or implied. See the License for the
+ ~ specific language governing permissions and limitations
+ ~ under the License.
+ -->
+
+# Jupyter in Docker
+
+For details on getting started with Jupyter in Docker,
+see [Docker for Jupyter Notebook tutorials](../../../../docs/tutorials/tutorial-jupyter-docker.md).
+
+## Contributing
+
+### Rebuild Jupyter image
+
+You may want to update the Jupyter image to access new or updated tutorial notebooks,
+include new Python packages, or update configuration files.
+
+To build the custom Jupyter image locally:
+
+1. Clone the Druid repo if you haven't already.
+2. Navigate to `examples/quickstart/jupyter-notebooks` in your Druid source repo.
+3. Edit the image definition in `Dockerfile`.
+4. Navigate to the `docker-jupyter` directory.
+5. Generate the new build using the following command:
+
+ ```shell
+   DRUID_VERSION=25.0.0 docker compose --profile all-services -f docker-compose-local.yaml up -d --build
+ ```
+
+   You can change the value of `DRUID_VERSION` or the profile used from the Docker Compose file.
+
+### Update Docker Compose
+
+The Docker Compose file defines a multi-container application that allows you to run
+the custom Jupyter Notebook container, Apache Druid, and Apache Kafka.
+
+Any changes to `docker-compose.yaml` should also be made to `docker-compose-local.yaml`
+and vice versa. These files should be identical except that `docker-compose.yaml`
+contains an `image` attribute while `docker-compose-local.yaml` contains a `build` subsection.
+
+If you update `docker-compose.yaml`, recreate the ZIP file using the following command:
+
+```bash
+zip tutorial-jupyter-docker.zip docker-compose.yaml environment
+```
+
diff --git a/examples/quickstart/jupyter-notebooks/docker-jupyter/docker-compose-local.yaml b/examples/quickstart/jupyter-notebooks/docker-jupyter/docker-compose-local.yaml
new file mode 100644
index 0000000000..9fb241deb8
--- /dev/null
+++ b/examples/quickstart/jupyter-notebooks/docker-jupyter/docker-compose-local.yaml
@@ -0,0 +1,172 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+---
+version: "2.2"
+
+volumes:
+ metadata_data: {}
+ middle_var: {}
+ historical_var: {}
+ broker_var: {}
+ coordinator_var: {}
+ router_var: {}
+ druid_shared: {}
+
+
+services:
+ postgres:
+ image: postgres:latest
+ container_name: postgres
+ profiles: ["druid-jupyter", "all-services"]
+ volumes:
+ - metadata_data:/var/lib/postgresql/data
+ environment:
+ - POSTGRES_PASSWORD=FoolishPassword
+ - POSTGRES_USER=druid
+ - POSTGRES_DB=druid
+
+ # Need 3.5 or later for container nodes
+ zookeeper:
+ image: zookeeper:latest
+ container_name: zookeeper
+ profiles: ["druid-jupyter", "all-services"]
+ ports:
+ - "2181:2181"
+ environment:
+ - ZOO_MY_ID=1
+ - ALLOW_ANONYMOUS_LOGIN=yes
+
+ kafka:
+ image: bitnami/kafka:latest
+ container_name: kafka-broker
+ profiles: ["all-services"]
+ ports:
+ # To learn about configuring Kafka for access across networks see
+      # https://www.confluent.io/blog/kafka-client-cannot-connect-to-broker-on-aws-on-docker-etc/
+ - "9092:9092"
+ depends_on:
+ - zookeeper
+ environment:
+ - KAFKA_BROKER_ID=1
+ - KAFKA_CFG_LISTENERS=PLAINTEXT://:9092
+ - KAFKA_CFG_ADVERTISED_LISTENERS=PLAINTEXT://kafka:9092
+ - KAFKA_CFG_ZOOKEEPER_CONNECT=zookeeper:2181
+ - ALLOW_PLAINTEXT_LISTENER=yes
+
+ coordinator:
+ image: apache/druid:${DRUID_VERSION}
+ container_name: coordinator
+ profiles: ["druid-jupyter", "all-services"]
+ volumes:
+ - druid_shared:/opt/shared
+ - coordinator_var:/opt/druid/var
+ depends_on:
+ - zookeeper
+ - postgres
+ ports:
+ - "8081:8081"
+ command:
+ - coordinator
+ env_file:
+ - environment
+
+ broker:
+ image: apache/druid:${DRUID_VERSION}
+ container_name: broker
+ profiles: ["druid-jupyter", "all-services"]
+ volumes:
+ - broker_var:/opt/druid/var
+ depends_on:
+ - zookeeper
+ - postgres
+ - coordinator
+ ports:
+ - "8082:8082"
+ command:
+ - broker
+ env_file:
+ - environment
+
+ historical:
+ image: apache/druid:${DRUID_VERSION}
+ container_name: historical
+ profiles: ["druid-jupyter", "all-services"]
+ volumes:
+ - druid_shared:/opt/shared
+ - historical_var:/opt/druid/var
+ depends_on:
+ - zookeeper
+ - postgres
+ - coordinator
+ ports:
+ - "8083:8083"
+ command:
+ - historical
+ env_file:
+ - environment
+
+ middlemanager:
+ image: apache/druid:${DRUID_VERSION}
+ container_name: middlemanager
+ profiles: ["druid-jupyter", "all-services"]
+ volumes:
+ - druid_shared:/opt/shared
+ - middle_var:/opt/druid/var
+ depends_on:
+ - zookeeper
+ - postgres
+ - coordinator
+ ports:
+ - "8091:8091"
+ - "8100-8105:8100-8105"
+ command:
+ - middleManager
+ env_file:
+ - environment
+
+ router:
+ image: apache/druid:${DRUID_VERSION}
+ container_name: router
+ profiles: ["druid-jupyter", "all-services"]
+ volumes:
+ - router_var:/opt/druid/var
+ depends_on:
+ - zookeeper
+ - postgres
+ - coordinator
+ ports:
+ - "8888:8888"
+ command:
+ - router
+ env_file:
+ - environment
+
+ jupyter:
+ build:
+ context: ..
+ dockerfile: Dockerfile
+ container_name: jupyter
+ profiles: ["jupyter", "all-services"]
+ environment:
+ DOCKER_STACKS_JUPYTER_CMD: "notebook"
+ NOTEBOOK_ARGS: "--NotebookApp.token=''"
+ ports:
+ - "${JUPYTER_PORT:-8889}:8888"
+ volumes:
+ - ./notebooks:/home/jovyan/work
diff --git a/examples/quickstart/jupyter-notebooks/docker-jupyter/docker-compose.yaml b/examples/quickstart/jupyter-notebooks/docker-jupyter/docker-compose.yaml
new file mode 100644
index 0000000000..d9e95c085b
--- /dev/null
+++ b/examples/quickstart/jupyter-notebooks/docker-jupyter/docker-compose.yaml
@@ -0,0 +1,170 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+---
+version: "2.2"
+
+volumes:
+ metadata_data: {}
+ middle_var: {}
+ historical_var: {}
+ broker_var: {}
+ coordinator_var: {}
+ router_var: {}
+ druid_shared: {}
+
+
+services:
+ postgres:
+ image: postgres:latest
+ container_name: postgres
+ profiles: ["druid-jupyter", "all-services"]
+ volumes:
+ - metadata_data:/var/lib/postgresql/data
+ environment:
+ - POSTGRES_PASSWORD=FoolishPassword
+ - POSTGRES_USER=druid
+ - POSTGRES_DB=druid
+
+ # Need 3.5 or later for container nodes
+ zookeeper:
+ image: zookeeper:latest
+ container_name: zookeeper
+ profiles: ["druid-jupyter", "all-services"]
+ ports:
+ - "2181:2181"
+ environment:
+ - ZOO_MY_ID=1
+ - ALLOW_ANONYMOUS_LOGIN=yes
+
+ kafka:
+ image: bitnami/kafka:latest
+ container_name: kafka-broker
+ profiles: ["all-services"]
+ ports:
+ # To learn about configuring Kafka for access across networks see
+      # https://www.confluent.io/blog/kafka-client-cannot-connect-to-broker-on-aws-on-docker-etc/
+ - "9092:9092"
+ depends_on:
+ - zookeeper
+ environment:
+ - KAFKA_BROKER_ID=1
+ - KAFKA_CFG_LISTENERS=PLAINTEXT://:9092
+ - KAFKA_CFG_ADVERTISED_LISTENERS=PLAINTEXT://kafka:9092
+ - KAFKA_CFG_ZOOKEEPER_CONNECT=zookeeper:2181
+ - ALLOW_PLAINTEXT_LISTENER=yes
+
+ coordinator:
+ image: apache/druid:${DRUID_VERSION}
+ container_name: coordinator
+ profiles: ["druid-jupyter", "all-services"]
+ volumes:
+ - druid_shared:/opt/shared
+ - coordinator_var:/opt/druid/var
+ depends_on:
+ - zookeeper
+ - postgres
+ ports:
+ - "8081:8081"
+ command:
+ - coordinator
+ env_file:
+ - environment
+
+ broker:
+ image: apache/druid:${DRUID_VERSION}
+ container_name: broker
+ profiles: ["druid-jupyter", "all-services"]
+ volumes:
+ - broker_var:/opt/druid/var
+ depends_on:
+ - zookeeper
+ - postgres
+ - coordinator
+ ports:
+ - "8082:8082"
+ command:
+ - broker
+ env_file:
+ - environment
+
+ historical:
+ image: apache/druid:${DRUID_VERSION}
+ container_name: historical
+ profiles: ["druid-jupyter", "all-services"]
+ volumes:
+ - druid_shared:/opt/shared
+ - historical_var:/opt/druid/var
+ depends_on:
+ - zookeeper
+ - postgres
+ - coordinator
+ ports:
+ - "8083:8083"
+ command:
+ - historical
+ env_file:
+ - environment
+
+ middlemanager:
+ image: apache/druid:${DRUID_VERSION}
+ container_name: middlemanager
+ profiles: ["druid-jupyter", "all-services"]
+ volumes:
+ - druid_shared:/opt/shared
+ - middle_var:/opt/druid/var
+ depends_on:
+ - zookeeper
+ - postgres
+ - coordinator
+ ports:
+ - "8091:8091"
+ - "8100-8105:8100-8105"
+ command:
+ - middleManager
+ env_file:
+ - environment
+
+ router:
+ image: apache/druid:${DRUID_VERSION}
+ container_name: router
+ profiles: ["druid-jupyter", "all-services"]
+ volumes:
+ - router_var:/opt/druid/var
+ depends_on:
+ - zookeeper
+ - postgres
+ - coordinator
+ ports:
+ - "8888:8888"
+ command:
+ - router
+ env_file:
+ - environment
+
+ jupyter:
+ image: imply/druid-notebook:latest
+ container_name: jupyter
+ profiles: ["jupyter", "all-services"]
+ environment:
+ DOCKER_STACKS_JUPYTER_CMD: "notebook"
+ NOTEBOOK_ARGS: "--NotebookApp.token=''"
+ ports:
+ - "${JUPYTER_PORT:-8889}:8888"
+ volumes:
+ - ./notebooks:/home/jovyan/work
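A note on the `jupyter` port mapping above: `"${JUPYTER_PORT:-8889}:8888"` relies on Compose's shell-style default expansion, so the host port is the value of `JUPYTER_PORT` when set and 8889 otherwise. The same expansion can be checked directly in a shell:

```shell
# Compose-style default expansion: value of JUPYTER_PORT if set, else 8889.
unset JUPYTER_PORT
echo "${JUPYTER_PORT:-8889}"   # prints 8889

# With the variable set, the default is ignored.
JUPYTER_PORT=9000
echo "${JUPYTER_PORT:-8889}"   # prints 9000
```

Setting `JUPYTER_PORT` before running `docker compose` moves Jupyter off the default 8889, which is useful when that port is already taken.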
diff --git a/examples/quickstart/jupyter-notebooks/docker-jupyter/environment
b/examples/quickstart/jupyter-notebooks/docker-jupyter/environment
new file mode 100644
index 0000000000..c63a5c0e88
--- /dev/null
+++ b/examples/quickstart/jupyter-notebooks/docker-jupyter/environment
@@ -0,0 +1,56 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
+# Java tuning
+#DRUID_XMX=1g
+#DRUID_XMS=1g
+#DRUID_MAXNEWSIZE=250m
+#DRUID_NEWSIZE=250m
+#DRUID_MAXDIRECTMEMORYSIZE=6172m
+DRUID_SINGLE_NODE_CONF=micro-quickstart
+
+druid_emitter_logging_logLevel=debug
+
+druid_extensions_loadList=["druid-histogram", "druid-datasketches",
"druid-lookups-cached-global", "postgresql-metadata-storage",
"druid-multi-stage-query", "druid-kafka-indexing-service"]
+
+druid_zk_service_host=zookeeper
+
+druid_metadata_storage_host=
+druid_metadata_storage_type=postgresql
+druid_metadata_storage_connector_connectURI=jdbc:postgresql://postgres:5432/druid
+druid_metadata_storage_connector_user=druid
+druid_metadata_storage_connector_password=FoolishPassword
+
+druid_coordinator_balancer_strategy=cachingCost
+
+druid_indexer_runner_javaOptsArray=["-server", "-Xmx1g", "-Xms1g",
"-XX:MaxDirectMemorySize=3g", "-Duser.timezone=UTC", "-Dfile.encoding=UTF-8",
"-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager"]
+druid_indexer_fork_property_druid_processing_buffer_sizeBytes=256MiB
+
+
+
+druid_storage_type=local
+druid_storage_storageDirectory=/opt/shared/segments
+druid_indexer_logs_type=file
+druid_indexer_logs_directory=/opt/shared/indexing-logs
+
+druid_processing_numThreads=2
+druid_processing_numMergeBuffers=2
+
+DRUID_LOG4J=<?xml version="1.0" encoding="UTF-8" ?><Configuration
status="WARN"><Appenders><Console name="Console"
target="SYSTEM_OUT"><PatternLayout pattern="%d{ISO8601} %p [%t] %c -
%m%n"/></Console></Appenders><Loggers><Root level="info"><AppenderRef
ref="Console"/></Root><Logger name="org.apache.druid.jetty.RequestLog"
additivity="false" level="DEBUG"><AppenderRef
ref="Console"/></Logger></Loggers></Configuration>
+
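For orientation: the `druid_*` entries above are not read verbatim by Druid; the Druid Docker image's entrypoint turns each `druid_*` environment variable into a `runtime.properties` entry by replacing underscores with dots. A minimal sketch of that mapping (illustrative only; the real entrypoint also special-cases variables such as `DRUID_XMX` and `DRUID_LOG4J`):

```python
def to_druid_property(env_key: str) -> str:
    """Map a druid_* environment variable name to a runtime property name.

    Sketch of the underscore-to-dot convention used by the Druid Docker
    entrypoint; assumes no property segment itself contains an underscore.
    """
    return env_key.replace("_", ".")

print(to_druid_property("druid_metadata_storage_type"))
# → druid.metadata.storage.type
```

This is why, for example, `druid_zk_service_host=zookeeper` above ends up as the `druid.zk.service.host` runtime property inside each container.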
diff --git
a/examples/quickstart/jupyter-notebooks/docker-jupyter/kafka_docker_config.json
b/examples/quickstart/jupyter-notebooks/docker-jupyter/kafka_docker_config.json
new file mode 100644
index 0000000000..2add8f3fa1
--- /dev/null
+++
b/examples/quickstart/jupyter-notebooks/docker-jupyter/kafka_docker_config.json
@@ -0,0 +1,90 @@
+{
+ "target": {
+ "type": "kafka",
+ "endpoint": "kafka:9092",
+ "topic": "social_media"
+ },
+ "emitters": [
+ {
+ "name": "example_record_1",
+ "dimensions": [
+ {
+ "type": "enum",
+ "name": "username",
+ "values": ["willow", "mia", "leon", "milton", "miette", "gus",
"jojo", "rocket"],
+ "cardinality_distribution": {
+ "type": "uniform",
+ "min": 0,
+ "max": 7
+ }
+ },
+ {
+ "type": "string",
+ "name": "post_title",
+ "length_distribution": {"type": "uniform", "min": 1, "max": 140},
+ "cardinality": 0,
+ "chars":
"abcdefghijklmnopqrstuvwxyz0123456789_ABCDEFGHIJKLMNOPQRSTUVWXYZ!';:,."
+ },
+ {
+ "type": "int",
+ "name": "views",
+ "distribution": {
+ "type": "exponential",
+ "mean": 10000
+ },
+ "cardinality": 0
+ },
+ {
+ "type": "int",
+ "name": "upvotes",
+ "distribution": {
+ "type": "normal",
+ "mean": 70,
+ "stddev": 20
+ },
+ "cardinality": 0
+ },
+ {
+ "type": "int",
+ "name": "comments",
+ "distribution": {
+ "type": "normal",
+ "mean": 10,
+ "stddev": 5
+ },
+ "cardinality": 0
+ },
+ {
+ "type": "enum",
+ "name": "edited",
+ "values": ["True","False"],
+ "cardinality_distribution": {
+ "type": "uniform",
+ "min": 0,
+ "max": 1
+ }
+ }
+ ]
+ }
+ ],
+ "interarrival": {
+ "type": "constant",
+ "value": 1
+ },
+ "states": [
+ {
+ "name": "state_1",
+ "emitter": "example_record_1",
+ "delay": {
+ "type": "constant",
+ "value": 1
+ },
+ "transitions": [
+ {
+ "next": "state_1",
+ "probability": 1.0
+ }
+ ]
+ }
+ ]
+}
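The config above describes a simple state machine: `emitters` define the record shape, `interarrival` sets the cadence, and `states`/`transitions` choose which emitter fires next (here a single state looping on itself with probability 1.0). As a rough, hypothetical sketch of how one `example_record_1` record could be sampled (the real Druid Data Driver interprets the JSON config generically; the distributions are hard-coded here):

```python
import random

def sample_event() -> dict:
    """Sample one record roughly matching the example_record_1 emitter above.

    Illustrative simplification: enum dimensions draw uniformly, string
    lengths draw uniformly, and int dimensions follow the configured
    exponential/normal distributions.
    """
    usernames = ["willow", "mia", "leon", "milton", "miette", "gus", "jojo", "rocket"]
    chars = "abcdefghijklmnopqrstuvwxyz0123456789_ABCDEFGHIJKLMNOPQRSTUVWXYZ!';:,."
    title_len = random.randint(1, 140)  # uniform length, min 1, max 140
    return {
        "username": random.choice(usernames),          # enum over 8 values
        "post_title": "".join(random.choice(chars) for _ in range(title_len)),
        "views": int(random.expovariate(1 / 10000)),   # exponential, mean 10000
        "upvotes": max(0, int(random.gauss(70, 20))),  # normal(70, 20)
        "comments": max(0, int(random.gauss(10, 5))),  # normal(10, 5)
        "edited": random.choice(["True", "False"]),    # enum over 2 values
    }

event = sample_event()
```

Each sampled record matches the shape of the events that the tutorial notebook later ingests from the `social_media` topic.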
diff --git
a/examples/quickstart/jupyter-notebooks/docker-jupyter/tutorial-jupyter-docker.zip
b/examples/quickstart/jupyter-notebooks/docker-jupyter/tutorial-jupyter-docker.zip
new file mode 100644
index 0000000000..4a3c02e4c4
Binary files /dev/null and
b/examples/quickstart/jupyter-notebooks/docker-jupyter/tutorial-jupyter-docker.zip
differ
diff --git a/examples/quickstart/jupyter-notebooks/kafka-tutorial.ipynb
b/examples/quickstart/jupyter-notebooks/kafka-tutorial.ipynb
new file mode 100644
index 0000000000..9ab6ce1681
--- /dev/null
+++ b/examples/quickstart/jupyter-notebooks/kafka-tutorial.ipynb
@@ -0,0 +1,782 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Tutorial: Ingest and query data from Apache Kafka\n",
+ "\n",
+ "<!--\n",
+ " ~ Licensed to the Apache Software Foundation (ASF) under one\n",
+ " ~ or more contributor license agreements. See the NOTICE file\n",
+ " ~ distributed with this work for additional information\n",
+ " ~ regarding copyright ownership. The ASF licenses this file\n",
+ " ~ to you under the Apache License, Version 2.0 (the\n",
+ " ~ \"License\"); you may not use this file except in compliance\n",
+ " ~ with the License. You may obtain a copy of the License at\n",
+ " ~\n",
+ " ~ http://www.apache.org/licenses/LICENSE-2.0\n",
+ " ~\n",
+ " ~ Unless required by applicable law or agreed to in writing,\n",
+ " ~ software distributed under the License is distributed on an\n",
+ " ~ \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+ " ~ KIND, either express or implied. See the License for the\n",
+ " ~ specific language governing permissions and limitations\n",
+ " ~ under the License.\n",
+ " -->\n",
+ "\n",
+ "This tutorial introduces you to streaming ingestion in Apache Druid using
the Apache Kafka event streaming platform.\n",
+ "Follow along to learn how to create and load data into a Kafka topic,
start ingesting data from the topic into Druid, and query results over time.
This tutorial assumes you have a basic understanding of Druid ingestion,
querying, and API requests."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Table of contents\n",
+ "\n",
+ "* [Prerequisites](#Prerequisites)\n",
+ "* [Load Druid API client](#Load-Druid-API-client)\n",
+ "* [Create Kafka topic](#Create-Kafka-topic)\n",
+ "* [Load data into Kafka topic](#Load-data-into-Kafka-topic)\n",
+ "* [Start Druid ingestion](#Start-Druid-ingestion)\n",
+ "* [Query Druid datasource and visualize query
results](#Query-Druid-datasource-and-visualize-query-results)\n",
+ "* [Learn more](#Learn-more)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Prerequisites\n",
+ "\n",
+ "Launch this tutorial and all prerequisites using the `all-services`
profile of the Docker Compose file for Jupyter-based Druid tutorials. For more
information, see [Docker for Jupyter Notebook
tutorials](https://druid.apache.org/docs/latest/tutorials/tutorial-jupyter-docker.html).\n",
+ "\n",
+ "Otherwise, you need the following:\n",
+ "* A running Druid instance.\n",
+ " * Update the `druid_host` variable to point to your Router endpoint.
For example, `druid_host = \"http://localhost:8888\"`.\n",
+ "* A running Kafka cluster.\n",
+ " * Update the Kafka bootstrap servers to point to your servers. For
example, `bootstrap_servers=[\"localhost:9092\"]`.\n",
+ "* The following Python packages:\n",
+ " * `druidapi`, a Python client for Apache Druid\n",
+ " * `DruidDataDriver`, a data generator\n",
+ " * `kafka`, a Python client for Apache Kafka\n",
+ " * `pandas`, `matplotlib`, and `seaborn` for data visualization\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Load Druid API client"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "To start the tutorial, run the following cell. It imports the required
Python packages and defines a variable for the Druid client, and another for
the SQL client used to run SQL commands."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "<style>\n",
+ " .druid table {\n",
+ " border: 1px solid black;\n",
+ " border-collapse: collapse;\n",
+ " }\n",
+ "\n",
+ " .druid th, .druid td {\n",
+ " padding: 4px 1em ;\n",
+ " text-align: left;\n",
+ " }\n",
+ "\n",
+ " td.druid-right, th.druid-right {\n",
+ " text-align: right;\n",
+ " }\n",
+ "\n",
+ " td.druid-center, th.druid-center {\n",
+ " text-align: center;\n",
+ " }\n",
+ "\n",
+ " .druid .druid-left {\n",
+ " text-align: left;\n",
+ " }\n",
+ "\n",
+ " .druid-alert {\n",
+ " font-weight: bold;\n",
+ " }\n",
+ "\n",
+ " .druid-error {\n",
+ " color: red;\n",
+ " }\n",
+ "</style>\n"
+ ],
+ "text/plain": [
+ "<IPython.core.display.HTML object>"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "import druidapi\n",
+ "import json\n",
+ "\n",
+ "# druid_host is the hostname and port for your Druid deployment. \n",
+ "# In a distributed environment, you can point to other Druid services.\n",
+ "# In this tutorial, you'll use the Router service as the `druid_host`.\n",
+ "druid_host = \"http://router:8888\"\n",
+ "\n",
+ "druid = druidapi.jupyter_client(druid_host)\n",
+ "display = druid.display\n",
+ "sql_client = druid.sql\n",
+ "\n",
+ "# Create a rest client for native JSON ingestion for streaming data\n",
+ "rest_client = druidapi.rest.DruidRestClient(\"http://coordinator:8081\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Create Kafka topic"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "This notebook relies on the Python client for the Apache Kafka. Import
the Kafka producer and consumer modules, then create a Kafka client. You use
the Kafka producer to create and publish records to a new topic named
`social_media`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from kafka import KafkaProducer\n",
+ "from kafka import KafkaConsumer\n",
+ "\n",
+ "# Kafka runs on kafka:9092 in multi-container tutorial application\n",
+ "producer = KafkaProducer(bootstrap_servers='kafka:9092')\n",
+ "topic_name = \"social_media\""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Create the `social_media` topic and send a sample event. The `send()`
command returns a metadata descriptor for the record."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "<kafka.producer.future.FutureRecordMetadata at 0x7f5f65344610>"
+ ]
+ },
+ "execution_count": 3,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "event = {\n",
+ " \"__time\": \"2023-01-03T16:40:21.501\",\n",
+ " \"username\": \"willow\",\n",
+ " \"post_title\": \"This title is required\",\n",
+ " \"views\": 15284,\n",
+ " \"upvotes\": 124,\n",
+ " \"comments\": 21,\n",
+ " \"edited\": \"True\"\n",
+ "}\n",
+ "\n",
+ "producer.send(topic_name, json.dumps(event).encode('utf-8'))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "To verify that the Kafka topic stored the event, create a consumer client
to read records from the Kafka cluster, and get the next (only) message:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "{\"__time\": \"2023-01-03T16:40:21.501\", \"username\": \"willow\",
\"post_title\": \"This title is required\", \"views\": 15284, \"upvotes\": 124,
\"comments\": 21, \"edited\": \"True\"}\n"
+ ]
+ }
+ ],
+ "source": [
+ "consumer = KafkaConsumer(topic_name, bootstrap_servers=['kafka:9092'],
auto_offset_reset='earliest',\n",
+ " enable_auto_commit=True)\n",
+ "\n",
+ "print(next(consumer).value.decode('utf-8'))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Load data into Kafka topic"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Instead of manually creating events to send to the Kafka topic, use a
data generator to simulate a continuous data stream. This tutorial makes use of
Druid Data Driver to simulate a continuous data stream into the `social_media`
Kafka topic. To learn more about the Druid Data Driver, see the Druid Summit
talk, [Generating Time centric Data for Apache
Druid](https://www.youtube.com/watch?v=3zAOeLe3iAo).\n",
+ "\n",
+ "In this notebook, you use a background process to continuously load data
into the Kafka topic.\n",
+ "This allows you to keep executing commands in this notebook while data is
constantly being streamed into the topic."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Run the following cells to load sample data into the `social_media` Kafka
topic:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import multiprocessing as mp\n",
+ "from datetime import datetime\n",
+ "import DruidDataDriver"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def run_driver():\n",
+ " DruidDataDriver.simulate(\"kafka_docker_config.json\", None, None,
\"REAL\", datetime.now())\n",
+ " \n",
+ "mp.set_start_method('fork')\n",
+ "ps = mp.Process(target=run_driver)\n",
+ "ps.start()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Start Druid ingestion"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Now that you have a new Kafka topic and data being streamed into the
topic, you ingest the data into Druid by submitting a Kafka ingestion spec.\n",
+ "The ingestion spec describes the following:\n",
+ "* where to source the data to ingest (in `spec > ioConfig`),\n",
+ "* the datasource to ingest data into (in `spec > dataSchema >
dataSource`), and\n",
+ "* what the data looks like (in `spec > dataSchema > dimensionsSpec`).\n",
+ "\n",
+ "Other properties control how Druid aggregates and stores data. For more
information, see the Druid documenation:\n",
+ "* [Apache Kafka
ingestion](https://druid.apache.org/docs/latest/development/extensions-core/kafka-ingestion.html)\n",
+ "* [Ingestion spec
reference](https://druid.apache.org/docs/latest/ingestion/ingestion-spec.html)\n",
+ "\n",
+ "Run the following cells to define and view the Kafka ingestion spec."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "kafka_ingestion_spec = \"{\\\"type\\\": \\\"kafka\\\",\\\"spec\\\":
{\\\"ioConfig\\\": {\\\"type\\\": \\\"kafka\\\",\\\"consumerProperties\\\":
{\\\"bootstrap.servers\\\": \\\"kafka:9092\\\"},\\\"topic\\\":
\\\"social_media\\\",\\\"inputFormat\\\": {\\\"type\\\":
\\\"json\\\"},\\\"useEarliestOffset\\\": true},\\\"tuningConfig\\\":
{\\\"type\\\": \\\"kafka\\\"},\\\"dataSchema\\\": {\\\"dataSource\\\":
\\\"social_media\\\",\\\"timestampSpec\\\": {\\\"column\\\":
\\\"__time\\\",\\\"for [...]
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "{\n",
+ " \"type\": \"kafka\",\n",
+ " \"spec\": {\n",
+ " \"ioConfig\": {\n",
+ " \"type\": \"kafka\",\n",
+ " \"consumerProperties\": {\n",
+ " \"bootstrap.servers\": \"kafka:9092\"\n",
+ " },\n",
+ " \"topic\": \"social_media\",\n",
+ " \"inputFormat\": {\n",
+ " \"type\": \"json\"\n",
+ " },\n",
+ " \"useEarliestOffset\": true\n",
+ " },\n",
+ " \"tuningConfig\": {\n",
+ " \"type\": \"kafka\"\n",
+ " },\n",
+ " \"dataSchema\": {\n",
+ " \"dataSource\": \"social_media\",\n",
+ " \"timestampSpec\": {\n",
+ " \"column\": \"__time\",\n",
+ " \"format\": \"iso\"\n",
+ " },\n",
+ " \"dimensionsSpec\": {\n",
+ " \"dimensions\": [\n",
+ " \"username\",\n",
+ " \"post_title\",\n",
+ " {\n",
+ " \"type\": \"long\",\n",
+ " \"name\": \"views\"\n",
+ " },\n",
+ " {\n",
+ " \"type\": \"long\",\n",
+ " \"name\": \"upvotes\"\n",
+ " },\n",
+ " {\n",
+ " \"type\": \"long\",\n",
+ " \"name\": \"comments\"\n",
+ " },\n",
+ " \"edited\"\n",
+ " ]\n",
+ " },\n",
+ " \"granularitySpec\": {\n",
+ " \"queryGranularity\": \"none\",\n",
+ " \"rollup\": false,\n",
+ " \"segmentGranularity\": \"hour\"\n",
+ " }\n",
+ " }\n",
+ " }\n",
+ "}\n"
+ ]
+ }
+ ],
+ "source": [
+ "print(json.dumps(json.loads(kafka_ingestion_spec), indent=4))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Send the spec to Druid to start the streaming ingestion from Kafka:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "<Response [200]>"
+ ]
+ },
+ "execution_count": 9,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "headers = {\n",
+ " 'Content-Type': 'application/json'\n",
+ "}\n",
+ "\n",
+ "rest_client.post(\"/druid/indexer/v1/supervisor\", kafka_ingestion_spec,
headers=headers)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "A `200` response indicates that the request was successful. You can view
the running ingestion task and the new datasource in the web console at
http://localhost:8888/unified-console.html."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Query Druid datasource and visualize query results"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "You can now query the new datasource called `social_media`. In this
section, you also visualize query results using the Matplotlib and Seaborn
visualization libraries. Run the following cell import these packages."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 10,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import pandas as pd\n",
+ "import matplotlib\n",
+ "import matplotlib.pyplot as plt\n",
+ "import seaborn as sns"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Run a simple query to view a subset of rows from the new datasource:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 11,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "<div class=\"druid\"><table>\n",
+
"<tr><th>__time</th><th>username</th><th>post_title</th><th>views</th><th>upvotes</th><th>comments</th><th>edited</th></tr>\n",
+ "<tr><td>2023-01-03T16:40:21.501Z</td><td>willow</td><td>This title is
required</td><td>15284</td><td>124</td><td>21</td><td>True</td></tr>\n",
+
"<tr><td>2023-05-02T23:34:54.451Z</td><td>gus</td><td>3y4hkmd1!'Er4;</td><td>4031</td><td>93</td><td>15</td><td>False</td></tr>\n",
+
"<tr><td>2023-05-02T23:34:55.454Z</td><td>mia</td><td>m62u53:D9s2bOvnY_VM9vjtZ'MyDLvQ7_xGodAP:ZNTXM6cFAt,_jrxBVBeRILLvAF9Z!jM9YNN;3ErV5eGbE_TFQS</td><td>16060</td><td>84</td><td>8</td><td>True</td></tr>\n",
+
"<tr><td>2023-05-02T23:34:55.455Z</td><td>jojo</td><td>rAmeAJrjs;FBj:zy2MwoGh_P_SowlLTfp6zhX55xqogH.,1DC2xY_x2T;M_Vcu3QWaz650u;Roa</td><td>14313</td><td>65</td><td>7</td><td>False</td></tr>\n",
+
"<tr><td>2023-05-02T23:34:56.456Z</td><td>willow</td><td>3bHB,iJdE;sedTDA,1dKGDAZL!qdsvO_tv.4Jrq7fa.KWcHPD'TB_5nnvsf9EgtnN8tGeeA0MjKc30iubJ:D'l7pHNihWpFz8K'46q!vJs</td><td>4237</td><td>112</td><td>3</td><td>True</td></tr>\n",
+ "</table></div>"
+ ],
+ "text/plain": [
+ "<IPython.core.display.HTML object>"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "sql = '''\n",
+ "SELECT * FROM social_media LIMIT 5\n",
+ "'''\n",
+ "display.sql(sql)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "In this social media scenario, each incoming event represents a post on
social media, for which you collect the timestamp, username, and post metadata.
You are interested in analyzing the total number of upvotes for all posts,
compared between users. Preview this data with the following query:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 12,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "<div class=\"druid\"><table>\n",
+ "<tr><th>num_posts</th><th>total_upvotes</th><th>username</th></tr>\n",
+ "<tr><td>155</td><td>10985</td><td>willow</td></tr>\n",
+ "<tr><td>161</td><td>11223</td><td>gus</td></tr>\n",
+ "<tr><td>164</td><td>11456</td><td>leon</td></tr>\n",
+ "<tr><td>173</td><td>12098</td><td>jojo</td></tr>\n",
+ "<tr><td>176</td><td>12175</td><td>mia</td></tr>\n",
+ "<tr><td>177</td><td>11998</td><td>milton</td></tr>\n",
+ "<tr><td>185</td><td>13256</td><td>miette</td></tr>\n",
+ "<tr><td>188</td><td>13360</td><td>rocket</td></tr>\n",
+ "</table></div>"
+ ],
+ "text/plain": [
+ "<IPython.core.display.HTML object>"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "sql = '''\n",
+ "SELECT\n",
+ " COUNT(post_title) as num_posts,\n",
+ " SUM(upvotes) as total_upvotes,\n",
+ " username\n",
+ "FROM social_media\n",
+ "GROUP BY username\n",
+ "ORDER BY num_posts\n",
+ "'''\n",
+ "\n",
+ "response = sql_client.sql_query(sql)\n",
+ "response.show()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Visualize the total number of upvotes per user using a line plot. You
sort the results by username before plotting because the order of users may
vary as new results arrive."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 13,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "image/png":
"iVBORw0KGgoAAAANSUhEUgAAAk0AAAHMCAYAAADI/py4AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjcuMCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy88F64QAAAACXBIWXMAAA9hAAAPYQGoP6dpAACRN0lEQVR4nOzdd3iTZfcH8O+T7j3ppNAySwdtAQtF9p7KEEVl+JMhvjJERJYyVBRRnCjI60AcrwqyQaBskLJbSiktUArdmzZd6UjO7480sWE2Je2TpOdzXb20z/M0OUlpcnKf+z63QEQExhhjjDH2UBKxA2CMMcYYMwScNDHGGGOM1QEnTYwxxhhjdcBJE2OMMcZYHXDSxBhjjDFWB5w0McYYY4zVASdNjDHGGGN1YCp2AMZCoVAgIyMDdnZ2EARB7HAYY4wxVgdEhOLiYnh5eUEiefhYEidNOpKRkQEfHx+xw2CM
[...]
+ "text/plain": [
+ "<Figure size 640x480 with 1 Axes>"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "df = pd.DataFrame(response.json)\n",
+ "df = df.sort_values('username')\n",
+ "\n",
+ "df.plot(x='username', y='total_upvotes', marker='o')\n",
+ "plt.xticks(rotation=45, ha='right')\n",
+ "plt.ylabel(\"Total number of upvotes\")\n",
+ "plt.gca().get_legend().remove()\n",
+ "plt.show()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The total number of upvotes likely depends on the total number of posts
created per user. To better assess the relative impact per user, you compare
the total number of upvotes (line plot) with the total number of posts."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 14,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "<matplotlib.legend.Legend at 0x7f5f18400310>"
+ ]
+ },
+ "execution_count": 14,
+ "metadata": {},
+ "output_type": "execute_result"
+ },
+ {
+ "data": {
+ "image/png":
"iVBORw0KGgoAAAANSUhEUgAAA1cAAAHMCAYAAAA5/FJZAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjcuMCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy88F64QAAAACXBIWXMAAA9hAAAPYQGoP6dpAADE60lEQVR4nOzdd3hUdfb48feUTHrvgYSEFhJKQhGkCQSkKS6uFZDq6voVdDH2/UqxLLYFQWHhZ2ddXbEgX9dFpIgiSJESBQIBQiABUkmvk8zc3x9hBmICpExyJ8l5Pc88j5m5c++ZRJI5c87nfDSKoigIIYQQQgghhGgSrdoBCCGEEEIIIURbIMmVEEIIIYQQQtiAJFdCCCGEEEIIYQOSXAkhhBBCCCGEDUhyJYQQQgghhBA2IMmVEEIIIYQQQtiAJFdCCCGEEEIIYQN6tQNoK6qqqjh06BCBgYFotZKzCiGEEK2B
[...]
+ "text/plain": [
+ "<Figure size 640x480 with 2 Axes>"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "matplotlib.rc_file_defaults()\n",
+ "ax1 = sns.set_style(style=None, rc=None )\n",
+ "\n",
+ "fig, ax1 = plt.subplots()\n",
+ "plt.xticks(rotation=45, ha='right')\n",
+ "\n",
+ "\n",
+ "sns.lineplot(\n",
+ " data=df, x='username', y='total_upvotes',\n",
+ " marker='o', ax=ax1, label=\"Sum of upvotes\")\n",
+ "ax1.get_legend().remove()\n",
+ "\n",
+ "ax2 = ax1.twinx()\n",
+ "sns.barplot(data=df, x='username', y='num_posts',\n",
+ " order=df['username'], alpha=0.5, ax=ax2, log=True,\n",
+ " color=\"orange\", label=\"Number of posts\")\n",
+ "\n",
+ "\n",
+ "# ask matplotlib for the plotted objects and their labels\n",
+ "lines, labels = ax1.get_legend_handles_labels()\n",
+ "lines2, labels2 = ax2.get_legend_handles_labels()\n",
+ "ax2.legend(lines + lines2, labels + labels2, bbox_to_anchor=(1.55, 1))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "You should see a correlation between total number of upvotes and total
number of posts. In order to track user impact on a more equal footing,
normalize the total number of upvotes relative to the total number of posts,
and plot the result:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 15,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "image/png":
"iVBORw0KGgoAAAANSUhEUgAAAkAAAAHMCAYAAAA9ABcIAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjcuMCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy88F64QAAAACXBIWXMAAA9hAAAPYQGoP6dpAACLeElEQVR4nO3dd1xT5/cH8E/YIFNkyhQRRFFQHDjqVlyto1ZbrVpXbbXO2or9ujocbW3VDm1t1VpbW2frqHvvCU5EBGSDiuxNcn5/8MstKaAEEi5Jzvv1yktzc3PvuQGSk+c5z/NIiIjAGGOMMaZD9MQOgDHGGGOsrnECxBhjjDGdwwkQY4wxxnQOJ0CMMcYY0zmcADHGGGNM53ACxBhjjDGdwwkQY4wxxnSOgdgB1EcymQzJycmwsLCARCIROxzGGGOMVQMRIScnB87OztDTe34bDydAlUhOToarq6vYYTDGGGOs
[...]
+ "text/plain": [
+ "<Figure size 640x480 with 1 Axes>"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "df['upvotes_normalized'] = df['total_upvotes']/df['num_posts']\n",
+ "\n",
+ "df.plot(x='username', y='upvotes_normalized', marker='o',
color='green')\n",
+ "plt.xticks(rotation=45, ha='right')\n",
+ "plt.ylabel(\"Number of upvotes (normalized)\")\n",
+ "plt.gca().get_legend().remove()\n",
+ "plt.show()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "You've been working with data taken at a single snapshot in time from
when you ran the last query. Run the same query again, and store the output in
`response2`, which you will compare with the previous results:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 16,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "<div class=\"druid\"><table>\n",
+ "<tr><th>num_posts</th><th>total_upvotes</th><th>username</th></tr>\n",
+ "<tr><td>404</td><td>28166</td><td>willow</td></tr>\n",
+ "<tr><td>418</td><td>29413</td><td>jojo</td></tr>\n",
+ "<tr><td>419</td><td>29202</td><td>mia</td></tr>\n",
+ "<tr><td>419</td><td>29456</td><td>miette</td></tr>\n",
+ "<tr><td>428</td><td>29472</td><td>gus</td></tr>\n",
+ "<tr><td>433</td><td>30160</td><td>milton</td></tr>\n",
+ "<tr><td>440</td><td>31212</td><td>leon</td></tr>\n",
+ "<tr><td>443</td><td>31063</td><td>rocket</td></tr>\n",
+ "</table></div>"
+ ],
+ "text/plain": [
+ "<IPython.core.display.HTML object>"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "response2 = sql_client.sql_query(sql)\n",
+ "response2.show()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Normalizing the data also helps you evaluate trends over time more
consistently on the same plot axes. Plot the normalized data again, this time
alongside the results from the previous snapshot:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 17,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "image/png":
"iVBORw0KGgoAAAANSUhEUgAAAkAAAAHMCAYAAAA9ABcIAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjcuMCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy88F64QAAAACXBIWXMAAA9hAAAPYQGoP6dpAAC6DklEQVR4nOzdd3iTZffA8W+SbroodNKWsilQoOxV9ihLkCUqAoKoiALixNetPxDfVwW3Iut9FZSpAlKWjLJX2WWVQgdtoYXuneT3R22ktkDTJk3Sns915ZI+ffI8J0ibk/s+97kVWq1WixBCCCFEDaI0dQBCCCGEEFVNEiAhhBBC1DiSAAkhhBCixpEESAghhBA1jiRAQgghhKhxJAESQgghRI0jCZAQQgghahwrUwdgjjQaDTdu3MDJyQmFQmHqcIQQQghRDlqtloyMDHx8fFAq7z/GIwlQGW7cuIGfn5+pwxBC
[...]
+ "text/plain": [
+ "<Figure size 640x480 with 1 Axes>"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "df2 = pd.DataFrame(response2.json)\n",
+ "df2 = df2.sort_values('username')\n",
+ "df2['upvotes_normalized'] = df2['total_upvotes']/df2['num_posts']\n",
+ "\n",
+ "ax = df.plot(x='username', y='upvotes_normalized', marker='o',
color='green', label=\"Time 1\")\n",
+ "df2.plot(x='username', y='upvotes_normalized', marker='o',
color='purple', ax=ax, label=\"Time 2\")\n",
+ "plt.xticks(rotation=45, ha='right')\n",
+ "plt.ylabel(\"Number of upvotes (normalized)\")\n",
+ "plt.show()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "This plot shows how some users maintain relatively consistent social
media impact between the two query snapshots, whereas other users grow or
decline in their influence.\n",
+ "\n",
+ "## Learn more\n",
+ "\n",
+ "This tutorial showed you how to create a Kafka topic using a Python
client for Kafka, send a simulated stream of data to Kafka using a data
generator, and query and visualize results over time. For more information, see
the following resources:\n",
+ "\n",
+ "* [Apache Kafka
ingestion](https://druid.apache.org/docs/latest/development/extensions-core/kafka-ingestion.html)\n",
+ "* [Querying
data](https://druid.apache.org/docs/latest/tutorials/tutorial-query.html)\n",
+ "* [Tutorial: Run with
Docker](https://druid.apache.org/docs/latest/tutorials/docker.html)"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3 (ipykernel)",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.10.8"
+ },
+ "vscode": {
+ "interpreter": {
+ "hash": "a4289e5b8bae5973a6609d90f7bc464162478362b9a770893a3c5c597b0b36e7"
+ }
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
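The normalization step in the notebook divides total upvotes by post count per user. The same computation can be sketched without pandas, over rows shaped like the query results shown above:

```python
# Rows shaped like the num_posts/total_upvotes/username query results above.
rows = [
    {"username": "willow", "num_posts": 155, "total_upvotes": 10985},
    {"username": "rocket", "num_posts": 188, "total_upvotes": 13360},
]

# Same computation as the notebook's df['upvotes_normalized'] column:
for row in rows:
    row["upvotes_normalized"] = row["total_upvotes"] / row["num_posts"]

# Sort by username so repeated snapshots line up on the same plot axis,
# mirroring df.sort_values('username') in the notebook.
rows.sort(key=lambda r: r["username"])
print([round(r["upvotes_normalized"], 1) for r in rows])  # → [71.1, 70.9]
```

Normalizing this way keeps the two snapshots comparable on the same axes even as the raw post and upvote counts keep growing.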
diff --git a/website/sidebars.json b/website/sidebars.json
index fbb6bf0866..f1ab145c04 100644
--- a/website/sidebars.json
+++ b/website/sidebars.json
@@ -27,6 +27,7 @@
"tutorials/tutorial-sql-query-view",
"tutorials/tutorial-unnest-arrays",
"tutorials/tutorial-jupyter-index",
+ "tutorials/tutorial-jupyter-docker",
"tutorials/tutorial-jdbc"
],
"Design": [
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]