This is an automated email from the ASF dual-hosted git repository.
vogievetsky pushed a commit to branch 26.0.0
in repository https://gitbox.apache.org/repos/asf/druid.git
The following commit(s) were added to refs/heads/26.0.0 by this push:
new 855e576e87 Docs: Tutorial for streaming ingestion using Kafka + Docker file to use with Jupyter tutorials (#13984) (#14289)
855e576e87 is described below
commit 855e576e87096e08c4c85a7fc53584a7801e402b
Author: Victoria Lim <[email protected]>
AuthorDate: Mon May 22 14:29:37 2023 -0700
Docs: Tutorial for streaming ingestion using Kafka + Docker file to use with Jupyter tutorials (#13984) (#14289)
---
.gitignore | 3 +-
docs/tutorials/tutorial-jupyter-docker.md | 201 ++++++
docs/tutorials/tutorial-jupyter-index.md | 67 +-
.../jupyter-notebooks/0-START-HERE.ipynb | 25 +-
examples/quickstart/jupyter-notebooks/Dockerfile | 65 ++
examples/quickstart/jupyter-notebooks/README.md | 74 +-
.../jupyter-notebooks/docker-jupyter/README.md | 60 ++
.../docker-jupyter/docker-compose-local.yaml | 172 +++++
.../docker-jupyter/docker-compose.yaml | 170 +++++
.../jupyter-notebooks/docker-jupyter/environment | 56 ++
.../docker-jupyter/kafka_docker_config.json | 90 +++
.../docker-jupyter/tutorial-jupyter-docker.zip | Bin 0 -> 2939 bytes
.../jupyter-notebooks/kafka-tutorial.ipynb | 782 +++++++++++++++++++++
website/sidebars.json | 1 +
14 files changed, 1635 insertions(+), 131 deletions(-)
diff --git a/.gitignore b/.gitignore
index 31b2f9dd1e..a60eb68173 100644
--- a/.gitignore
+++ b/.gitignore
@@ -33,9 +33,10 @@ integration-tests/gen-scripts/
**/.ipython/
**/.jupyter/
**/.local/
+**/druidapi.egg-info/
+examples/quickstart/jupyter-notebooks/docker-jupyter/notebooks
# ignore NetBeans IDE specific files
nbproject
nbactions.xml
nb-configuration.xml
-
diff --git a/docs/tutorials/tutorial-jupyter-docker.md b/docs/tutorials/tutorial-jupyter-docker.md
new file mode 100644
index 0000000000..b5aa939db8
--- /dev/null
+++ b/docs/tutorials/tutorial-jupyter-docker.md
@@ -0,0 +1,201 @@
+---
+id: tutorial-jupyter-docker
+title: "Docker for Jupyter Notebook tutorials"
+sidebar_label: "Docker for tutorials"
+---
+
+<!--
+ ~ Licensed to the Apache Software Foundation (ASF) under one
+ ~ or more contributor license agreements. See the NOTICE file
+ ~ distributed with this work for additional information
+ ~ regarding copyright ownership. The ASF licenses this file
+ ~ to you under the Apache License, Version 2.0 (the
+ ~ "License"); you may not use this file except in compliance
+ ~ with the License. You may obtain a copy of the License at
+ ~
+ ~ http://www.apache.org/licenses/LICENSE-2.0
+ ~
+ ~ Unless required by applicable law or agreed to in writing,
+ ~ software distributed under the License is distributed on an
+ ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ ~ KIND, either express or implied. See the License for the
+ ~ specific language governing permissions and limitations
+ ~ under the License.
+ -->
+
+
+Apache Druid provides a custom Jupyter container that contains the prerequisites
+for all Jupyter-based Druid tutorials, as well as all of the tutorials themselves.
+You can run the Jupyter container, as well as containers for Druid and Apache Kafka,
+using the Docker Compose file provided in the Druid GitHub repository.
+
+You can run the following combination of applications:
+* [Jupyter only](#start-only-the-jupyter-container)
+* [Jupyter and Druid](#start-jupyter-and-druid)
+* [Jupyter, Druid, and Kafka](#start-jupyter-druid-and-kafka)
+
+## Prerequisites
+
+Jupyter in Docker requires that you have **Docker** and **Docker Compose**.
+We recommend installing these through [Docker Desktop](https://docs.docker.com/desktop/).
+
+## Launch the Docker containers
+
+You run Docker Compose to launch Jupyter and optionally Druid or Kafka.
+Docker Compose references the configuration in `docker-compose.yaml`.
+Running Druid in Docker also requires the `environment` file, which
+sets the configuration properties for the Druid services.
+To get started, download both `docker-compose.yaml` and `environment` from
+[`tutorial-jupyter-docker.zip`](https://github.com/apache/druid/blob/master/examples/quickstart/jupyter-notebooks/docker-jupyter/tutorial-jupyter-docker.zip).
+
+Alternatively, you can clone the [Apache Druid repo](https://github.com/apache/druid) and
+access the files in `druid/examples/quickstart/jupyter-notebooks/docker-jupyter`.
+
+### Start only the Jupyter container
+
+If you already have Druid running locally, you can run only the Jupyter container to complete the tutorials.
+In the same directory as `docker-compose.yaml`, start the application:
+
+```bash
+docker compose --profile jupyter up -d
+```
+
+The Docker Compose file assigns `8889` for the Jupyter port.
+You can override the port number by setting the `JUPYTER_PORT` environment variable before starting the Docker application.
+
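Editor's note: the default of `8889` comes from the `"${JUPYTER_PORT:-8889}:8888"` port mapping on the `jupyter` service in the Compose file, which uses shell-style default expansion. A minimal sketch of that behavior in a POSIX shell (the port values here are just examples):

```shell
# If JUPYTER_PORT is unset, the host port falls back to the default 8889.
unset JUPYTER_PORT
echo "${JUPYTER_PORT:-8889}"   # prints 8889

# Setting the variable before starting the application overrides the default.
JUPYTER_PORT=9999
echo "${JUPYTER_PORT:-8889}"   # prints 9999
```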
+### Start Jupyter and Druid
+
+Running Druid in Docker requires the `environment` file as well as an environment variable named `DRUID_VERSION`,
+which determines the version of Druid to use. The Druid version references the Docker tag to pull from the
+[Apache Druid Docker Hub](https://hub.docker.com/r/apache/druid/tags).
+
+In the same directory as `docker-compose.yaml` and `environment`, start the application:
+
+```bash
+DRUID_VERSION={{DRUIDVERSION}} docker compose --profile druid-jupyter up -d
+```
+
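Editor's note: the `DRUID_VERSION=... docker compose ...` form relies on the shell's one-off environment assignment, so the variable applies only to that single invocation. A quick sketch (the version number is illustrative):

```shell
# A leading VAR=value assignment is exported only to the command that follows it.
unset DRUID_VERSION
DRUID_VERSION=25.0.0 sh -c 'echo "image tag: apache/druid:$DRUID_VERSION"'

# The variable does not persist in the calling shell afterwards.
echo "${DRUID_VERSION:-unset}"   # prints unset
```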
+### Start Jupyter, Druid, and Kafka
+
+Running Druid in Docker requires the `environment` file as well as the `DRUID_VERSION` environment variable.
+
+In the same directory as `docker-compose.yaml` and `environment`, start the application:
+
+```bash
+DRUID_VERSION={{DRUIDVERSION}} docker compose --profile all-services up -d
+```
+
+### Update image from Docker Hub
+
+If you already have a local cache of the Jupyter image, you can update the image before running the application using the following command:
+
+```bash
+docker compose pull jupyter
+```
+
+### Use locally built image
+
+The default Docker Compose file pulls the custom Jupyter Notebook image from a third-party repository on Docker Hub.
+If you prefer to build the image locally from the official source, do the following:
+1. Clone the Apache Druid repository.
+2. Navigate to `examples/quickstart/jupyter-notebooks/docker-jupyter`.
+3. Start the services using `-f docker-compose-local.yaml` in the `docker compose` command. For example:
+
+```bash
+DRUID_VERSION={{DRUIDVERSION}} docker compose --profile all-services -f docker-compose-local.yaml up -d
+```
+
+## Access Jupyter-based tutorials
+
+The following steps show you how to access the Jupyter notebook tutorials from the Docker container.
+At startup, Docker creates and mounts a volume to persist data from the container to your local machine.
+This way, you can save the work you complete within the Docker container.
+
+1. Navigate to the notebooks at http://localhost:8889.
+   > If you set `JUPYTER_PORT` to another port number, replace `8889` with the value of the Jupyter port.
+
+2. Select a tutorial. If you don't plan to save your changes, you can use the notebook directly as is. Otherwise, continue to the next step.
+
+3. Optional: To save a local copy of your tutorial work,
+select **File > Save as...** from the navigation menu. Then enter `work/<notebook name>.ipynb`.
+If the notebook still displays as read-only, you may need to refresh the page in your browser.
+Access the saved files in the `notebooks` folder in your local working directory.
+
+## View the Druid web console
+
+To access the Druid web console in Docker, go to http://localhost:8888/unified-console.html.
+Use the web console to view datasources and ingestion tasks that you create in the tutorials.
+
+## Stop Docker containers
+
+Shut down the Docker application using the following command:
+
+```bash
+docker compose down -v
+```
+
+## Tutorial setup without using Docker
+
+To use the Jupyter Notebook-based tutorials without using Docker, do the following:
+
+1. Clone the Apache Druid repo, or download the [tutorials](tutorial-jupyter-index.md#tutorials)
+as well as the [Python client for Druid](tutorial-jupyter-index.md#python-api-for-druid).
+
+2. Install the prerequisite Python packages with the following commands:
+
+ ```bash
+ # Install requests
+ pip install requests
+ ```
+
+ ```bash
+ # Install JupyterLab
+ pip install jupyterlab
+
+ # Install Jupyter Notebook
+ pip install notebook
+ ```
+
+   Individual notebooks may list additional packages you need to install to complete the tutorial.
+
+3. In your Druid source repo, install `druidapi` with the following commands:
+
+ ```bash
+ cd examples/quickstart/jupyter-notebooks/druidapi
+ pip install .
+ ```
+
+4. Start Jupyter, in the same directory as the tutorials, using either JupyterLab or Jupyter Notebook:
+ ```bash
+ # Start JupyterLab on port 3001
+ jupyter lab --port 3001
+
+ # Start Jupyter Notebook on port 3001
+ jupyter notebook --port 3001
+ ```
+
+5. Start Druid. You can use the [Quickstart (local)](./index.md) instance. The tutorials
+   assume that you are using the quickstart, so no authentication or authorization
+   is expected unless explicitly mentioned.
+
+   If you contribute to Druid and work with Druid integration tests, you can use a test cluster.
+   Assume you have an environment variable, `DRUID_DEV`, which identifies your Druid source repo.
+
+ ```bash
+ cd $DRUID_DEV
+ ./it.sh build
+ ./it.sh image
+ ./it.sh up <category>
+ ```
+
+   Replace `<category>` with one of the available integration test categories. See the integration
+   test `README.md` for details.
+
+You should now be able to access and complete the tutorials.
+
+## Learn more
+
+See the following topics for more information:
+* [Jupyter Notebook tutorials](tutorial-jupyter-index.md) for the available Jupyter Notebook-based tutorials for Druid
+* [Tutorial: Run with Docker](docker.md) for running Druid from a Docker container
+
diff --git a/docs/tutorials/tutorial-jupyter-index.md b/docs/tutorials/tutorial-jupyter-index.md
index d77e0d42b3..d7f401cae5 100644
--- a/docs/tutorials/tutorial-jupyter-index.md
+++ b/docs/tutorials/tutorial-jupyter-index.md
@@ -32,67 +32,34 @@ the Druid API to complete the tutorial.
## Prerequisites
-Make sure you meet the following requirements before starting the Jupyter-based tutorials:
+The simplest way to get started is to use Docker. In this case, you only need to set up Docker Desktop.
+For more information, see [Docker for Jupyter Notebook tutorials](tutorial-jupyter-docker.md).
-- Python 3.7 or later
-
-- The `requests` package for Python. For example, you can install it with the following command:
-
- ```bash
- pip3 install requests
- ```
-
-- JupyterLab (recommended) or Jupyter Notebook running on a non-default port. By default, Druid
-  and Jupyter both try to use port `8888`, so start Jupyter on a different port.
-
-
- - Install JupyterLab or Notebook:
+Otherwise, you can install the prerequisites on your own. Here's what you need:
- ```bash
- # Install JupyterLab
- pip3 install jupyterlab
- # Install Jupyter Notebook
- pip3 install notebook
- ```
- - Start Jupyter using either JupyterLab
- ```bash
- # Start JupyterLab on port 3001
- jupyter lab --port 3001
- ```
-
- Or using Jupyter Notebook
- ```bash
- # Start Jupyter Notebook on port 3001
- jupyter notebook --port 3001
- ```
-
-- An available Druid instance. You can use the [Quickstart (local)](./index.md) instance. The tutorials
-  assume that you are using the quickstart, so no authentication or authorization
-  is expected unless explicitly mentioned.
-
-  If you contribute to Druid, and work with Druid integration tests, can use a test cluster.
-  Assume you have an environment variable, `DRUID_DEV`, which identifies your Druid source repo.
-
- ```bash
- cd $DRUID_DEV
- ./it.sh build
- ./it.sh image
- ./it.sh up <category>
- ```
+- An available Druid instance.
+- Python 3.7 or later
+- JupyterLab (recommended) or Jupyter Notebook running on a non-default port.
+By default, Druid and Jupyter both try to use port `8888`, so start Jupyter on a different port.
+- The `requests` Python package
+- The `druidapi` Python package
-  Replace `<category>` with one of the available integration test categories. See the integration
-  test `README.md` for details.
+For setup instructions, see [Tutorial setup without using Docker](tutorial-jupyter-docker.md#tutorial-setup-without-using-docker).
+Individual tutorials may require additional Python packages, such as for visualization or streaming ingestion.
-## Simple Druid API
+## Python API for Druid
+The `druidapi` package is a Python client for the Druid REST API.
One of the notebooks shows how to use the Druid REST API. The others focus on other
topics and use a simple set of Python wrappers around the underlying REST API. The
wrappers reside in the `druidapi` package within the notebooks directory. While the package
can be used in any Python program, the key purpose, at present, is to support these
-notebooks. See the [Introduction to the Druid Python API](https://github.com/apache/druid/tree/master/examples/quickstart/jupyter-notebooks/python-api-tutorial.ipynb)
+notebooks. See
+[Introduction to the Druid Python API](https://github.com/apache/druid/tree/master/examples/quickstart/jupyter-notebooks/python-api-tutorial.ipynb)
for an overview of the Python API.
+The `druidapi` package is already installed in the custom Jupyter Docker container for Druid tutorials.
+
## Tutorials
The notebooks are located in the [apache/druid repo](https://github.com/apache/druid/tree/master/examples/quickstart/jupyter-notebooks/).
You can either clone the repo or download the notebooks you want individually.
diff --git a/examples/quickstart/jupyter-notebooks/0-START-HERE.ipynb b/examples/quickstart/jupyter-notebooks/0-START-HERE.ipynb
index fe4a30a551..5e74fa71c1 100644
--- a/examples/quickstart/jupyter-notebooks/0-START-HERE.ipynb
+++ b/examples/quickstart/jupyter-notebooks/0-START-HERE.ipynb
@@ -41,24 +41,27 @@
"source": [
"## Prerequisites\n",
"\n",
-    "To get this far, you've installed Python 3 and Jupyter Notebook. Make sure you meet the following requirements before starting the Jupyter-based tutorials:\n",
- "\n",
-    "- The `requests` package for Python. For example, you can install it with the following command:\n",
- "\n",
- " ```bash\n",
- " pip install requests\n",
- " ````\n",
- "\n",
-    "- JupyterLab (recommended) or Jupyter Notebook running on a non-default port. By default, Druid\n",
-    "  and Jupyter both try to use port `8888`, so start Jupyter on a different port.\n",
+    "Before starting the Jupyter-based tutorials, make sure you meet the requirements listed in this section.\n",
+    "The simplest way to get started is to use Docker. In this case, you only need to set up Docker Desktop.\n",
+    "For more information, see [Docker for Jupyter Notebook tutorials](https://druid.apache.org/docs/latest/tutorials/tutorial-jupyter-docker.html).\n",
"\n",
+ "Otherwise, you need the following:\n",
"- An available Druid instance. You can use the local quickstart
configuration\n",
" described in
[Quickstart](https://druid.apache.org/docs/latest/tutorials/index.html).\n",
" The tutorials assume that you are using the quickstart, so no
authentication or authorization\n",
" is expected unless explicitly mentioned.\n",
+ "- Python 3.7 or later\n",
+    "- JupyterLab (recommended) or Jupyter Notebook running on a non-default port. By default, Druid\n",
+    "  and Jupyter both try to use port `8888`, so start Jupyter on a different port.\n",
+ "- The `requests` Python package\n",
+ "- The `druidapi` Python package\n",
+ "\n",
+    "For setup instructions, see [Tutorial setup without using Docker](https://druid.apache.org/docs/latest/tutorials/tutorial-jupyter-docker.html#tutorial-setup-without-using-docker).\n",
+    "Individual tutorials may require additional Python packages, such as for visualization or streaming ingestion.\n",
"\n",
"## Simple Druid API\n",
"\n",
+    "The `druidapi` package is a Python client for the Druid REST API.\n",
    "One of the notebooks shows how to use the Druid REST API. The others focus on other\n",
    "topics and use a simple set of Python wrappers around the underlying REST API. The\n",
    "wrappers reside in the `druidapi` package within this directory. While the package\n",
@@ -148,7 +151,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
- "version": "3.9.6"
+ "version": "3.9.5"
}
},
"nbformat": 4,
diff --git a/examples/quickstart/jupyter-notebooks/Dockerfile b/examples/quickstart/jupyter-notebooks/Dockerfile
new file mode 100644
index 0000000000..492a4da9c1
--- /dev/null
+++ b/examples/quickstart/jupyter-notebooks/Dockerfile
@@ -0,0 +1,65 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+# -------------------------------------------------------------
+# This Dockerfile creates a custom Docker image for Jupyter
+# to use with the Apache Druid Jupyter notebook tutorials.
+# Build using `docker build -t imply/druid-notebook:latest .`
+# -------------------------------------------------------------
+
+# Use the Jupyter base notebook as the base image
+# Copyright (c) Project Jupyter Contributors.
+# Distributed under the terms of the 3-Clause BSD License.
+FROM jupyter/base-notebook
+
+# Set the container working directory
+WORKDIR /home/jovyan
+
+# Install required Python packages
+RUN pip install requests
+RUN pip install pandas
+RUN pip install numpy
+RUN pip install seaborn
+RUN pip install bokeh
+RUN pip install kafka-python
+RUN pip install sortedcontainers
+
+# Install druidapi client from apache/druid
+# Local install requires sudo privileges
+USER root
+ADD druidapi /home/jovyan/druidapi
+WORKDIR /home/jovyan/druidapi
+RUN pip install .
+WORKDIR /home/jovyan
+
+# Import data generator and configuration file
+# Change permissions to allow import (requires sudo privileges)
+# WIP -- change to apache repo
+ADD https://raw.githubusercontent.com/shallada/druid/data-generator/examples/quickstart/jupyter-notebooks/data-generator/DruidDataDriver.py .
+ADD docker-jupyter/kafka_docker_config.json .
+RUN chmod 664 DruidDataDriver.py
+RUN chmod 664 kafka_docker_config.json
+USER jovyan
+
+# Copy the Jupyter notebook tutorials from the
+# build directory to the image working directory
+COPY ./*ipynb .
+
+# Add location of the data generator to PYTHONPATH
+ENV PYTHONPATH "${PYTHONPATH}:/home/jovyan"
+
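Editor's note: the final `ENV PYTHONPATH` line above is what lets notebooks in the container import `DruidDataDriver` from `/home/jovyan`. The effect can be sketched outside the container with a throwaway module (the module name and value here are hypothetical):

```shell
# PYTHONPATH extends Python's module search path, so a file dropped into a
# listed directory becomes importable by name.
dir=$(mktemp -d)
echo 'VALUE = 42' > "$dir/mymod.py"
PYTHONPATH="$dir" python3 -c 'import mymod; print(mymod.VALUE)'   # prints 42
```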
diff --git a/examples/quickstart/jupyter-notebooks/README.md b/examples/quickstart/jupyter-notebooks/README.md
index 826ae5df34..361908c131 100644
--- a/examples/quickstart/jupyter-notebooks/README.md
+++ b/examples/quickstart/jupyter-notebooks/README.md
@@ -1,12 +1,5 @@
# Jupyter Notebook tutorials for Druid
-If you are reading this in Jupyter, switch over to the [0-START-HERE](0-START-HERE.ipynb)
-notebook instead.
-
-<!-- This README, the "0-START-HERE" notebook, and the tutorial-jupyter-index.md file in
-docs/tutorials share a lot of the same content. If you make a change in one place, update
-the other too. -->
-
<!--
~ Licensed to the Apache Software Foundation (ASF) under one
~ or more contributor license agreements. See the NOTICE file
@@ -26,70 +19,13 @@ the other too. -->
~ under the License.
-->
+If you are reading this in Jupyter, switch over to the [0-START-HERE](0-START-HERE.ipynb)
+notebook instead.
+
You can try out the Druid APIs using the Jupyter Notebook-based tutorials.
These
tutorials provide snippets of Python code that you can use to run calls against
the Druid API to complete the tutorial.
-## Prerequisites
-
-Make sure you meet the following requirements before starting the Jupyter-based tutorials:
-
-- Python 3
-
-- The `requests` package for Python. For example, you can install it with the following command:
-
- ```bash
- pip install requests
- ```
-
-- JupyterLab (recommended) or Jupyter Notebook running on a non-default port. By default, Druid
-  and Jupyter both try to use port `8888`, so start Jupyter on a different port.
-
- - Install JupyterLab or Notebook:
-
- ```bash
- # Install JupyterLab
- pip install jupyterlab
- # Install Jupyter Notebook
- pip install notebook
- ```
- - Start Jupyter using either JupyterLab
- ```bash
- # Start JupyterLab on port 3001
- jupyter lab --port 3001
- ```
-
- Or using Jupyter Notebook
- ```bash
- # Start Jupyter Notebook on port 3001
- jupyter notebook --port 3001
- ```
-
-- The Python API client for Druid. Clone the Druid repo if you haven't already.
-Go to your Druid source repo and install `druidapi` with the following commands:
-
- ```bash
- cd examples/quickstart/jupyter-notebooks/druidapi
- pip install .
- ```
-
-- An available Druid instance. You can use the [quickstart deployment](https://druid.apache.org/docs/latest/tutorials/index.html).
-  The tutorials assume that you are using the quickstart, so no authentication or authorization
-  is expected unless explicitly mentioned.
-
-  If you contribute to Druid, and work with Druid integration tests, can use a test cluster.
-  Assume you have an environment variable, `DRUID_DEV`, which identifies your Druid source repo.
-
- ```bash
- cd $DRUID_DEV
- ./it.sh build
- ./it.sh image
- ./it.sh up <category>
- ```
-
-  Replace `<catagory>` with one of the available integration test categories. See the integration
-  test `README.md` for details.
-
-## Continue in Jupyter
+For information on prerequisites and getting started with the Jupyter-based tutorials,
+see [Jupyter Notebook tutorials](../../../docs/tutorials/tutorial-jupyter-index.md).
-Start Jupyter (see above) and navigate to the "0-START-HERE" notebook for more information.
diff --git a/examples/quickstart/jupyter-notebooks/docker-jupyter/README.md b/examples/quickstart/jupyter-notebooks/docker-jupyter/README.md
new file mode 100644
index 0000000000..028eb1f9b2
--- /dev/null
+++ b/examples/quickstart/jupyter-notebooks/docker-jupyter/README.md
@@ -0,0 +1,60 @@
+<!--
+ ~ Licensed to the Apache Software Foundation (ASF) under one
+ ~ or more contributor license agreements. See the NOTICE file
+ ~ distributed with this work for additional information
+ ~ regarding copyright ownership. The ASF licenses this file
+ ~ to you under the Apache License, Version 2.0 (the
+ ~ "License"); you may not use this file except in compliance
+ ~ with the License. You may obtain a copy of the License at
+ ~
+ ~ http://www.apache.org/licenses/LICENSE-2.0
+ ~
+ ~ Unless required by applicable law or agreed to in writing,
+ ~ software distributed under the License is distributed on an
+ ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ ~ KIND, either express or implied. See the License for the
+ ~ specific language governing permissions and limitations
+ ~ under the License.
+ -->
+
+# Jupyter in Docker
+
+For details on getting started with Jupyter in Docker,
+see [Docker for Jupyter Notebook tutorials](../../../../docs/tutorials/tutorial-jupyter-docker.md).
+
+## Contributing
+
+### Rebuild Jupyter image
+
+You may want to update the Jupyter image to access new or updated tutorial notebooks,
+include new Python packages, or update configuration files.
+
+To build the custom Jupyter image locally:
+
+1. Clone the Druid repo if you haven't already.
+2. Navigate to `examples/quickstart/jupyter-notebooks` in your Druid source repo.
+3. Edit the image definition in `Dockerfile`.
+4. Navigate to the `docker-jupyter` directory.
+5. Generate the new build using the following command:
+
+ ```shell
+   DRUID_VERSION=25.0.0 docker compose --profile all-services -f docker-compose-local.yaml up -d --build
+ ```
+
+   You can change the value of `DRUID_VERSION` or the profile used from the Docker Compose file.
+
+### Update Docker Compose
+
+The Docker Compose file defines a multi-container application that allows you to run
+the custom Jupyter Notebook container, Apache Druid, and Apache Kafka.
+
+Any changes to `docker-compose.yaml` should also be made to `docker-compose-local.yaml`
+and vice versa. These files should be identical except that `docker-compose.yaml`
+contains an `image` attribute while `docker-compose-local.yaml` contains a `build` subsection.
+
+If you update `docker-compose.yaml`, recreate the ZIP file using the following command:
+
+```bash
+zip tutorial-jupyter-docker.zip docker-compose.yaml environment
+```
+
diff --git a/examples/quickstart/jupyter-notebooks/docker-jupyter/docker-compose-local.yaml b/examples/quickstart/jupyter-notebooks/docker-jupyter/docker-compose-local.yaml
new file mode 100644
index 0000000000..9fb241deb8
--- /dev/null
+++ b/examples/quickstart/jupyter-notebooks/docker-jupyter/docker-compose-local.yaml
@@ -0,0 +1,172 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+---
+version: "2.2"
+
+volumes:
+ metadata_data: {}
+ middle_var: {}
+ historical_var: {}
+ broker_var: {}
+ coordinator_var: {}
+ router_var: {}
+ druid_shared: {}
+
+
+services:
+ postgres:
+ image: postgres:latest
+ container_name: postgres
+ profiles: ["druid-jupyter", "all-services"]
+ volumes:
+ - metadata_data:/var/lib/postgresql/data
+ environment:
+ - POSTGRES_PASSWORD=FoolishPassword
+ - POSTGRES_USER=druid
+ - POSTGRES_DB=druid
+
+ # Need 3.5 or later for container nodes
+ zookeeper:
+ image: zookeeper:latest
+ container_name: zookeeper
+ profiles: ["druid-jupyter", "all-services"]
+ ports:
+ - "2181:2181"
+ environment:
+ - ZOO_MY_ID=1
+ - ALLOW_ANONYMOUS_LOGIN=yes
+
+ kafka:
+ image: bitnami/kafka:latest
+ container_name: kafka-broker
+ profiles: ["all-services"]
+ ports:
+ # To learn about configuring Kafka for access across networks see
+      # https://www.confluent.io/blog/kafka-client-cannot-connect-to-broker-on-aws-on-docker-etc/
+ - "9092:9092"
+ depends_on:
+ - zookeeper
+ environment:
+ - KAFKA_BROKER_ID=1
+ - KAFKA_CFG_LISTENERS=PLAINTEXT://:9092
+ - KAFKA_CFG_ADVERTISED_LISTENERS=PLAINTEXT://kafka:9092
+ - KAFKA_CFG_ZOOKEEPER_CONNECT=zookeeper:2181
+ - ALLOW_PLAINTEXT_LISTENER=yes
+
+ coordinator:
+ image: apache/druid:${DRUID_VERSION}
+ container_name: coordinator
+ profiles: ["druid-jupyter", "all-services"]
+ volumes:
+ - druid_shared:/opt/shared
+ - coordinator_var:/opt/druid/var
+ depends_on:
+ - zookeeper
+ - postgres
+ ports:
+ - "8081:8081"
+ command:
+ - coordinator
+ env_file:
+ - environment
+
+ broker:
+ image: apache/druid:${DRUID_VERSION}
+ container_name: broker
+ profiles: ["druid-jupyter", "all-services"]
+ volumes:
+ - broker_var:/opt/druid/var
+ depends_on:
+ - zookeeper
+ - postgres
+ - coordinator
+ ports:
+ - "8082:8082"
+ command:
+ - broker
+ env_file:
+ - environment
+
+ historical:
+ image: apache/druid:${DRUID_VERSION}
+ container_name: historical
+ profiles: ["druid-jupyter", "all-services"]
+ volumes:
+ - druid_shared:/opt/shared
+ - historical_var:/opt/druid/var
+ depends_on:
+ - zookeeper
+ - postgres
+ - coordinator
+ ports:
+ - "8083:8083"
+ command:
+ - historical
+ env_file:
+ - environment
+
+ middlemanager:
+ image: apache/druid:${DRUID_VERSION}
+ container_name: middlemanager
+ profiles: ["druid-jupyter", "all-services"]
+ volumes:
+ - druid_shared:/opt/shared
+ - middle_var:/opt/druid/var
+ depends_on:
+ - zookeeper
+ - postgres
+ - coordinator
+ ports:
+ - "8091:8091"
+ - "8100-8105:8100-8105"
+ command:
+ - middleManager
+ env_file:
+ - environment
+
+ router:
+ image: apache/druid:${DRUID_VERSION}
+ container_name: router
+ profiles: ["druid-jupyter", "all-services"]
+ volumes:
+ - router_var:/opt/druid/var
+ depends_on:
+ - zookeeper
+ - postgres
+ - coordinator
+ ports:
+ - "8888:8888"
+ command:
+ - router
+ env_file:
+ - environment
+
+ jupyter:
+ build:
+ context: ..
+ dockerfile: Dockerfile
+ container_name: jupyter
+ profiles: ["jupyter", "all-services"]
+ environment:
+ DOCKER_STACKS_JUPYTER_CMD: "notebook"
+ NOTEBOOK_ARGS: "--NotebookApp.token=''"
+ ports:
+ - "${JUPYTER_PORT:-8889}:8888"
+ volumes:
+ - ./notebooks:/home/jovyan/work
diff --git a/examples/quickstart/jupyter-notebooks/docker-jupyter/docker-compose.yaml b/examples/quickstart/jupyter-notebooks/docker-jupyter/docker-compose.yaml
new file mode 100644
index 0000000000..d9e95c085b
--- /dev/null
+++ b/examples/quickstart/jupyter-notebooks/docker-jupyter/docker-compose.yaml
@@ -0,0 +1,170 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+---
+version: "2.2"
+
+volumes:
+ metadata_data: {}
+ middle_var: {}
+ historical_var: {}
+ broker_var: {}
+ coordinator_var: {}
+ router_var: {}
+ druid_shared: {}
+
+
+services:
+ postgres:
+ image: postgres:latest
+ container_name: postgres
+ profiles: ["druid-jupyter", "all-services"]
+ volumes:
+ - metadata_data:/var/lib/postgresql/data
+ environment:
+ - POSTGRES_PASSWORD=FoolishPassword
+ - POSTGRES_USER=druid
+ - POSTGRES_DB=druid
+
+ # Need 3.5 or later for container nodes
+ zookeeper:
+ image: zookeeper:latest
+ container_name: zookeeper
+ profiles: ["druid-jupyter", "all-services"]
+ ports:
+ - "2181:2181"
+ environment:
+ - ZOO_MY_ID=1
+ - ALLOW_ANONYMOUS_LOGIN=yes
+
+ kafka:
+ image: bitnami/kafka:latest
+ container_name: kafka-broker
+ profiles: ["all-services"]
+ ports:
+ # To learn about configuring Kafka for access across networks see
+      # https://www.confluent.io/blog/kafka-client-cannot-connect-to-broker-on-aws-on-docker-etc/
+ - "9092:9092"
+ depends_on:
+ - zookeeper
+ environment:
+ - KAFKA_BROKER_ID=1
+ - KAFKA_CFG_LISTENERS=PLAINTEXT://:9092
+ - KAFKA_CFG_ADVERTISED_LISTENERS=PLAINTEXT://kafka:9092
+ - KAFKA_CFG_ZOOKEEPER_CONNECT=zookeeper:2181
+ - ALLOW_PLAINTEXT_LISTENER=yes
+
+ coordinator:
+ image: apache/druid:${DRUID_VERSION}
+ container_name: coordinator
+ profiles: ["druid-jupyter", "all-services"]
+ volumes:
+ - druid_shared:/opt/shared
+ - coordinator_var:/opt/druid/var
+ depends_on:
+ - zookeeper
+ - postgres
+ ports:
+ - "8081:8081"
+ command:
+ - coordinator
+ env_file:
+ - environment
+
+ broker:
+ image: apache/druid:${DRUID_VERSION}
+ container_name: broker
+ profiles: ["druid-jupyter", "all-services"]
+ volumes:
+ - broker_var:/opt/druid/var
+ depends_on:
+ - zookeeper
+ - postgres
+ - coordinator
+ ports:
+ - "8082:8082"
+ command:
+ - broker
+ env_file:
+ - environment
+
+ historical:
+ image: apache/druid:${DRUID_VERSION}
+ container_name: historical
+ profiles: ["druid-jupyter", "all-services"]
+ volumes:
+ - druid_shared:/opt/shared
+ - historical_var:/opt/druid/var
+ depends_on:
+ - zookeeper
+ - postgres
+ - coordinator
+ ports:
+ - "8083:8083"
+ command:
+ - historical
+ env_file:
+ - environment
+
+ middlemanager:
+ image: apache/druid:${DRUID_VERSION}
+ container_name: middlemanager
+ profiles: ["druid-jupyter", "all-services"]
+ volumes:
+ - druid_shared:/opt/shared
+ - middle_var:/opt/druid/var
+ depends_on:
+ - zookeeper
+ - postgres
+ - coordinator
+ ports:
+ - "8091:8091"
+ - "8100-8105:8100-8105"
+ command:
+ - middleManager
+ env_file:
+ - environment
+
+ router:
+ image: apache/druid:${DRUID_VERSION}
+ container_name: router
+ profiles: ["druid-jupyter", "all-services"]
+ volumes:
+ - router_var:/opt/druid/var
+ depends_on:
+ - zookeeper
+ - postgres
+ - coordinator
+ ports:
+ - "8888:8888"
+ command:
+ - router
+ env_file:
+ - environment
+
+ jupyter:
+ image: imply/druid-notebook:latest
+ container_name: jupyter
+ profiles: ["jupyter", "all-services"]
+ environment:
+ DOCKER_STACKS_JUPYTER_CMD: "notebook"
+ NOTEBOOK_ARGS: "--NotebookApp.token=''"
+ ports:
+ - "${JUPYTER_PORT:-8889}:8888"
+ volumes:
+ - ./notebooks:/home/jovyan/work
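A note on the `jupyter` port mapping above: `"${JUPYTER_PORT:-8889}:8888"` relies on Compose's shell-style default expansion, so the host port is the value of `JUPYTER_PORT` when set and 8889 otherwise. The same expansion can be checked directly in a shell:

```shell
# Compose-style default expansion: value of JUPYTER_PORT if set, else 8889.
unset JUPYTER_PORT
echo "${JUPYTER_PORT:-8889}"   # prints 8889

# With the variable set, the default is ignored.
JUPYTER_PORT=9000
echo "${JUPYTER_PORT:-8889}"   # prints 9000
```

Setting `JUPYTER_PORT` before running `docker compose` moves Jupyter off the default 8889, which is useful when that port is already taken.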
diff --git a/examples/quickstart/jupyter-notebooks/docker-jupyter/environment
b/examples/quickstart/jupyter-notebooks/docker-jupyter/environment
new file mode 100644
index 0000000000..c63a5c0e88
--- /dev/null
+++ b/examples/quickstart/jupyter-notebooks/docker-jupyter/environment
@@ -0,0 +1,56 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
+# Java tuning
+#DRUID_XMX=1g
+#DRUID_XMS=1g
+#DRUID_MAXNEWSIZE=250m
+#DRUID_NEWSIZE=250m
+#DRUID_MAXDIRECTMEMORYSIZE=6172m
+DRUID_SINGLE_NODE_CONF=micro-quickstart
+
+druid_emitter_logging_logLevel=debug
+
+druid_extensions_loadList=["druid-histogram", "druid-datasketches",
"druid-lookups-cached-global", "postgresql-metadata-storage",
"druid-multi-stage-query", "druid-kafka-indexing-service"]
+
+druid_zk_service_host=zookeeper
+
+druid_metadata_storage_host=
+druid_metadata_storage_type=postgresql
+druid_metadata_storage_connector_connectURI=jdbc:postgresql://postgres:5432/druid
+druid_metadata_storage_connector_user=druid
+druid_metadata_storage_connector_password=FoolishPassword
+
+druid_coordinator_balancer_strategy=cachingCost
+
+druid_indexer_runner_javaOptsArray=["-server", "-Xmx1g", "-Xms1g",
"-XX:MaxDirectMemorySize=3g", "-Duser.timezone=UTC", "-Dfile.encoding=UTF-8",
"-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager"]
+druid_indexer_fork_property_druid_processing_buffer_sizeBytes=256MiB
+
+
+
+druid_storage_type=local
+druid_storage_storageDirectory=/opt/shared/segments
+druid_indexer_logs_type=file
+druid_indexer_logs_directory=/opt/shared/indexing-logs
+
+druid_processing_numThreads=2
+druid_processing_numMergeBuffers=2
+
+DRUID_LOG4J=<?xml version="1.0" encoding="UTF-8" ?><Configuration
status="WARN"><Appenders><Console name="Console"
target="SYSTEM_OUT"><PatternLayout pattern="%d{ISO8601} %p [%t] %c -
%m%n"/></Console></Appenders><Loggers><Root level="info"><AppenderRef
ref="Console"/></Root><Logger name="org.apache.druid.jetty.RequestLog"
additivity="false" level="DEBUG"><AppenderRef
ref="Console"/></Logger></Loggers></Configuration>
+
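For orientation: the `druid_*` entries above are not read verbatim by Druid; the Druid Docker image's entrypoint turns each `druid_*` environment variable into a `runtime.properties` entry by replacing underscores with dots. A minimal sketch of that mapping (illustrative only; the real entrypoint also special-cases variables such as `DRUID_XMX` and `DRUID_LOG4J`):

```python
def to_druid_property(env_key: str) -> str:
    """Map a druid_* environment variable name to a runtime property name.

    Sketch of the underscore-to-dot convention used by the Druid Docker
    entrypoint; assumes no property segment itself contains an underscore.
    """
    return env_key.replace("_", ".")

print(to_druid_property("druid_metadata_storage_type"))
# → druid.metadata.storage.type
```

This is why, for example, `druid_zk_service_host=zookeeper` above ends up as the `druid.zk.service.host` runtime property inside each container.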
diff --git
a/examples/quickstart/jupyter-notebooks/docker-jupyter/kafka_docker_config.json
b/examples/quickstart/jupyter-notebooks/docker-jupyter/kafka_docker_config.json
new file mode 100644
index 0000000000..2add8f3fa1
--- /dev/null
+++
b/examples/quickstart/jupyter-notebooks/docker-jupyter/kafka_docker_config.json
@@ -0,0 +1,90 @@
+{
+ "target": {
+ "type": "kafka",
+ "endpoint": "kafka:9092",
+ "topic": "social_media"
+ },
+ "emitters": [
+ {
+ "name": "example_record_1",
+ "dimensions": [
+ {
+ "type": "enum",
+ "name": "username",
+ "values": ["willow", "mia", "leon", "milton", "miette", "gus",
"jojo", "rocket"],
+ "cardinality_distribution": {
+ "type": "uniform",
+ "min": 0,
+ "max": 7
+ }
+ },
+ {
+ "type": "string",
+ "name": "post_title",
+ "length_distribution": {"type": "uniform", "min": 1, "max": 140},
+ "cardinality": 0,
+ "chars":
"abcdefghijklmnopqrstuvwxyz0123456789_ABCDEFGHIJKLMNOPQRSTUVWXYZ!';:,."
+ },
+ {
+ "type": "int",
+ "name": "views",
+ "distribution": {
+ "type": "exponential",
+ "mean": 10000
+ },
+ "cardinality": 0
+ },
+ {
+ "type": "int",
+ "name": "upvotes",
+ "distribution": {
+ "type": "normal",
+ "mean": 70,
+ "stddev": 20
+ },
+ "cardinality": 0
+ },
+ {
+ "type": "int",
+ "name": "comments",
+ "distribution": {
+ "type": "normal",
+ "mean": 10,
+ "stddev": 5
+ },
+ "cardinality": 0
+ },
+ {
+ "type": "enum",
+ "name": "edited",
+ "values": ["True","False"],
+ "cardinality_distribution": {
+ "type": "uniform",
+ "min": 0,
+ "max": 1
+ }
+ }
+ ]
+ }
+ ],
+ "interarrival": {
+ "type": "constant",
+ "value": 1
+ },
+ "states": [
+ {
+ "name": "state_1",
+ "emitter": "example_record_1",
+ "delay": {
+ "type": "constant",
+ "value": 1
+ },
+ "transitions": [
+ {
+ "next": "state_1",
+ "probability": 1.0
+ }
+ ]
+ }
+ ]
+}
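The config above describes a simple state machine: `emitters` define the record shape, `interarrival` sets the cadence, and `states`/`transitions` choose which emitter fires next (here a single state looping on itself with probability 1.0). As a rough, hypothetical sketch of how one `example_record_1` record could be sampled (the real Druid Data Driver interprets the JSON config generically; the distributions are hard-coded here):

```python
import random

def sample_event() -> dict:
    """Sample one record roughly matching the example_record_1 emitter above.

    Illustrative simplification: enum dimensions draw uniformly, string
    lengths draw uniformly, and int dimensions follow the configured
    exponential/normal distributions.
    """
    usernames = ["willow", "mia", "leon", "milton", "miette", "gus", "jojo", "rocket"]
    chars = "abcdefghijklmnopqrstuvwxyz0123456789_ABCDEFGHIJKLMNOPQRSTUVWXYZ!';:,."
    title_len = random.randint(1, 140)  # uniform length, min 1, max 140
    return {
        "username": random.choice(usernames),          # enum over 8 values
        "post_title": "".join(random.choice(chars) for _ in range(title_len)),
        "views": int(random.expovariate(1 / 10000)),   # exponential, mean 10000
        "upvotes": max(0, int(random.gauss(70, 20))),  # normal(70, 20)
        "comments": max(0, int(random.gauss(10, 5))),  # normal(10, 5)
        "edited": random.choice(["True", "False"]),    # enum over 2 values
    }

event = sample_event()
```

Each sampled record matches the shape of the events that the tutorial notebook later ingests from the `social_media` topic.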
diff --git
a/examples/quickstart/jupyter-notebooks/docker-jupyter/tutorial-jupyter-docker.zip
b/examples/quickstart/jupyter-notebooks/docker-jupyter/tutorial-jupyter-docker.zip
new file mode 100644
index 0000000000..4a3c02e4c4
Binary files /dev/null and
b/examples/quickstart/jupyter-notebooks/docker-jupyter/tutorial-jupyter-docker.zip
differ
diff --git a/examples/quickstart/jupyter-notebooks/kafka-tutorial.ipynb
b/examples/quickstart/jupyter-notebooks/kafka-tutorial.ipynb
new file mode 100644
index 0000000000..9ab6ce1681
--- /dev/null
+++ b/examples/quickstart/jupyter-notebooks/kafka-tutorial.ipynb
@@ -0,0 +1,782 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Tutorial: Ingest and query data from Apache Kafka\n",
+ "\n",
+ "<!--\n",
+ " ~ Licensed to the Apache Software Foundation (ASF) under one\n",
+ " ~ or more contributor license agreements. See the NOTICE file\n",
+ " ~ distributed with this work for additional information\n",
+ " ~ regarding copyright ownership. The ASF licenses this file\n",
+ " ~ to you under the Apache License, Version 2.0 (the\n",
+ " ~ \"License\"); you may not use this file except in compliance\n",
+ " ~ with the License. You may obtain a copy of the License at\n",
+ " ~\n",
+ " ~ http://www.apache.org/licenses/LICENSE-2.0\n",
+ " ~\n",
+ " ~ Unless required by applicable law or agreed to in writing,\n",
+ " ~ software distributed under the License is distributed on an\n",
+ " ~ \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+ " ~ KIND, either express or implied. See the License for the\n",
+ " ~ specific language governing permissions and limitations\n",
+ " ~ under the License.\n",
+ " -->\n",
+ "\n",
+ "This tutorial introduces you to streaming ingestion in Apache Druid using
the Apache Kafka event streaming platform.\n",
+ "Follow along to learn how to create and load data into a Kafka topic,
start ingesting data from the topic into Druid, and query results over time.
This tutorial assumes you have a basic understanding of Druid ingestion,
querying, and API requests."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Table of contents\n",
+ "\n",
+ "* [Prerequisites](#Prerequisites)\n",
+ "* [Load Druid API client](#Load-Druid-API-client)\n",
+ "* [Create Kafka topic](#Create-Kafka-topic)\n",
+ "* [Load data into Kafka topic](#Load-data-into-Kafka-topic)\n",
+ "* [Start Druid ingestion](#Start-Druid-ingestion)\n",
+ "* [Query Druid datasource and visualize query
results](#Query-Druid-datasource-and-visualize-query-results)\n",
+ "* [Learn more](#Learn-more)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Prerequisites\n",
+ "\n",
+ "Launch this tutorial and all prerequisites using the `all-services`
profile of the Docker Compose file for Jupyter-based Druid tutorials. For more
information, see [Docker for Jupyter Notebook
tutorials](https://druid.apache.org/docs/latest/tutorials/tutorial-jupyter-docker.html).\n",
+ "\n",
+ "Otherwise, you need the following:\n",
+ "* A running Druid instance.\n",
+ " * Update the `druid_host` variable to point to your Router endpoint.
For example, `druid_host = \"http://localhost:8888\"`.\n",
+ "* A running Kafka cluster.\n",
+ " * Update the Kafka bootstrap servers to point to your servers. For
example, `bootstrap_servers=[\"localhost:9092\"]`.\n",
+ "* The following Python packages:\n",
+ " * `druidapi`, a Python client for Apache Druid\n",
+ " * `DruidDataDriver`, a data generator\n",
+ " * `kafka`, a Python client for Apache Kafka\n",
+ " * `pandas`, `matplotlib`, and `seaborn` for data visualization\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Load Druid API client"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "To start the tutorial, run the following cell. It imports the required
Python packages and defines a variable for the Druid client, and another for
the SQL client used to run SQL commands."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "<style>\n",
+ " .druid table {\n",
+ " border: 1px solid black;\n",
+ " border-collapse: collapse;\n",
+ " }\n",
+ "\n",
+ " .druid th, .druid td {\n",
+ " padding: 4px 1em ;\n",
+ " text-align: left;\n",
+ " }\n",
+ "\n",
+ " td.druid-right, th.druid-right {\n",
+ " text-align: right;\n",
+ " }\n",
+ "\n",
+ " td.druid-center, th.druid-center {\n",
+ " text-align: center;\n",
+ " }\n",
+ "\n",
+ " .druid .druid-left {\n",
+ " text-align: left;\n",
+ " }\n",
+ "\n",
+ " .druid-alert {\n",
+ " font-weight: bold;\n",
+ " }\n",
+ "\n",
+ " .druid-error {\n",
+ " color: red;\n",
+ " }\n",
+ "</style>\n"
+ ],
+ "text/plain": [
+ "<IPython.core.display.HTML object>"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "import druidapi\n",
+ "import json\n",
+ "\n",
+ "# druid_host is the hostname and port for your Druid deployment. \n",
+ "# In a distributed environment, you can point to other Druid services.\n",
+ "# In this tutorial, you'll use the Router service as the `druid_host`.\n",
+ "druid_host = \"http://router:8888\"\n",
+ "\n",
+ "druid = druidapi.jupyter_client(druid_host)\n",
+ "display = druid.display\n",
+ "sql_client = druid.sql\n",
+ "\n",
+ "# Create a rest client for native JSON ingestion for streaming data\n",
+ "rest_client = druidapi.rest.DruidRestClient(\"http://coordinator:8081\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Create Kafka topic"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "This notebook relies on the Python client for the Apache Kafka. Import
the Kafka producer and consumer modules, then create a Kafka client. You use
the Kafka producer to create and publish records to a new topic named
`social_media`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from kafka import KafkaProducer\n",
+ "from kafka import KafkaConsumer\n",
+ "\n",
+ "# Kafka runs on kafka:9092 in multi-container tutorial application\n",
+ "producer = KafkaProducer(bootstrap_servers='kafka:9092')\n",
+ "topic_name = \"social_media\""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Create the `social_media` topic and send a sample event. The `send()`
command returns a metadata descriptor for the record."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "<kafka.producer.future.FutureRecordMetadata at 0x7f5f65344610>"
+ ]
+ },
+ "execution_count": 3,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "event = {\n",
+ " \"__time\": \"2023-01-03T16:40:21.501\",\n",
+ " \"username\": \"willow\",\n",
+ " \"post_title\": \"This title is required\",\n",
+ " \"views\": 15284,\n",
+ " \"upvotes\": 124,\n",
+ " \"comments\": 21,\n",
+ " \"edited\": \"True\"\n",
+ "}\n",
+ "\n",
+ "producer.send(topic_name, json.dumps(event).encode('utf-8'))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "To verify that the Kafka topic stored the event, create a consumer client
to read records from the Kafka cluster, and get the next (only) message:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "{\"__time\": \"2023-01-03T16:40:21.501\", \"username\": \"willow\",
\"post_title\": \"This title is required\", \"views\": 15284, \"upvotes\": 124,
\"comments\": 21, \"edited\": \"True\"}\n"
+ ]
+ }
+ ],
+ "source": [
+ "consumer = KafkaConsumer(topic_name, bootstrap_servers=['kafka:9092'],
auto_offset_reset='earliest',\n",
+ " enable_auto_commit=True)\n",
+ "\n",
+ "print(next(consumer).value.decode('utf-8'))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Load data into Kafka topic"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Instead of manually creating events to send to the Kafka topic, use a
data generator to simulate a continuous data stream. This tutorial makes use of
Druid Data Driver to simulate a continuous data stream into the `social_media`
Kafka topic. To learn more about the Druid Data Driver, see the Druid Summit
talk, [Generating Time centric Data for Apache
Druid](https://www.youtube.com/watch?v=3zAOeLe3iAo).\n",
+ "\n",
+ "In this notebook, you use a background process to continuously load data
into the Kafka topic.\n",
+ "This allows you to keep executing commands in this notebook while data is
constantly being streamed into the topic."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Run the following cells to load sample data into the `social_media` Kafka
topic:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import multiprocessing as mp\n",
+ "from datetime import datetime\n",
+ "import DruidDataDriver"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def run_driver():\n",
+ " DruidDataDriver.simulate(\"kafka_docker_config.json\", None, None,
\"REAL\", datetime.now())\n",
+ " \n",
+ "mp.set_start_method('fork')\n",
+ "ps = mp.Process(target=run_driver)\n",
+ "ps.start()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Start Druid ingestion"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Now that you have a new Kafka topic and data being streamed into the
topic, you ingest the data into Druid by submitting a Kafka ingestion spec.\n",
+ "The ingestion spec describes the following:\n",
+ "* where to source the data to ingest (in `spec > ioConfig`),\n",
+ "* the datasource to ingest data into (in `spec > dataSchema >
dataSource`), and\n",
+ "* what the data looks like (in `spec > dataSchema > dimensionsSpec`).\n",
+ "\n",
+ "Other properties control how Druid aggregates and stores data. For more
information, see the Druid documenation:\n",
+ "* [Apache Kafka
ingestion](https://druid.apache.org/docs/latest/development/extensions-core/kafka-ingestion.html)\n",
+ "* [Ingestion spec
reference](https://druid.apache.org/docs/latest/ingestion/ingestion-spec.html)\n",
+ "\n",
+ "Run the following cells to define and view the Kafka ingestion spec."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "kafka_ingestion_spec = \"{\\\"type\\\": \\\"kafka\\\",\\\"spec\\\":
{\\\"ioConfig\\\": {\\\"type\\\": \\\"kafka\\\",\\\"consumerProperties\\\":
{\\\"bootstrap.servers\\\": \\\"kafka:9092\\\"},\\\"topic\\\":
\\\"social_media\\\",\\\"inputFormat\\\": {\\\"type\\\":
\\\"json\\\"},\\\"useEarliestOffset\\\": true},\\\"tuningConfig\\\":
{\\\"type\\\": \\\"kafka\\\"},\\\"dataSchema\\\": {\\\"dataSource\\\":
\\\"social_media\\\",\\\"timestampSpec\\\": {\\\"column\\\":
\\\"__time\\\",\\\"for [...]
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "{\n",
+ " \"type\": \"kafka\",\n",
+ " \"spec\": {\n",
+ " \"ioConfig\": {\n",
+ " \"type\": \"kafka\",\n",
+ " \"consumerProperties\": {\n",
+ " \"bootstrap.servers\": \"kafka:9092\"\n",
+ " },\n",
+ " \"topic\": \"social_media\",\n",
+ " \"inputFormat\": {\n",
+ " \"type\": \"json\"\n",
+ " },\n",
+ " \"useEarliestOffset\": true\n",
+ " },\n",
+ " \"tuningConfig\": {\n",
+ " \"type\": \"kafka\"\n",
+ " },\n",
+ " \"dataSchema\": {\n",
+ " \"dataSource\": \"social_media\",\n",
+ " \"timestampSpec\": {\n",
+ " \"column\": \"__time\",\n",
+ " \"format\": \"iso\"\n",
+ " },\n",
+ " \"dimensionsSpec\": {\n",
+ " \"dimensions\": [\n",
+ " \"username\",\n",
+ " \"post_title\",\n",
+ " {\n",
+ " \"type\": \"long\",\n",
+ " \"name\": \"views\"\n",
+ " },\n",
+ " {\n",
+ " \"type\": \"long\",\n",
+ " \"name\": \"upvotes\"\n",
+ " },\n",
+ " {\n",
+ " \"type\": \"long\",\n",
+ " \"name\": \"comments\"\n",
+ " },\n",
+ " \"edited\"\n",
+ " ]\n",
+ " },\n",
+ " \"granularitySpec\": {\n",
+ " \"queryGranularity\": \"none\",\n",
+ " \"rollup\": false,\n",
+ " \"segmentGranularity\": \"hour\"\n",
+ " }\n",
+ " }\n",
+ " }\n",
+ "}\n"
+ ]
+ }
+ ],
+ "source": [
+ "print(json.dumps(json.loads(kafka_ingestion_spec), indent=4))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Send the spec to Druid to start the streaming ingestion from Kafka:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "<Response [200]>"
+ ]
+ },
+ "execution_count": 9,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "headers = {\n",
+ " 'Content-Type': 'application/json'\n",
+ "}\n",
+ "\n",
+ "rest_client.post(\"/druid/indexer/v1/supervisor\", kafka_ingestion_spec,
headers=headers)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "A `200` response indicates that the request was successful. You can view
the running ingestion task and the new datasource in the web console at
http://localhost:8888/unified-console.html."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Query Druid datasource and visualize query results"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "You can now query the new datasource called `social_media`. In this
section, you also visualize query results using the Matplotlib and Seaborn
visualization libraries. Run the following cell import these packages."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 10,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import pandas as pd\n",
+ "import matplotlib\n",
+ "import matplotlib.pyplot as plt\n",
+ "import seaborn as sns"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Run a simple query to view a subset of rows from the new datasource:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 11,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "<div class=\"druid\"><table>\n",
+
"<tr><th>__time</th><th>username</th><th>post_title</th><th>views</th><th>upvotes</th><th>comments</th><th>edited</th></tr>\n",
+ "<tr><td>2023-01-03T16:40:21.501Z</td><td>willow</td><td>This title is
required</td><td>15284</td><td>124</td><td>21</td><td>True</td></tr>\n",
+
"<tr><td>2023-05-02T23:34:54.451Z</td><td>gus</td><td>3y4hkmd1!'Er4;</td><td>4031</td><td>93</td><td>15</td><td>False</td></tr>\n",
+
"<tr><td>2023-05-02T23:34:55.454Z</td><td>mia</td><td>m62u53:D9s2bOvnY_VM9vjtZ'MyDLvQ7_xGodAP:ZNTXM6cFAt,_jrxBVBeRILLvAF9Z!jM9YNN;3ErV5eGbE_TFQS</td><td>16060</td><td>84</td><td>8</td><td>True</td></tr>\n",
+
"<tr><td>2023-05-02T23:34:55.455Z</td><td>jojo</td><td>rAmeAJrjs;FBj:zy2MwoGh_P_SowlLTfp6zhX55xqogH.,1DC2xY_x2T;M_Vcu3QWaz650u;Roa</td><td>14313</td><td>65</td><td>7</td><td>False</td></tr>\n",
+
"<tr><td>2023-05-02T23:34:56.456Z</td><td>willow</td><td>3bHB,iJdE;sedTDA,1dKGDAZL!qdsvO_tv.4Jrq7fa.KWcHPD'TB_5nnvsf9EgtnN8tGeeA0MjKc30iubJ:D'l7pHNihWpFz8K'46q!vJs</td><td>4237</td><td>112</td><td>3</td><td>True</td></tr>\n",
+ "</table></div>"
+ ],
+ "text/plain": [
+ "<IPython.core.display.HTML object>"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "sql = '''\n",
+ "SELECT * FROM social_media LIMIT 5\n",
+ "'''\n",
+ "display.sql(sql)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "In this social media scenario, each incoming event represents a post on
social media, for which you collect the timestamp, username, and post metadata.
You are interested in analyzing the total number of upvotes for all posts,
compared between users. Preview this data with the following query:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 12,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "<div class=\"druid\"><table>\n",
+ "<tr><th>num_posts</th><th>total_upvotes</th><th>username</th></tr>\n",
+ "<tr><td>155</td><td>10985</td><td>willow</td></tr>\n",
+ "<tr><td>161</td><td>11223</td><td>gus</td></tr>\n",
+ "<tr><td>164</td><td>11456</td><td>leon</td></tr>\n",
+ "<tr><td>173</td><td>12098</td><td>jojo</td></tr>\n",
+ "<tr><td>176</td><td>12175</td><td>mia</td></tr>\n",
+ "<tr><td>177</td><td>11998</td><td>milton</td></tr>\n",
+ "<tr><td>185</td><td>13256</td><td>miette</td></tr>\n",
+ "<tr><td>188</td><td>13360</td><td>rocket</td></tr>\n",
+ "</table></div>"
+ ],
+ "text/plain": [
+ "<IPython.core.display.HTML object>"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "sql = '''\n",
+ "SELECT\n",
+ " COUNT(post_title) as num_posts,\n",
+ " SUM(upvotes) as total_upvotes,\n",
+ " username\n",
+ "FROM social_media\n",
+ "GROUP BY username\n",
+ "ORDER BY num_posts\n",
+ "'''\n",
+ "\n",
+ "response = sql_client.sql_query(sql)\n",
+ "response.show()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Visualize the total number of upvotes per user using a line plot. You
sort the results by username before plotting because the order of users may
vary as new results arrive."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 13,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "image/png":
"iVBORw0KGgoAAAANSUhEUgAAAk0AAAHMCAYAAADI/py4AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjcuMCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy88F64QAAAACXBIWXMAAA9hAAAPYQGoP6dpAACRN0lEQVR4nOzdd3iTZfcH8O+T7j3ppNAySwdtAQtF9p7KEEVl+JMhvjJERJYyVBRRnCjI60AcrwqyQaBskLJbSiktUArdmzZd6UjO7480sWE2Je2TpOdzXb20z/M0OUlpcnKf+z63QEQExhhjjDH2UBKxA2CMMcYYMwScNDHGGGOM1QEnTYwxxhhjdcBJE2OMMcZYHXDSxBhjjDFWB5w0McYYY4zVASdNjDHGGGN1YCp2AMZCoVAgIyMDdnZ2EARB7HAYY4wxVgdEhOLiYnh5eUEiefhYEidNOpKRkQEfHx+xw2CM
[...]
+ "text/plain": [
+ "<Figure size 640x480 with 1 Axes>"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "df = pd.DataFrame(response.json)\n",
+ "df = df.sort_values('username')\n",
+ "\n",
+ "df.plot(x='username', y='total_upvotes', marker='o')\n",
+ "plt.xticks(rotation=45, ha='right')\n",
+ "plt.ylabel(\"Total number of upvotes\")\n",
+ "plt.gca().get_legend().remove()\n",
+ "plt.show()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The total number of upvotes likely depends on the total number of posts
created per user. To better assess the relative impact per user, you compare
the total number of upvotes (line plot) with the total number of posts."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 14,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "<matplotlib.legend.Legend at 0x7f5f18400310>"
+ ]
+ },
+ "execution_count": 14,
+ "metadata": {},
+ "output_type": "execute_result"
+ },
+ {
+ "data": {
+ "image/png":
"iVBORw0KGgoAAAANSUhEUgAAA1cAAAHMCAYAAAA5/FJZAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjcuMCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy88F64QAAAACXBIWXMAAA9hAAAPYQGoP6dpAADE60lEQVR4nOzdd3hUdfb48feUTHrvgYSEFhJKQhGkCQSkKS6uFZDq6voVdDH2/UqxLLYFQWHhZ2ddXbEgX9dFpIgiSJESBQIBQiABUkmvk8zc3x9hBmICpExyJ8l5Pc88j5m5c++ZRJI5c87nfDSKoigIIYQQQgghhGgSrdoBCCGEEEIIIURbIMmVEEIIIYQQQtiAJFdCCCGEEEIIYQOSXAkhhBBCCCGEDUhyJYQQQgghhBA2IMmVEEIIIYQQQtiAJFdCCCGEEEIIYQN6tQNoK6qqqjh06BCBgYFotZKzCiGEEK2B
[...]
+ "text/plain": [
+ "<Figure size 640x480 with 2 Axes>"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "matplotlib.rc_file_defaults()\n",
+ "ax1 = sns.set_style(style=None, rc=None )\n",
+ "\n",
+ "fig, ax1 = plt.subplots()\n",
+ "plt.xticks(rotation=45, ha='right')\n",
+ "\n",
+ "\n",
+ "sns.lineplot(\n",
+ " data=df, x='username', y='total_upvotes',\n",
+ " marker='o', ax=ax1, label=\"Sum of upvotes\")\n",
+ "ax1.get_legend().remove()\n",
+ "\n",
+ "ax2 = ax1.twinx()\n",
+ "sns.barplot(data=df, x='username', y='num_posts',\n",
+ " order=df['username'], alpha=0.5, ax=ax2, log=True,\n",
+ " color=\"orange\", label=\"Number of posts\")\n",
+ "\n",
+ "\n",
+ "# ask matplotlib for the plotted objects and their labels\n",
+ "lines, labels = ax1.get_legend_handles_labels()\n",
+ "lines2, labels2 = ax2.get_legend_handles_labels()\n",
+ "ax2.legend(lines + lines2, labels + labels2, bbox_to_anchor=(1.55, 1))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "You should see a correlation between total number of upvotes and total
number of posts. In order to track user impact on a more equal footing,
normalize the total number of upvotes relative to the total number of posts,
and plot the result:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 15,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "image/png":
"iVBORw0KGgoAAAANSUhEUgAAAkAAAAHMCAYAAAA9ABcIAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjcuMCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy88F64QAAAACXBIWXMAAA9hAAAPYQGoP6dpAACLeElEQVR4nO3dd1xT5/cH8E/YIFNkyhQRRFFQHDjqVlyto1ZbrVpXbbXO2or9ujocbW3VDm1t1VpbW2frqHvvCU5EBGSDiuxNcn5/8MstKaAEEi5Jzvv1yktzc3PvuQGSk+c5z/NIiIjAGGOMMaZD9MQOgDHGGGOsrnECxBhjjDGdwwkQY4wxxnQOJ0CMMcYY0zmcADHGGGNM53ACxBhjjDGdwwkQY4wxxnSOgdgB1EcymQzJycmwsLCARCIROxzGGGOMVQMRIScnB87OztDTe34bDydAlUhOToarq6vYYTDGGGOs
[...]
+ "text/plain": [
+ "<Figure size 640x480 with 1 Axes>"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "df['upvotes_normalized'] = df['total_upvotes']/df['num_posts']\n",
+ "\n",
+ "df.plot(x='username', y='upvotes_normalized', marker='o',
color='green')\n",
+ "plt.xticks(rotation=45, ha='right')\n",
+ "plt.ylabel(\"Number of upvotes (normalized)\")\n",
+ "plt.gca().get_legend().remove()\n",
+ "plt.show()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "You've been working with data taken at a single snapshot in time from
when you ran the last query. Run the same query again, and store the output in
`response2`, which you will compare with the previous results:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 16,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "<div class=\"druid\"><table>\n",
+ "<tr><th>num_posts</th><th>total_upvotes</th><th>username</th></tr>\n",
+ "<tr><td>404</td><td>28166</td><td>willow</td></tr>\n",
+ "<tr><td>418</td><td>29413</td><td>jojo</td></tr>\n",
+ "<tr><td>419</td><td>29202</td><td>mia</td></tr>\n",
+ "<tr><td>419</td><td>29456</td><td>miette</td></tr>\n",
+ "<tr><td>428</td><td>29472</td><td>gus</td></tr>\n",
+ "<tr><td>433</td><td>30160</td><td>milton</td></tr>\n",
+ "<tr><td>440</td><td>31212</td><td>leon</td></tr>\n",
+ "<tr><td>443</td><td>31063</td><td>rocket</td></tr>\n",
+ "</table></div>"
+ ],
+ "text/plain": [
+ "<IPython.core.display.HTML object>"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "response2 = sql_client.sql_query(sql)\n",
+ "response2.show()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Normalizing the data also helps you evaluate trends over time more
consistently on the same plot axes. Plot the normalized data again, this time
alongside the results from the previous snapshot:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 17,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "image/png":
"iVBORw0KGgoAAAANSUhEUgAAAkAAAAHMCAYAAAA9ABcIAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjcuMCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy88F64QAAAACXBIWXMAAA9hAAAPYQGoP6dpAAC6DklEQVR4nOzdd3iTZffA8W+SbroodNKWsilQoOxV9ihLkCUqAoKoiALixNetPxDfVwW3Iut9FZSpAlKWjLJX2WWVQgdtoYXuneT3R22ktkDTJk3Sns915ZI+ffI8J0ibk/s+97kVWq1WixBCCCFEDaI0dQBCCCGEEFVNEiAhhBBC1DiSAAkhhBCixpEESAghhBA1jiRAQgghhKhxJAESQgghRI0jCZAQQgghahwrUwdgjjQaDTdu3MDJyQmFQmHqcIQQQghRDlqtloyMDHx8fFAq7z/GIwlQGW7cuIGfn5+pwxBC
[...]
+ "text/plain": [
+ "<Figure size 640x480 with 1 Axes>"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "df2 = pd.DataFrame(response2.json)\n",
+ "df2 = df2.sort_values('username')\n",
+ "df2['upvotes_normalized'] = df2['total_upvotes']/df2['num_posts']\n",
+ "\n",
+ "ax = df.plot(x='username', y='upvotes_normalized', marker='o',
color='green', label=\"Time 1\")\n",
+ "df2.plot(x='username', y='upvotes_normalized', marker='o',
color='purple', ax=ax, label=\"Time 2\")\n",
+ "plt.xticks(rotation=45, ha='right')\n",
+ "plt.ylabel(\"Number of upvotes (normalized)\")\n",
+ "plt.show()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "This plot shows how some users maintain relatively consistent social
media impact between the two query snapshots, whereas other users grow or
decline in their influence.\n",
+ "\n",
+ "## Learn more\n",
+ "\n",
+ "This tutorial showed you how to create a Kafka topic using a Python
client for Kafka, send a simulated stream of data to Kafka using a data
generator, and query and visualize results over time. For more information, see
the following resources:\n",
+ "\n",
+ "* [Apache Kafka
ingestion](https://druid.apache.org/docs/latest/development/extensions-core/kafka-ingestion.html)\n",
+ "* [Querying
data](https://druid.apache.org/docs/latest/tutorials/tutorial-query.html)\n",
+ "* [Tutorial: Run with
Docker](https://druid.apache.org/docs/latest/tutorials/docker.html)"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3 (ipykernel)",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.10.8"
+ },
+ "vscode": {
+ "interpreter": {
+ "hash": "a4289e5b8bae5973a6609d90f7bc464162478362b9a770893a3c5c597b0b36e7"
+ }
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
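The normalization step in the notebook divides total upvotes by post count per user. The same computation can be sketched without pandas, over rows shaped like the query results shown above:

```python
# Rows shaped like the num_posts/total_upvotes/username query results above.
rows = [
    {"username": "willow", "num_posts": 155, "total_upvotes": 10985},
    {"username": "rocket", "num_posts": 188, "total_upvotes": 13360},
]

# Same computation as the notebook's df['upvotes_normalized'] column:
for row in rows:
    row["upvotes_normalized"] = row["total_upvotes"] / row["num_posts"]

# Sort by username so repeated snapshots line up on the same plot axis,
# mirroring df.sort_values('username') in the notebook.
rows.sort(key=lambda r: r["username"])
print([round(r["upvotes_normalized"], 1) for r in rows])  # → [71.1, 70.9]
```

Normalizing this way keeps the two snapshots comparable on the same axes even as the raw post and upvote counts keep growing.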
diff --git a/website/sidebars.json b/website/sidebars.json
index fbb6bf0866..f1ab145c04 100644
--- a/website/sidebars.json
+++ b/website/sidebars.json
@@ -27,6 +27,7 @@
"tutorials/tutorial-sql-query-view",
"tutorials/tutorial-unnest-arrays",
"tutorials/tutorial-jupyter-index",
+ "tutorials/tutorial-jupyter-docker",
"tutorials/tutorial-jdbc"
],
"Design": [
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]