317brian commented on code in PR #14781:
URL: https://github.com/apache/druid/pull/14781#discussion_r1287768762
##########
examples/quickstart/jupyter-notebooks/notebooks/04-api/01-delete-api-tutorial.ipynb:
##########
@@ -0,0 +1,938 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "71bdcc40",
+ "metadata": {},
+ "source": [
+ "# Learn to delete data with Druid API\n",
+ "\n",
+ "<!--\n",
+ " ~ Licensed to the Apache Software Foundation (ASF) under one\n",
+ " ~ or more contributor license agreements. See the NOTICE file\n",
+ " ~ distributed with this work for additional information\n",
+ " ~ regarding copyright ownership. The ASF licenses this file\n",
+ " ~ to you under the Apache License, Version 2.0 (the\n",
+ " ~ \"License\"); you may not use this file except in compliance\n",
+ " ~ with the License. You may obtain a copy of the License at\n",
+ " ~\n",
+ " ~ http://www.apache.org/licenses/LICENSE-2.0\n",
+ " ~\n",
+ " ~ Unless required by applicable law or agreed to in writing,\n",
+ " ~ software distributed under the License is distributed on an\n",
+ " ~ \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+ " ~ KIND, either express or implied. See the License for the\n",
+ " ~ specific language governing permissions and limitations\n",
+ " ~ under the License.\n",
+ " -->\n",
+ "\n",
+ "In working with data, Druid retains a copies of the existing data
segments in deep storage and Historical processes. As new data is added into
Druid, deep storage grows and becomes larger over time unless explicitly
removed.\n",
+ "\n",
+ "While deep storage is an important part of Druid's elastic,
fault-tolerant design, over time, data accumulation in deep storage can lead to
increased storage costs. Periodically deleting data can reclaim storage space
and promote optimal resource allocation.\n",
+ "\n",
+ "This notebook provides a tutorial on deleting existing data in Druid
using the Coordinator API endpoints. \n",
+ "\n",
+ "## Table of contents\n",
+ "\n",
+ "- [Prerequisites](#Prerequisites)\n",
+ "- [Ingest data](#Ingest-data)\n",
+ "- [Deletion steps](#Deletion-steps)\n",
+ "- [Delete by time interval](#Delete-by-time-interval)\n",
+ "- [Delete entire table](#Delete-entire-table)\n",
+ "- [Delete by segment ID](#Delete-by-segment-ID)\n",
+ "\n",
+ "For the best experience, use JupyterLab so that you can always access the
table of contents."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "6fc260fc",
+ "metadata": {},
+ "source": [
+ "\n",
+ "## Prerequisites\n",
+ "\n",
+ "This tutorial works with Druid 26.0.0 or later.\n",
+ "\n",
+ "\n",
+ "Launch this tutorial and all prerequisites using the `druid-jupyter`,
`kafka-jupyter`, or `all-services` profiles of the Docker Compose file for
Jupyter-based Druid tutorials. For more information, see [Docker for Jupyter
Notebook
tutorials](https://druid.apache.org/docs/latest/tutorials/tutorial-jupyter-docker.html).\n",
+ "\n",
+ "If you do not use the Docker Compose environment, you need the
following:\n",
+ "\n",
+ "* A running Druid instance.<br>\n",
+ " Update the `druid_host` variable to point to your Router endpoint.
For example:\n",
+ " ```\n",
+ " druid_host = \"http://localhost:8888\"\n",
+ " ```"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "7b8a7510",
+ "metadata": {},
+ "source": [
+ "To start this tutorial, run the next cell. It imports the Python packages
you'll need and defines a variable for the the Druid host, where the Router
service listens.\n",
+ "\n",
+ "`druid_host` is the hostname and port for your Druid deployment. In a
distributed environment, you can point to other Druid services. In this
tutorial, you'll use the Router service as the `druid_host`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "ed52d809",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import requests\n",
+ "import json\n",
+ "\n",
+ "# druid_host is the hostname and port for your Druid deployment. \n",
+ "# In the Docker Compose tutorial environment, this is the Router\n",
+ "# service running at \"http://router:8888\".\n",
+ "# If you are not using the Docker Compose environment, edit the
`druid_host`.\n",
+ "\n",
+ "druid_host = \"http://host.docker.internal:8888\"\n",
+ "druid_host"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "6f3c9a92",
+ "metadata": {},
+ "source": [
+ "Before we proceed with the tutorial, let's use the `/status/health`
endpoint to verify that the cluster if up and running. This endpoint returns
the Python value `true` if the Druid cluster has finished starting up and is
running. Do not move on from this point if the following call does not return
`true`."
Review Comment:
Is there a reason to say "Python value `true`" instead of just saying that it returns true? The latter is more natural/human.
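
If we want the check to be stricter than eyeballing the printed text, the cell could also parse the response and assert on it. A minimal sketch along the lines of the existing cell (reusing the notebook's `druid_host`; the localhost value here is just an example):

```python
import requests

druid_host = "http://localhost:8888"  # as defined earlier in the notebook

# GET /status/health returns the JSON literal `true` once the cluster is up.
response = requests.get(druid_host + "/status/health")

# response.json() parses that literal into Python's True, so the check can be explicit.
if response.status_code == 200 and response.json() is True:
    print("Druid is up and running")
else:
    print("Druid is not ready yet:", response.text)
```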
##########
examples/quickstart/jupyter-notebooks/notebooks/04-api/01-delete-api-tutorial.ipynb:
##########
@@ -0,0 +1,938 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "71bdcc40",
+ "metadata": {},
+ "source": [
+ "# Learn to delete data with Druid API\n",
+ "\n",
+ "<!--\n",
+ " ~ Licensed to the Apache Software Foundation (ASF) under one\n",
+ " ~ or more contributor license agreements. See the NOTICE file\n",
+ " ~ distributed with this work for additional information\n",
+ " ~ regarding copyright ownership. The ASF licenses this file\n",
+ " ~ to you under the Apache License, Version 2.0 (the\n",
+ " ~ \"License\"); you may not use this file except in compliance\n",
+ " ~ with the License. You may obtain a copy of the License at\n",
+ " ~\n",
+ " ~ http://www.apache.org/licenses/LICENSE-2.0\n",
+ " ~\n",
+ " ~ Unless required by applicable law or agreed to in writing,\n",
+ " ~ software distributed under the License is distributed on an\n",
+ " ~ \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+ " ~ KIND, either express or implied. See the License for the\n",
+ " ~ specific language governing permissions and limitations\n",
+ " ~ under the License.\n",
+ " -->\n",
+ "\n",
+ "In working with data, Druid retains a copies of the existing data
segments in deep storage and Historical processes. As new data is added into
Druid, deep storage grows and becomes larger over time unless explicitly
removed.\n",
+ "\n",
+ "While deep storage is an important part of Druid's elastic,
fault-tolerant design, over time, data accumulation in deep storage can lead to
increased storage costs. Periodically deleting data can reclaim storage space
and promote optimal resource allocation.\n",
+ "\n",
+ "This notebook provides a tutorial on deleting existing data in Druid
using the Coordinator API endpoints. \n",
+ "\n",
+ "## Table of contents\n",
+ "\n",
+ "- [Prerequisites](#Prerequisites)\n",
+ "- [Ingest data](#Ingest-data)\n",
+ "- [Deletion steps](#Deletion-steps)\n",
+ "- [Delete by time interval](#Delete-by-time-interval)\n",
+ "- [Delete entire table](#Delete-entire-table)\n",
+ "- [Delete by segment ID](#Delete-by-segment-ID)\n",
+ "\n",
+ "For the best experience, use JupyterLab so that you can always access the
table of contents."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "6fc260fc",
+ "metadata": {},
+ "source": [
+ "\n",
+ "## Prerequisites\n",
+ "\n",
+ "This tutorial works with Druid 26.0.0 or later.\n",
+ "\n",
+ "\n",
+ "Launch this tutorial and all prerequisites using the `druid-jupyter`,
`kafka-jupyter`, or `all-services` profiles of the Docker Compose file for
Jupyter-based Druid tutorials. For more information, see [Docker for Jupyter
Notebook
tutorials](https://druid.apache.org/docs/latest/tutorials/tutorial-jupyter-docker.html).\n",
+ "\n",
+ "If you do not use the Docker Compose environment, you need the
following:\n",
+ "\n",
+ "* A running Druid instance.<br>\n",
+ " Update the `druid_host` variable to point to your Router endpoint.
For example:\n",
+ " ```\n",
+ " druid_host = \"http://localhost:8888\"\n",
+ " ```"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "7b8a7510",
+ "metadata": {},
+ "source": [
+ "To start this tutorial, run the next cell. It imports the Python packages
you'll need and defines a variable for the the Druid host, where the Router
service listens.\n",
+ "\n",
+ "`druid_host` is the hostname and port for your Druid deployment. In a
distributed environment, you can point to other Druid services. In this
tutorial, you'll use the Router service as the `druid_host`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "ed52d809",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import requests\n",
+ "import json\n",
+ "\n",
+ "# druid_host is the hostname and port for your Druid deployment. \n",
+ "# In the Docker Compose tutorial environment, this is the Router\n",
+ "# service running at \"http://router:8888\".\n",
+ "# If you are not using the Docker Compose environment, edit the
`druid_host`.\n",
+ "\n",
+ "druid_host = \"http://host.docker.internal:8888\"\n",
+ "druid_host"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "6f3c9a92",
+ "metadata": {},
+ "source": [
+ "Before we proceed with the tutorial, let's use the `/status/health`
endpoint to verify that the cluster if up and running. This endpoint returns
the Python value `true` if the Druid cluster has finished starting up and is
running. Do not move on from this point if the following call does not return
`true`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "18a8a495",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "endpoint = druid_host + '/status/health'\n",
+ "response = requests.request(\"GET\", endpoint)\n",
+ "print(response.text)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "19144be9",
+ "metadata": {},
+ "source": [
+ "In the rest of this tutorial, the `endpoint` and other variables are
updated in code cells to call a different Druid endpoint to accomplish a task."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "7a281144",
+ "metadata": {},
+ "source": [
+ "## Ingest data\n",
+ "\n",
+ "Apache Druid stores data partitioned by time chunks into segments and
supports deleting data by dropping segments. Before dropping data, we will use
the quickstart Wikipedia data ingested with an indexing spec that creates
hourly segments.\n",
Review Comment:
```suggestion
"Apache Druid stores data partitioned by time chunks into segments and
supports deleting data by dropping segments. To start, we will ingest the
quickstart Wikipedia data and partition it by hour to create multiple segments
.\n",
```
##########
examples/quickstart/jupyter-notebooks/notebooks/04-api/01-delete-api-tutorial.ipynb:
##########
@@ -0,0 +1,938 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "71bdcc40",
+ "metadata": {},
+ "source": [
+ "# Learn to delete data with Druid API\n",
+ "\n",
+ "<!--\n",
+ " ~ Licensed to the Apache Software Foundation (ASF) under one\n",
+ " ~ or more contributor license agreements. See the NOTICE file\n",
+ " ~ distributed with this work for additional information\n",
+ " ~ regarding copyright ownership. The ASF licenses this file\n",
+ " ~ to you under the Apache License, Version 2.0 (the\n",
+ " ~ \"License\"); you may not use this file except in compliance\n",
+ " ~ with the License. You may obtain a copy of the License at\n",
+ " ~\n",
+ " ~ http://www.apache.org/licenses/LICENSE-2.0\n",
+ " ~\n",
+ " ~ Unless required by applicable law or agreed to in writing,\n",
+ " ~ software distributed under the License is distributed on an\n",
+ " ~ \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+ " ~ KIND, either express or implied. See the License for the\n",
+ " ~ specific language governing permissions and limitations\n",
+ " ~ under the License.\n",
+ " -->\n",
+ "\n",
+ "In working with data, Druid retains a copies of the existing data
segments in deep storage and Historical processes. As new data is added into
Druid, deep storage grows and becomes larger over time unless explicitly
removed.\n",
+ "\n",
+ "While deep storage is an important part of Druid's elastic,
fault-tolerant design, over time, data accumulation in deep storage can lead to
increased storage costs. Periodically deleting data can reclaim storage space
and promote optimal resource allocation.\n",
+ "\n",
+ "This notebook provides a tutorial on deleting existing data in Druid
using the Coordinator API endpoints. \n",
+ "\n",
+ "## Table of contents\n",
+ "\n",
+ "- [Prerequisites](#Prerequisites)\n",
+ "- [Ingest data](#Ingest-data)\n",
+ "- [Deletion steps](#Deletion-steps)\n",
+ "- [Delete by time interval](#Delete-by-time-interval)\n",
+ "- [Delete entire table](#Delete-entire-table)\n",
+ "- [Delete by segment ID](#Delete-by-segment-ID)\n",
+ "\n",
+ "For the best experience, use JupyterLab so that you can always access the
table of contents."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "6fc260fc",
+ "metadata": {},
+ "source": [
+ "\n",
+ "## Prerequisites\n",
+ "\n",
+ "This tutorial works with Druid 26.0.0 or later.\n",
+ "\n",
+ "\n",
+ "Launch this tutorial and all prerequisites using the `druid-jupyter`,
`kafka-jupyter`, or `all-services` profiles of the Docker Compose file for
Jupyter-based Druid tutorials. For more information, see [Docker for Jupyter
Notebook
tutorials](https://druid.apache.org/docs/latest/tutorials/tutorial-jupyter-docker.html).\n",
+ "\n",
+ "If you do not use the Docker Compose environment, you need the
following:\n",
+ "\n",
+ "* A running Druid instance.<br>\n",
+ " Update the `druid_host` variable to point to your Router endpoint.
For example:\n",
+ " ```\n",
+ " druid_host = \"http://localhost:8888\"\n",
+ " ```"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "7b8a7510",
+ "metadata": {},
+ "source": [
+ "To start this tutorial, run the next cell. It imports the Python packages
you'll need and defines a variable for the the Druid host, where the Router
service listens.\n",
+ "\n",
+ "`druid_host` is the hostname and port for your Druid deployment. In a
distributed environment, you can point to other Druid services. In this
tutorial, you'll use the Router service as the `druid_host`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "ed52d809",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import requests\n",
+ "import json\n",
+ "\n",
+ "# druid_host is the hostname and port for your Druid deployment. \n",
+ "# In the Docker Compose tutorial environment, this is the Router\n",
+ "# service running at \"http://router:8888\".\n",
+ "# If you are not using the Docker Compose environment, edit the
`druid_host`.\n",
+ "\n",
+ "druid_host = \"http://host.docker.internal:8888\"\n",
+ "druid_host"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "6f3c9a92",
+ "metadata": {},
+ "source": [
+ "Before we proceed with the tutorial, let's use the `/status/health`
endpoint to verify that the cluster if up and running. This endpoint returns
the Python value `true` if the Druid cluster has finished starting up and is
running. Do not move on from this point if the following call does not return
`true`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "18a8a495",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "endpoint = druid_host + '/status/health'\n",
+ "response = requests.request(\"GET\", endpoint)\n",
+ "print(response.text)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "19144be9",
+ "metadata": {},
+ "source": [
+ "In the rest of this tutorial, the `endpoint` and other variables are
updated in code cells to call a different Druid endpoint to accomplish a task."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "7a281144",
+ "metadata": {},
+ "source": [
+ "## Ingest data\n",
+ "\n",
+ "Apache Druid stores data partitioned by time chunks into segments and
supports deleting data by dropping segments. Before dropping data, we will use
the quickstart Wikipedia data ingested with an indexing spec that creates
hourly segments.\n",
+ "\n",
+ "The following cell sets `endpoint` to `/druid/indexer/v1/task`. "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "051655c9",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "endpoint = druid_host + '/druid/indexer/v1/task'\n",
+ "endpoint"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "02e4f551",
+ "metadata": {},
+ "source": [
+ "Next, construct a JSON payload with the ingestion specs to create a
`wikipedia_hour` datasource with hour segmentation. There are many different
[methods](https://druid.apache.org/docs/latest/ingestion/index.html#ingestion-methods)
to ingest data, this tutorial uses [native batch
ingestion](https://druid.apache.org/docs/latest/ingestion/native-batch.html)
and the `/druid/indexer/v1/task` endpoint. For more information on construction
an ingestion spec, see [ingestion spec
reference](https://druid.apache.org/docs/latest/ingestion/ingestion-spec.html)."
Review Comment:
Native batch is the legacy way to ingest batch data. Use SQL-based ingestion/MSQ
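
For reference, the SQL-based equivalent would submit a `REPLACE ... PARTITIONED BY HOUR` statement to the `/druid/v2/sql/task` endpoint. Roughly something like this (a sketch, not tested against this notebook; the column list is trimmed for illustration):

```python
import requests

druid_host = "http://localhost:8888"  # adjust for your deployment

# SQL-based ingestion (MSQ): REPLACE with PARTITIONED BY HOUR produces hourly
# segments, matching what the native-batch spec in this notebook does.
query = """
REPLACE INTO "wikipedia_hour" OVERWRITE ALL
SELECT
  TIME_PARSE("time") AS "__time",
  "channel", "page", "user", "added", "deleted"
FROM TABLE(EXTERN(
  '{"type":"local","baseDir":"quickstart/tutorial/","filter":"wikiticker-2015-09-12-sampled.json.gz"}',
  '{"type":"json"}',
  '[{"name":"time","type":"string"},{"name":"channel","type":"string"},{"name":"page","type":"string"},{"name":"user","type":"string"},{"name":"added","type":"long"},{"name":"deleted","type":"long"}]'
))
PARTITIONED BY HOUR
"""

# MSQ tasks go to the SQL task endpoint, not /druid/indexer/v1/task.
response = requests.post(druid_host + "/druid/v2/sql/task", json={"query": query})
print(response.json())  # includes the taskId to poll for completion
```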
##########
examples/quickstart/jupyter-notebooks/notebooks/04-api/01-delete-api-tutorial.ipynb:
##########
@@ -0,0 +1,938 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "71bdcc40",
+ "metadata": {},
+ "source": [
+ "# Learn to delete data with Druid API\n",
+ "\n",
+ "<!--\n",
+ " ~ Licensed to the Apache Software Foundation (ASF) under one\n",
+ " ~ or more contributor license agreements. See the NOTICE file\n",
+ " ~ distributed with this work for additional information\n",
+ " ~ regarding copyright ownership. The ASF licenses this file\n",
+ " ~ to you under the Apache License, Version 2.0 (the\n",
+ " ~ \"License\"); you may not use this file except in compliance\n",
+ " ~ with the License. You may obtain a copy of the License at\n",
+ " ~\n",
+ " ~ http://www.apache.org/licenses/LICENSE-2.0\n",
+ " ~\n",
+ " ~ Unless required by applicable law or agreed to in writing,\n",
+ " ~ software distributed under the License is distributed on an\n",
+ " ~ \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+ " ~ KIND, either express or implied. See the License for the\n",
+ " ~ specific language governing permissions and limitations\n",
+ " ~ under the License.\n",
+ " -->\n",
+ "\n",
+ "In working with data, Druid retains a copies of the existing data
segments in deep storage and Historical processes. As new data is added into
Druid, deep storage grows and becomes larger over time unless explicitly
removed.\n",
+ "\n",
+ "While deep storage is an important part of Druid's elastic,
fault-tolerant design, over time, data accumulation in deep storage can lead to
increased storage costs. Periodically deleting data can reclaim storage space
and promote optimal resource allocation.\n",
+ "\n",
+ "This notebook provides a tutorial on deleting existing data in Druid
using the Coordinator API endpoints. \n",
+ "\n",
+ "## Table of contents\n",
+ "\n",
+ "- [Prerequisites](#Prerequisites)\n",
+ "- [Ingest data](#Ingest-data)\n",
+ "- [Deletion steps](#Deletion-steps)\n",
+ "- [Delete by time interval](#Delete-by-time-interval)\n",
+ "- [Delete entire table](#Delete-entire-table)\n",
+ "- [Delete by segment ID](#Delete-by-segment-ID)\n",
+ "\n",
+ "For the best experience, use JupyterLab so that you can always access the
table of contents."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "6fc260fc",
+ "metadata": {},
+ "source": [
+ "\n",
+ "## Prerequisites\n",
+ "\n",
+ "This tutorial works with Druid 26.0.0 or later.\n",
+ "\n",
+ "\n",
+ "Launch this tutorial and all prerequisites using the `druid-jupyter`,
`kafka-jupyter`, or `all-services` profiles of the Docker Compose file for
Jupyter-based Druid tutorials. For more information, see [Docker for Jupyter
Notebook
tutorials](https://druid.apache.org/docs/latest/tutorials/tutorial-jupyter-docker.html).\n",
+ "\n",
+ "If you do not use the Docker Compose environment, you need the
following:\n",
+ "\n",
+ "* A running Druid instance.<br>\n",
+ " Update the `druid_host` variable to point to your Router endpoint.
For example:\n",
+ " ```\n",
+ " druid_host = \"http://localhost:8888\"\n",
+ " ```"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "7b8a7510",
+ "metadata": {},
+ "source": [
+ "To start this tutorial, run the next cell. It imports the Python packages
you'll need and defines a variable for the the Druid host, where the Router
service listens.\n",
+ "\n",
+ "`druid_host` is the hostname and port for your Druid deployment. In a
distributed environment, you can point to other Druid services. In this
tutorial, you'll use the Router service as the `druid_host`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "ed52d809",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import requests\n",
+ "import json\n",
+ "\n",
+ "# druid_host is the hostname and port for your Druid deployment. \n",
+ "# In the Docker Compose tutorial environment, this is the Router\n",
+ "# service running at \"http://router:8888\".\n",
+ "# If you are not using the Docker Compose environment, edit the
`druid_host`.\n",
+ "\n",
+ "druid_host = \"http://host.docker.internal:8888\"\n",
+ "druid_host"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "6f3c9a92",
+ "metadata": {},
+ "source": [
+ "Before we proceed with the tutorial, let's use the `/status/health`
endpoint to verify that the cluster if up and running. This endpoint returns
the Python value `true` if the Druid cluster has finished starting up and is
running. Do not move on from this point if the following call does not return
`true`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "18a8a495",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "endpoint = druid_host + '/status/health'\n",
+ "response = requests.request(\"GET\", endpoint)\n",
+ "print(response.text)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "19144be9",
+ "metadata": {},
+ "source": [
+ "In the rest of this tutorial, the `endpoint` and other variables are
updated in code cells to call a different Druid endpoint to accomplish a task."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "7a281144",
+ "metadata": {},
+ "source": [
+ "## Ingest data\n",
+ "\n",
+ "Apache Druid stores data partitioned by time chunks into segments and
supports deleting data by dropping segments. Before dropping data, we will use
the quickstart Wikipedia data ingested with an indexing spec that creates
hourly segments.\n",
+ "\n",
+ "The following cell sets `endpoint` to `/druid/indexer/v1/task`. "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "051655c9",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "endpoint = druid_host + '/druid/indexer/v1/task'\n",
+ "endpoint"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "02e4f551",
+ "metadata": {},
+ "source": [
+ "Next, construct a JSON payload with the ingestion specs to create a
`wikipedia_hour` datasource with hour segmentation. There are many different
[methods](https://druid.apache.org/docs/latest/ingestion/index.html#ingestion-methods)
to ingest data, this tutorial uses [native batch
ingestion](https://druid.apache.org/docs/latest/ingestion/native-batch.html)
and the `/druid/indexer/v1/task` endpoint. For more information on construction
an ingestion spec, see [ingestion spec
reference](https://druid.apache.org/docs/latest/ingestion/ingestion-spec.html)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "9ff9d098",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "payload = json.dumps({\n",
+ " \"type\": \"index_parallel\",\n",
+ " \"spec\": {\n",
+ " \"dataSchema\": {\n",
+ " \"dataSource\": \"wikipedia_hour\",\n",
+ " \"timestampSpec\": {\n",
+ " \"column\": \"time\",\n",
+ " \"format\": \"iso\"\n",
+ " },\n",
+ " \"dimensionsSpec\": {\n",
+ " \"useSchemaDiscovery\": True\n",
+ " },\n",
+ " \"metricsSpec\": [],\n",
+ " \"granularitySpec\": {\n",
+ " \"type\": \"uniform\",\n",
+ " \"segmentGranularity\": \"hour\",\n",
+ " \"queryGranularity\": \"none\",\n",
+ " \"intervals\": [\n",
+ " \"2015-09-12/2015-09-13\"\n",
+ " ],\n",
+ " \"rollup\": False\n",
+ " }\n",
+ " },\n",
+ " \"ioConfig\": {\n",
+ " \"type\": \"index_parallel\",\n",
+ " \"inputSource\": {\n",
+ " \"type\": \"local\",\n",
+ " \"baseDir\": \"quickstart/tutorial/\",\n",
+ " \"filter\": \"wikiticker-2015-09-12-sampled.json.gz\"\n",
+ " },\n",
+ " \"inputFormat\": {\n",
+ " \"type\": \"json\"\n",
+ " },\n",
+ " \"appendToExisting\": False\n",
+ " },\n",
+ " \"tuningConfig\": {\n",
+ " \"type\": \"index_parallel\",\n",
+ " \"maxRowsPerSegment\": 5000000,\n",
+ " \"maxRowsInMemory\": 25000\n",
+ " }\n",
+ " }\n",
+ "})\n",
+ "\n",
+ "headers = {\n",
+ " 'Content-Type': 'application/json'\n",
+ "}"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "1cf78bb7",
+ "metadata": {},
+ "source": [
+ "With the payload and headers ready, run the next cell to send a `POST`
request to the endpoint."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "543b03ee",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "response = requests.request(\"POST\", endpoint, headers=headers,
data=payload)\n",
+ " \n",
+ "print(response.text)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "cab33e7e",
+ "metadata": {},
+ "source": [
+ "Once the data has been ingested, Druid will be populated with segments
for each segment interval that contains data. Since the `wikipedia_hour` was
ingested with `HOUR` granularity, there will be 24 segments associated with
`wikipedia_hour`. \n",
+ "\n",
+ "For demonstration, let's view the segments generated for the
`wikipedia_hour` datasource before any deletion is made. Run the following cell
to set the endpoint to `/druid/v2/sql/`. For more information on this endpoint,
see [Druid SQL
API](https://druid.apache.org/docs/latest/querying/sql-api.html).\n",
+ "\n",
+ "Using this endpoint, you can query the `sys` [metadata
table](https://druid.apache.org/docs/latest/querying/sql-metadata-tables.html#system-schema)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "956abeee",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "endpoint = druid_host + '/druid/v2/sql'\n",
+ "endpoint"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "701550dd",
+ "metadata": {},
+ "source": [
+ "Now, you can query the metadata table to retrieve segment information.
The following cell sends a SQL query to retrieve `segment_id` information for
the `wikipedia_hour` datasource. This tutorial sets the `resultFormat` to
`objectLines`. This helps format the response with newlines and makes it easier
to parse the output."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "bb54a6b7",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "payload = json.dumps({\n",
+ " \"query\": \"SELECT segment_id FROM sys.segments WHERE
\\\"datasource\\\" = 'wikipedia_hour'\",\n",
+ " \"resultFormat\": \"objectLines\"\n",
+ "})\n",
+ "headers = {\n",
+ " 'Content-Type': 'application/json'\n",
+ "}\n",
+ " \n",
+ "response = requests.request(\"POST\", endpoint, headers=headers,
data=payload)\n",
+ "\n",
+ "print(response.text)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "f06e24e5",
+ "metadata": {},
+ "source": [
+ "Observe the response retrieved from the previous cell. In total, there
are 24 `segment_id`, each containing the datasource name `wikipedia_hour`,
along with the start and end hour interval. The tail end of the ID also
contains the timestamp of when the request was made. \n",
+ "\n",
+ "For this tutorial, we are concerned with observing the start and end
interval for each `segment_id`. \n",
+ "\n",
+ "For example: \n",
+ "`{\"segment_id\":\"wikipedia_hour_2015-09-12T00:00:00.000Z_2015-09-12T01:00:00.000Z_2023-08-07T21:36:29.244Z\"}` indicates this segment contains data from `2015-09-12T00:00:00.000Z` to `2015-09-12T01:00:00.000Z`."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "ca79f5f9",
+ "metadata": {},
+ "source": [
+ "## Deletion steps"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "b6cd1c8c",
+ "metadata": {},
+ "source": [
+ "Permanent deletion of a segment in Apache Druid has two steps:\n",
+ "\n",
+ "1. A segment is marked as \"unused.\" This step occurs when a segment is
dropped by a [drop
rule](https://druid.apache.org/docs/latest/operations/rule-configuration.html#set-retention-rules)
or manually marked as \"unused\" through the Coordinator API or web console.
Note that marking a segment as \"unused\" is a soft delete, it is no longer
available for querying but the segment files remain in deep storage and segment
records remain in the metadata store. \n",
+ "2. A kill task is sent to permanently remove \"unused\" segments. This
deletes the segment file from deep storage and removes its record from the
metadata store. This is a hard delete: the data is unrecoverable unless you
have a backup."
Review Comment:
Convert these to active voice from passive
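
While rewording, it may also help to show the two steps as concrete API calls. Roughly (a sketch reusing the notebook's `druid_host`; endpoint paths are as I recall them from the Coordinator and Overlord API docs, so worth double-checking against the targeted Druid version):

```python
import requests

druid_host = "http://localhost:8888"
datasource = "wikipedia_hour"

# Step 1 (soft delete): the Coordinator marks segments in an interval as unused.
# The segments stop serving queries but stay in deep storage and the metadata store.
requests.post(
    druid_host + f"/druid/coordinator/v1/datasources/{datasource}/markUnused",
    json={"interval": "2015-09-12T00:00:00.000Z/2015-09-12T06:00:00.000Z"},
)

# Step 2 (hard delete): a kill task permanently removes unused segments from
# deep storage and deletes their records from the metadata store.
kill_task = {"type": "kill", "dataSource": datasource, "interval": "2015-09-12/2015-09-13"}
response = requests.post(druid_host + "/druid/indexer/v1/task", json=kill_task)
print(response.json())  # the kill task's taskId
```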
##########
examples/quickstart/jupyter-notebooks/notebooks/04-api/01-delete-api-tutorial.ipynb:
##########
@@ -0,0 +1,938 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "71bdcc40",
+ "metadata": {},
+ "source": [
+ "# Learn to delete data with Druid API\n",
+ "\n",
+ "<!--\n",
+ " ~ Licensed to the Apache Software Foundation (ASF) under one\n",
+ " ~ or more contributor license agreements. See the NOTICE file\n",
+ " ~ distributed with this work for additional information\n",
+ " ~ regarding copyright ownership. The ASF licenses this file\n",
+ " ~ to you under the Apache License, Version 2.0 (the\n",
+ " ~ \"License\"); you may not use this file except in compliance\n",
+ " ~ with the License. You may obtain a copy of the License at\n",
+ " ~\n",
+ " ~ http://www.apache.org/licenses/LICENSE-2.0\n",
+ " ~\n",
+ " ~ Unless required by applicable law or agreed to in writing,\n",
+ " ~ software distributed under the License is distributed on an\n",
+ " ~ \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+ " ~ KIND, either express or implied. See the License for the\n",
+ " ~ specific language governing permissions and limitations\n",
+ " ~ under the License.\n",
+ " -->\n",
+ "\n",
+ "In working with data, Druid retains a copies of the existing data
segments in deep storage and Historical processes. As new data is added into
Druid, deep storage grows and becomes larger over time unless explicitly
removed.\n",
+ "\n",
+ "While deep storage is an important part of Druid's elastic,
fault-tolerant design, over time, data accumulation in deep storage can lead to
increased storage costs. Periodically deleting data can reclaim storage space
and promote optimal resource allocation.\n",
+ "\n",
+ "This notebook provides a tutorial on deleting existing data in Druid
using the Coordinator API endpoints. \n",
+ "\n",
+ "## Table of contents\n",
+ "\n",
+ "- [Prerequisites](#Prerequisites)\n",
+ "- [Ingest data](#Ingest-data)\n",
+ "- [Deletion steps](#Deletion-steps)\n",
+ "- [Delete by time interval](#Delete-by-time-interval)\n",
+ "- [Delete entire table](#Delete-entire-table)\n",
+ "- [Delete by segment ID](#Delete-by-segment-ID)\n",
+ "\n",
+ "For the best experience, use JupyterLab so that you can always access the
table of contents."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "6fc260fc",
+ "metadata": {},
+ "source": [
+ "\n",
+ "## Prerequisites\n",
+ "\n",
+ "This tutorial works with Druid 26.0.0 or later.\n",
+ "\n",
+ "\n",
+ "Launch this tutorial and all prerequisites using the `druid-jupyter`,
`kafka-jupyter`, or `all-services` profiles of the Docker Compose file for
Jupyter-based Druid tutorials. For more information, see [Docker for Jupyter
Notebook
tutorials](https://druid.apache.org/docs/latest/tutorials/tutorial-jupyter-docker.html).\n",
+ "\n",
+ "If you do not use the Docker Compose environment, you need the
following:\n",
+ "\n",
+ "* A running Druid instance.<br>\n",
+ " Update the `druid_host` variable to point to your Router endpoint.
For example:\n",
+ " ```\n",
+ " druid_host = \"http://localhost:8888\"\n",
+ " ```"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "7b8a7510",
+ "metadata": {},
+ "source": [
+ "To start this tutorial, run the next cell. It imports the Python packages
you'll need and defines a variable for the the Druid host, where the Router
service listens.\n",
+ "\n",
+ "`druid_host` is the hostname and port for your Druid deployment. In a
distributed environment, you can point to other Druid services. In this
tutorial, you'll use the Router service as the `druid_host`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "ed52d809",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import requests\n",
+ "import json\n",
+ "\n",
+ "# druid_host is the hostname and port for your Druid deployment. \n",
+ "# In the Docker Compose tutorial environment, this is the Router\n",
+ "# service running at \"http://router:8888\".\n",
+ "# If you are not using the Docker Compose environment, edit the
`druid_host`.\n",
+ "\n",
+ "druid_host = \"http://host.docker.internal:8888\"\n",
+ "druid_host"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "6f3c9a92",
+ "metadata": {},
+ "source": [
+ "Before we proceed with the tutorial, let's use the `/status/health`
endpoint to verify that the cluster if up and running. This endpoint returns
the Python value `true` if the Druid cluster has finished starting up and is
running. Do not move on from this point if the following call does not return
`true`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "18a8a495",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "endpoint = druid_host + '/status/health'\n",
+ "response = requests.request(\"GET\", endpoint)\n",
+ "print(response.text)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "19144be9",
+ "metadata": {},
+ "source": [
+ "In the rest of this tutorial, the `endpoint` and other variables are
updated in code cells to call a different Druid endpoint to accomplish a task."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "7a281144",
+ "metadata": {},
+ "source": [
+ "## Ingest data\n",
+ "\n",
+ "Apache Druid stores data partitioned by time chunks into segments and
supports deleting data by dropping segments. Before dropping data, we will use
the quickstart Wikipedia data ingested with an indexing spec that creates
hourly segments.\n",
+ "\n",
+ "The following cell sets `endpoint` to `/druid/indexer/v1/task`. "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "051655c9",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "endpoint = druid_host + '/druid/indexer/v1/task'\n",
+ "endpoint"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "02e4f551",
+ "metadata": {},
+ "source": [
+ "Next, construct a JSON payload with the ingestion specs to create a
`wikipedia_hour` datasource with hour segmentation. There are many different
[methods](https://druid.apache.org/docs/latest/ingestion/index.html#ingestion-methods)
to ingest data, this tutorial uses [native batch
ingestion](https://druid.apache.org/docs/latest/ingestion/native-batch.html)
and the `/druid/indexer/v1/task` endpoint. For more information on construction
an ingestion spec, see [ingestion spec
reference](https://druid.apache.org/docs/latest/ingestion/ingestion-spec.html)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "9ff9d098",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "payload = json.dumps({\n",
+ " \"type\": \"index_parallel\",\n",
+ " \"spec\": {\n",
+ " \"dataSchema\": {\n",
+ " \"dataSource\": \"wikipedia_hour\",\n",
+ " \"timestampSpec\": {\n",
+ " \"column\": \"time\",\n",
+ " \"format\": \"iso\"\n",
+ " },\n",
+ " \"dimensionsSpec\": {\n",
+ " \"useSchemaDiscovery\": True\n",
+ " },\n",
+ " \"metricsSpec\": [],\n",
+ " \"granularitySpec\": {\n",
+ " \"type\": \"uniform\",\n",
+ " \"segmentGranularity\": \"hour\",\n",
+ " \"queryGranularity\": \"none\",\n",
+ " \"intervals\": [\n",
+ " \"2015-09-12/2015-09-13\"\n",
+ " ],\n",
+ " \"rollup\": False\n",
+ " }\n",
+ " },\n",
+ " \"ioConfig\": {\n",
+ " \"type\": \"index_parallel\",\n",
+ " \"inputSource\": {\n",
+ " \"type\": \"local\",\n",
+ " \"baseDir\": \"quickstart/tutorial/\",\n",
+ " \"filter\": \"wikiticker-2015-09-12-sampled.json.gz\"\n",
+ " },\n",
+ " \"inputFormat\": {\n",
+ " \"type\": \"json\"\n",
+ " },\n",
+ " \"appendToExisting\": False\n",
+ " },\n",
+ " \"tuningConfig\": {\n",
+ " \"type\": \"index_parallel\",\n",
+ " \"maxRowsPerSegment\": 5000000,\n",
+ " \"maxRowsInMemory\": 25000\n",
+ " }\n",
+ " }\n",
+ "})\n",
+ "\n",
+ "headers = {\n",
+ " 'Content-Type': 'application/json'\n",
+ "}"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "1cf78bb7",
+ "metadata": {},
+ "source": [
+ "With the payload and headers ready, run the next cell to send a `POST`
request to the endpoint."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "543b03ee",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "response = requests.request(\"POST\", endpoint, headers=headers,
data=payload)\n",
+ " \n",
+ "print(response.text)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "cab33e7e",
+ "metadata": {},
+ "source": [
+ "Once the data has been ingested, Druid will be populated with segments
for each segment interval that contains data. Since the `wikipedia_hour` was
ingested with `HOUR` granularity, there will be 24 segments associated with
`wikipedia_hour`. \n",
+ "\n",
+ "For demonstration, let's view the segments generated for the
`wikipedia_hour` datasource before any deletion is made. Run the following cell
to set the endpoint to `/druid/v2/sql/`. For more information on this endpoint,
see [Druid SQL
API](https://druid.apache.org/docs/latest/querying/sql-api.html).\n",
+ "\n",
+ "Using this endpoint, you can query the `sys` [metadata
table](https://druid.apache.org/docs/latest/querying/sql-metadata-tables.html#system-schema)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "956abeee",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "endpoint = druid_host + '/druid/v2/sql'\n",
+ "endpoint"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "701550dd",
+ "metadata": {},
+ "source": [
+ "Now, you can query the metadata table to retrieve segment information.
The following cell sends a SQL query to retrieve `segment_id` information for
the `wikipedia_hour` datasource. This tutorial sets the `resultFormat` to
`objectLines`. This helps format the response with newlines and makes it easier
to parse the output."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "bb54a6b7",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "payload = json.dumps({\n",
+ " \"query\": \"SELECT segment_id FROM sys.segments WHERE
\\\"datasource\\\" = 'wikipedia_hour'\",\n",
+ " \"resultFormat\": \"objectLines\"\n",
+ "})\n",
+ "headers = {\n",
+ " 'Content-Type': 'application/json'\n",
+ "}\n",
+ " \n",
+ "response = requests.request(\"POST\", endpoint, headers=headers,
data=payload)\n",
+ "\n",
+ "print(response.text)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "f06e24e5",
+ "metadata": {},
+ "source": [
+ "Observe the response retrieved from the previous cell. In total, there
are 24 `segment_id`, each containing the datasource name `wikipedia_hour`,
along with the start and end hour interval. The tail end of the ID also
contains the timestamp of when the request was made. \n",
+ "\n",
+ "For this tutorial, we are concerned with observing the start and end
interval for each `segment_id`. \n",
+ "\n",
+ "For example: \n",
+ "`{\"segment_id\":\"wikipedia_hour_2015-09-12T00:00:00.000Z_2015-09-12T01:00:00.000Z_2023-08-07T21:36:29.244Z\"}` indicates this segment contains data from `2015-09-12T00:00:00.000Z` to `2015-09-12T01:00:00.000Z`."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "ca79f5f9",
+ "metadata": {},
+ "source": [
+ "## Deletion steps"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "b6cd1c8c",
+ "metadata": {},
+ "source": [
+ "Permanent deletion of a segment in Apache Druid has two steps:\n",
Review Comment:
```suggestion
"Permanent deletion of a segment in Druid has two steps:\n",
```
##########
examples/quickstart/jupyter-notebooks/notebooks/04-api/01-delete-api-tutorial.ipynb:
##########
@@ -0,0 +1,938 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "71bdcc40",
+ "metadata": {},
+ "source": [
+ "# Learn to delete data with Druid API\n",
+ "\n",
+ "<!--\n",
+ " ~ Licensed to the Apache Software Foundation (ASF) under one\n",
+ " ~ or more contributor license agreements. See the NOTICE file\n",
+ " ~ distributed with this work for additional information\n",
+ " ~ regarding copyright ownership. The ASF licenses this file\n",
+ " ~ to you under the Apache License, Version 2.0 (the\n",
+ " ~ \"License\"); you may not use this file except in compliance\n",
+ " ~ with the License. You may obtain a copy of the License at\n",
+ " ~\n",
+ " ~ http://www.apache.org/licenses/LICENSE-2.0\n",
+ " ~\n",
+ " ~ Unless required by applicable law or agreed to in writing,\n",
+ " ~ software distributed under the License is distributed on an\n",
+ " ~ \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+ " ~ KIND, either express or implied. See the License for the\n",
+ " ~ specific language governing permissions and limitations\n",
+ " ~ under the License.\n",
+ " -->\n",
+ "\n",
+ "In working with data, Druid retains a copies of the existing data
segments in deep storage and Historical processes. As new data is added into
Druid, deep storage grows and becomes larger over time unless explicitly
removed.\n",
+ "\n",
+ "While deep storage is an important part of Druid's elastic,
fault-tolerant design, over time, data accumulation in deep storage can lead to
increased storage costs. Periodically deleting data can reclaim storage space
and promote optimal resource allocation.\n",
+ "\n",
+ "This notebook provides a tutorial on deleting existing data in Druid
using the Coordinator API endpoints. \n",
+ "\n",
+ "## Table of contents\n",
+ "\n",
+ "- [Prerequisites](#Prerequisites)\n",
+ "- [Ingest data](#Ingest-data)\n",
+ "- [Deletion steps](#Deletion-steps)\n",
+ "- [Delete by time interval](#Delete-by-time-interval)\n",
+ "- [Delete entire table](#Delete-entire-table)\n",
+ "- [Delete by segment ID](#Delete-by-segment-ID)\n",
+ "\n",
+ "For the best experience, use JupyterLab so that you can always access the
table of contents."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "6fc260fc",
+ "metadata": {},
+ "source": [
+ "\n",
+ "## Prerequisites\n",
+ "\n",
+ "This tutorial works with Druid 26.0.0 or later.\n",
+ "\n",
+ "\n",
+ "Launch this tutorial and all prerequisites using the `druid-jupyter`,
`kafka-jupyter`, or `all-services` profiles of the Docker Compose file for
Jupyter-based Druid tutorials. For more information, see [Docker for Jupyter
Notebook
tutorials](https://druid.apache.org/docs/latest/tutorials/tutorial-jupyter-docker.html).\n",
+ "\n",
+ "If you do not use the Docker Compose environment, you need the
following:\n",
+ "\n",
+ "* A running Druid instance.<br>\n",
+ " Update the `druid_host` variable to point to your Router endpoint.
For example:\n",
+ " ```\n",
+ " druid_host = \"http://localhost:8888\"\n",
+ " ```"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "7b8a7510",
+ "metadata": {},
+ "source": [
+ "To start this tutorial, run the next cell. It imports the Python packages
you'll need and defines a variable for the the Druid host, where the Router
service listens.\n",
+ "\n",
+ "`druid_host` is the hostname and port for your Druid deployment. In a
distributed environment, you can point to other Druid services. In this
tutorial, you'll use the Router service as the `druid_host`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "ed52d809",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import requests\n",
+ "import json\n",
+ "\n",
+ "# druid_host is the hostname and port for your Druid deployment. \n",
+ "# In the Docker Compose tutorial environment, this is the Router\n",
+ "# service running at \"http://router:8888\".\n",
+ "# If you are not using the Docker Compose environment, edit the
`druid_host`.\n",
+ "\n",
+ "druid_host = \"http://host.docker.internal:8888\"\n",
+ "druid_host"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "6f3c9a92",
+ "metadata": {},
+ "source": [
+ "Before we proceed with the tutorial, let's use the `/status/health`
endpoint to verify that the cluster if up and running. This endpoint returns
the Python value `true` if the Druid cluster has finished starting up and is
running. Do not move on from this point if the following call does not return
`true`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "18a8a495",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "endpoint = druid_host + '/status/health'\n",
+ "response = requests.request(\"GET\", endpoint)\n",
+ "print(response.text)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "19144be9",
+ "metadata": {},
+ "source": [
+ "In the rest of this tutorial, the `endpoint` and other variables are
updated in code cells to call a different Druid endpoint to accomplish a task."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "7a281144",
+ "metadata": {},
+ "source": [
+ "## Ingest data\n",
+ "\n",
+ "Apache Druid stores data partitioned by time chunks into segments and
supports deleting data by dropping segments. Before dropping data, we will use
the quickstart Wikipedia data ingested with an indexing spec that creates
hourly segments.\n",
+ "\n",
+ "The following cell sets `endpoint` to `/druid/indexer/v1/task`. "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "051655c9",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "endpoint = druid_host + '/druid/indexer/v1/task'\n",
+ "endpoint"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "02e4f551",
+ "metadata": {},
+ "source": [
+ "Next, construct a JSON payload with the ingestion specs to create a
`wikipedia_hour` datasource with hour segmentation. There are many different
[methods](https://druid.apache.org/docs/latest/ingestion/index.html#ingestion-methods)
to ingest data, this tutorial uses [native batch
ingestion](https://druid.apache.org/docs/latest/ingestion/native-batch.html)
and the `/druid/indexer/v1/task` endpoint. For more information on construction
an ingestion spec, see [ingestion spec
reference](https://druid.apache.org/docs/latest/ingestion/ingestion-spec.html)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "9ff9d098",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "payload = json.dumps({\n",
+ " \"type\": \"index_parallel\",\n",
+ " \"spec\": {\n",
+ " \"dataSchema\": {\n",
+ " \"dataSource\": \"wikipedia_hour\",\n",
+ " \"timestampSpec\": {\n",
+ " \"column\": \"time\",\n",
+ " \"format\": \"iso\"\n",
+ " },\n",
+ " \"dimensionsSpec\": {\n",
+ " \"useSchemaDiscovery\": True\n",
+ " },\n",
+ " \"metricsSpec\": [],\n",
+ " \"granularitySpec\": {\n",
+ " \"type\": \"uniform\",\n",
+ " \"segmentGranularity\": \"hour\",\n",
+ " \"queryGranularity\": \"none\",\n",
+ " \"intervals\": [\n",
+ " \"2015-09-12/2015-09-13\"\n",
+ " ],\n",
+ " \"rollup\": False\n",
+ " }\n",
+ " },\n",
+ " \"ioConfig\": {\n",
+ " \"type\": \"index_parallel\",\n",
+ " \"inputSource\": {\n",
+ " \"type\": \"local\",\n",
+ " \"baseDir\": \"quickstart/tutorial/\",\n",
+ " \"filter\": \"wikiticker-2015-09-12-sampled.json.gz\"\n",
+ " },\n",
+ " \"inputFormat\": {\n",
+ " \"type\": \"json\"\n",
+ " },\n",
+ " \"appendToExisting\": False\n",
+ " },\n",
+ " \"tuningConfig\": {\n",
+ " \"type\": \"index_parallel\",\n",
+ " \"maxRowsPerSegment\": 5000000,\n",
+ " \"maxRowsInMemory\": 25000\n",
+ " }\n",
+ " }\n",
+ "})\n",
+ "\n",
+ "headers = {\n",
+ " 'Content-Type': 'application/json'\n",
+ "}"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "1cf78bb7",
+ "metadata": {},
+ "source": [
+ "With the payload and headers ready, run the next cell to send a `POST`
request to the endpoint."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "543b03ee",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "response = requests.request(\"POST\", endpoint, headers=headers,
data=payload)\n",
+ " \n",
+ "print(response.text)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "cab33e7e",
+ "metadata": {},
+ "source": [
+ "Once the data has been ingested, Druid will be populated with segments
for each segment interval that contains data. Since the `wikipedia_hour` was
ingested with `HOUR` granularity, there will be 24 segments associated with
`wikipedia_hour`. \n",
Review Comment:
This isn't necessarily always true. If I set the max rows per segment to 1,
you'd get way more than 24 total segments.
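
With this spec the sample data happens to produce one segment per hour, but the text shouldn't promise exactly 24. One option is to have readers count what was actually produced, e.g. with the notebook's SQL endpoint (a small sketch; `druid_host` value is illustrative):

```python
import requests

druid_host = "http://localhost:8888"

# Count the segments actually produced instead of assuming exactly 24.
payload = {
    "query": "SELECT COUNT(*) AS num_segments FROM sys.segments WHERE \"datasource\" = 'wikipedia_hour'",
    "resultFormat": "objectLines",
}
response = requests.post(druid_host + "/druid/v2/sql", json=payload)
print(response.text)  # e.g. {"num_segments":24}
```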
##########
examples/quickstart/jupyter-notebooks/notebooks/04-api/01-delete-api-tutorial.ipynb:
##########
@@ -0,0 +1,938 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "71bdcc40",
+ "metadata": {},
+ "source": [
+ "# Learn to delete data with Druid API\n",
+ "\n",
+ "<!--\n",
+ " ~ Licensed to the Apache Software Foundation (ASF) under one\n",
+ " ~ or more contributor license agreements. See the NOTICE file\n",
+ " ~ distributed with this work for additional information\n",
+ " ~ regarding copyright ownership. The ASF licenses this file\n",
+ " ~ to you under the Apache License, Version 2.0 (the\n",
+ " ~ \"License\"); you may not use this file except in compliance\n",
+ " ~ with the License. You may obtain a copy of the License at\n",
+ " ~\n",
+ " ~ http://www.apache.org/licenses/LICENSE-2.0\n",
+ " ~\n",
+ " ~ Unless required by applicable law or agreed to in writing,\n",
+ " ~ software distributed under the License is distributed on an\n",
+ " ~ \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+ " ~ KIND, either express or implied. See the License for the\n",
+ " ~ specific language governing permissions and limitations\n",
+ " ~ under the License.\n",
+ " -->\n",
+ "\n",
+ "In working with data, Druid retains a copies of the existing data
segments in deep storage and Historical processes. As new data is added into
Druid, deep storage grows and becomes larger over time unless explicitly
removed.\n",
+ "\n",
+ "While deep storage is an important part of Druid's elastic,
fault-tolerant design, over time, data accumulation in deep storage can lead to
increased storage costs. Periodically deleting data can reclaim storage space
and promote optimal resource allocation.\n",
+ "\n",
+ "This notebook provides a tutorial on deleting existing data in Druid
using the Coordinator API endpoints. \n",
+ "\n",
+ "## Table of contents\n",
+ "\n",
+ "- [Prerequisites](#Prerequisites)\n",
+ "- [Ingest data](#Ingest-data)\n",
+ "- [Deletion steps](#Deletion-steps)\n",
+ "- [Delete by time interval](#Delete-by-time-interval)\n",
+ "- [Delete entire table](#Delete-entire-table)\n",
+ "- [Delete by segment ID](#Delete-by-segment-ID)\n",
+ "\n",
+ "For the best experience, use JupyterLab so that you can always access the
table of contents."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "6fc260fc",
+ "metadata": {},
+ "source": [
+ "\n",
+ "## Prerequisites\n",
+ "\n",
+ "This tutorial works with Druid 26.0.0 or later.\n",
+ "\n",
+ "\n",
+ "Launch this tutorial and all prerequisites using the `druid-jupyter`,
`kafka-jupyter`, or `all-services` profiles of the Docker Compose file for
Jupyter-based Druid tutorials. For more information, see [Docker for Jupyter
Notebook
tutorials](https://druid.apache.org/docs/latest/tutorials/tutorial-jupyter-docker.html).\n",
+ "\n",
+ "If you do not use the Docker Compose environment, you need the
following:\n",
+ "\n",
+ "* A running Druid instance.<br>\n",
+ " Update the `druid_host` variable to point to your Router endpoint.
For example:\n",
+ " ```\n",
+ " druid_host = \"http://localhost:8888\"\n",
+ " ```"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "7b8a7510",
+ "metadata": {},
+ "source": [
+ "To start this tutorial, run the next cell. It imports the Python packages
you'll need and defines a variable for the the Druid host, where the Router
service listens.\n",
+ "\n",
+ "`druid_host` is the hostname and port for your Druid deployment. In a
distributed environment, you can point to other Druid services. In this
tutorial, you'll use the Router service as the `druid_host`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "ed52d809",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import requests\n",
+ "import json\n",
+ "\n",
+ "# druid_host is the hostname and port for your Druid deployment. \n",
+ "# In the Docker Compose tutorial environment, this is the Router\n",
+ "# service running at \"http://router:8888\".\n",
+ "# If you are not using the Docker Compose environment, edit the
`druid_host`.\n",
+ "\n",
+ "druid_host = \"http://host.docker.internal:8888\"\n",
+ "druid_host"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "6f3c9a92",
+ "metadata": {},
+ "source": [
+ "Before we proceed with the tutorial, let's use the `/status/health`
endpoint to verify that the cluster if up and running. This endpoint returns
the Python value `true` if the Druid cluster has finished starting up and is
running. Do not move on from this point if the following call does not return
`true`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "18a8a495",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "endpoint = druid_host + '/status/health'\n",
+ "response = requests.request(\"GET\", endpoint)\n",
+ "print(response.text)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "19144be9",
+ "metadata": {},
+ "source": [
+ "In the rest of this tutorial, the `endpoint` and other variables are
updated in code cells to call a different Druid endpoint to accomplish a task."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "7a281144",
+ "metadata": {},
+ "source": [
+ "## Ingest data\n",
+ "\n",
+ "Apache Druid stores data partitioned by time chunks into segments and
supports deleting data by dropping segments. Before dropping data, we will use
the quickstart Wikipedia data ingested with an indexing spec that creates
hourly segments.\n",
+ "\n",
+ "The following cell sets `endpoint` to `/druid/indexer/v1/task`. "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "051655c9",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "endpoint = druid_host + '/druid/indexer/v1/task'\n",
+ "endpoint"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "02e4f551",
+ "metadata": {},
+ "source": [
+ "Next, construct a JSON payload with the ingestion specs to create a
`wikipedia_hour` datasource with hour segmentation. There are many different
[methods](https://druid.apache.org/docs/latest/ingestion/index.html#ingestion-methods)
to ingest data, this tutorial uses [native batch
ingestion](https://druid.apache.org/docs/latest/ingestion/native-batch.html)
and the `/druid/indexer/v1/task` endpoint. For more information on construction
an ingestion spec, see [ingestion spec
reference](https://druid.apache.org/docs/latest/ingestion/ingestion-spec.html)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "9ff9d098",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "payload = json.dumps({\n",
+ " \"type\": \"index_parallel\",\n",
+ " \"spec\": {\n",
+ " \"dataSchema\": {\n",
+ " \"dataSource\": \"wikipedia_hour\",\n",
+ " \"timestampSpec\": {\n",
+ " \"column\": \"time\",\n",
+ " \"format\": \"iso\"\n",
+ " },\n",
+ " \"dimensionsSpec\": {\n",
+ " \"useSchemaDiscovery\": True\n",
+ " },\n",
+ " \"metricsSpec\": [],\n",
+ " \"granularitySpec\": {\n",
+ " \"type\": \"uniform\",\n",
+ " \"segmentGranularity\": \"hour\",\n",
+ " \"queryGranularity\": \"none\",\n",
+ " \"intervals\": [\n",
+ " \"2015-09-12/2015-09-13\"\n",
+ " ],\n",
+ " \"rollup\": False\n",
+ " }\n",
+ " },\n",
+ " \"ioConfig\": {\n",
+ " \"type\": \"index_parallel\",\n",
+ " \"inputSource\": {\n",
+ " \"type\": \"local\",\n",
+ " \"baseDir\": \"quickstart/tutorial/\",\n",
+ " \"filter\": \"wikiticker-2015-09-12-sampled.json.gz\"\n",
+ " },\n",
+ " \"inputFormat\": {\n",
+ " \"type\": \"json\"\n",
+ " },\n",
+ " \"appendToExisting\": False\n",
+ " },\n",
+ " \"tuningConfig\": {\n",
+ " \"type\": \"index_parallel\",\n",
+ " \"maxRowsPerSegment\": 5000000,\n",
+ " \"maxRowsInMemory\": 25000\n",
+ " }\n",
+ " }\n",
+ "})\n",
+ "\n",
+ "headers = {\n",
+ " 'Content-Type': 'application/json'\n",
+ "}"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "1cf78bb7",
+ "metadata": {},
+ "source": [
+ "With the payload and headers ready, run the next cell to send a `POST`
request to the endpoint."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "543b03ee",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "response = requests.request(\"POST\", endpoint, headers=headers,
data=payload)\n",
+ " \n",
+ "print(response.text)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "cab33e7e",
+ "metadata": {},
+ "source": [
+ "Once the data has been ingested, Druid will be populated with segments
for each segment interval that contains data. Since the `wikipedia_hour` was
ingested with `HOUR` granularity, there will be 24 segments associated with
`wikipedia_hour`. \n",
+ "\n",
+ "For demonstration, let's view the segments generated for the
`wikipedia_hour` datasource before any deletion is made. Run the following cell
to set the endpoint to `/druid/v2/sql/`. For more information on this endpoint,
see [Druid SQL
API](https://druid.apache.org/docs/latest/querying/sql-api.html).\n",
+ "\n",
+ "Using this endpoint, you can query the `sys` [metadata
table](https://druid.apache.org/docs/latest/querying/sql-metadata-tables.html#system-schema)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "956abeee",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "endpoint = druid_host + '/druid/v2/sql'\n",
+ "endpoint"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "701550dd",
+ "metadata": {},
+ "source": [
+ "Now, you can query the metadata table to retrieve segment information.
The following cell sends a SQL query to retrieve `segment_id` information for
the `wikipedia_hour` datasource. This tutorial sets the `resultFormat` to
`objectLines`. This helps format the response with newlines and makes it easier
to parse the output."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "bb54a6b7",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "payload = json.dumps({\n",
+ " \"query\": \"SELECT segment_id FROM sys.segments WHERE
\\\"datasource\\\" = 'wikipedia_hour'\",\n",
+ " \"resultFormat\": \"objectLines\"\n",
+ "})\n",
+ "headers = {\n",
+ " 'Content-Type': 'application/json'\n",
+ "}\n",
+ " \n",
+ "response = requests.request(\"POST\", endpoint, headers=headers,
data=payload)\n",
+ "\n",
+ "print(response.text)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "f06e24e5",
+ "metadata": {},
+ "source": [
+ "Observe the response retrieved from the previous cell. In total, there
are 24 `segment_id`, each containing the datasource name `wikipedia_hour`,
along with the start and end hour interval. The tail end of the ID also
contains the timestamp of when the request was made. \n",
Review Comment:
```suggestion
"Observe the response retrieved from the previous cell. In total, there
are 24 `segment_id` records, each containing the datasource name
`wikipedia_hour`, along with the start and end hour interval. The tail end of
the ID also contains the timestamp of when the request was made. \n",
```
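As an aside for anyone verifying the count programmatically: a minimal sketch
(not part of the notebook) that parses the `objectLines` response and counts
the returned IDs, assuming the tutorial's Router endpoint and that
`wikipedia_hour` has already been ingested:

```python
import json
import requests

druid_host = "http://localhost:8888"  # Router endpoint, as in the tutorial

# Query sys.segments for wikipedia_hour, the same query the notebook sends.
payload = json.dumps({
    "query": "SELECT segment_id FROM sys.segments WHERE \"datasource\" = 'wikipedia_hour'",
    "resultFormat": "objectLines"
})
headers = {"Content-Type": "application/json"}
response = requests.post(druid_host + "/druid/v2/sql", headers=headers, data=payload)

# objectLines returns one JSON object per line; skip the trailing blank line.
segment_ids = [json.loads(line)["segment_id"]
               for line in response.text.splitlines() if line.strip()]

print(len(segment_ids))  # expect 24: one segment per hour of 2015-09-12
print(segment_ids[0])    # e.g. wikipedia_hour_2015-09-12T00:00:00.000Z_...
```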
##########
examples/quickstart/jupyter-notebooks/notebooks/04-api/01-delete-api-tutorial.ipynb:
##########
@@ -0,0 +1,938 @@
+ {
+ "cell_type": "markdown",
+ "id": "f06e24e5",
+ "metadata": {},
+ "source": [
+ "Observe the response retrieved from the previous cell. In total, there
are 24 `segment_id`, each containing the datasource name `wikipedia_hour`,
along with the start and end hour interval. The tail end of the ID also
contains the timestamp of when the request was made. \n",
+ "\n",
+ "For this tutorial, we are concerned with observing the start and end
interval for each `segment_id`. \n",
+ "\n",
+ "For example: \n",
+
"`{\"segment_id\":\"wikipedia_hour_2015-09-12T00:00:00.000Z_2015-09-12T01:00:00.000Z_2023-08-07T21:36:29.244Z\"}`
indicates this segment contains data from `2015-09-12T00:00:00.000Z` to
`2015-09-12T01:00:00.000Z`."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "ca79f5f9",
+ "metadata": {},
+ "source": [
+ "## Deletion steps"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "b6cd1c8c",
+ "metadata": {},
+ "source": [
+ "Permanent deletion of a segment in Apache Druid has two steps:\n",
+ "\n",
+ "1. A segment is marked as \"unused.\" This step occurs when a segment is
dropped by a [drop
rule](https://druid.apache.org/docs/latest/operations/rule-configuration.html#set-retention-rules)
or manually marked as \"unused\" through the Coordinator API or web console.
Note that marking a segment as \"unused\" is a soft delete, it is no longer
available for querying but the segment files remain in deep storage and segment
records remain in the metadata store. \n",
+ "2. A kill task is sent to permanently remove \"unused\" segments. This
deletes the segment file from deep storage and removes its record from the
metadata store. This is a hard delete: the data is unrecoverable unless you
have a backup."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "b9bc7f00",
+ "metadata": {},
+ "source": [
+ "## Delete by time interval"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "1040bdaf",
+ "metadata": {},
+ "source": [
+ "Segments can be deleted in a specified time interval. This begins with
marking all segments in the interval as \"unused\", then sending a kill request
to delete it permanently from deep storage.\n",
+ "\n",
+ "First, set the endpoint variable to the Coordinator API endpoint
`/druid/coordinator/v1/datasources/:dataSource/markUnused`. Since the
datasource ingested is `wikipedia_hour`, let's specify that in the endpoint."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "9db8786d",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "endpoint = druid_host +
'/druid/coordinator/v1/datasources/wikipedia_hour/markUnused'\n",
+ "endpoint"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "863576a9",
+ "metadata": {},
+ "source": [
+ "The following cell constructs a JSON payload with the interval of
segments to be deleted. This will mark the intervals from `18:00:00.000` to
`20:00:00.000` non-inclusive as \"unused.\" This payload is sent to the
endpoint in a `POST` request."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "79387e72",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "payload = json.dumps({\n",
+ " \"interval\": \"2015-09-12T18:00:00.000Z/2015-09-12T20:00:00.000Z\"\n",
+ "})\n",
+ "headers = {\n",
+ " 'Content-Type': 'application/json'\n",
+ "}\n",
+ "\n",
+ "response = requests.request(\"POST\", endpoint, headers=headers,
data=payload)\n",
+ "\n",
+ "print(response.text)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "89e2fcb4",
+ "metadata": {},
+ "source": [
+ "The response from the above cell should return a JSON object with the
property `\"numChangedSegments\"` and the value `2`. This refers to the
following segments:\n",
+ "\n",
+ "*
`{\"segment_id\":\"wikipedia_hour_2015-09-12T18:00:00.000Z_2015-09-12T19:00:00.000Z_2023-08-07T21:36:29.244Z\"}`\n",
+ "*
`{\"segment_id\":\"wikipedia_hour_2015-09-12T19:00:00.000Z_2015-09-12T20:00:00.000Z_2023-08-07T21:36:29.244Z\"}`"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "e61cae23",
+ "metadata": {},
+ "source": [
+ "Next, verify that the segments have been soft deleted. The following cell
sets the endpoint variable to `/druid/v2/sql` and sends a `POST` request
querying for the existing `segment_id`s. "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "ea7c0d26",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "endpoint = druid_host + '/druid/v2/sql'\n",
+ "payload = json.dumps({\n",
+ " \"query\": \"SELECT segment_id FROM sys.segments WHERE
\\\"datasource\\\" = 'wikipedia_hour'\",\n",
+ " \"resultFormat\": \"objectLines\"\n",
+ "})\n",
+ "headers = {\n",
+ " 'Content-Type': 'application/json'\n",
+ "}\n",
+ "\n",
+ "response = requests.request(\"POST\", endpoint, headers=headers,
data=payload)\n",
+ "\n",
+ "print(response.text)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "747bd12c",
+ "metadata": {},
+ "source": [
+ "Observe the response above. There should now be only 22 segments, and the
\"unused\" segments have been soft deleted. \n",
+ "\n",
+ "However, as you've only soft deleted the segments, it remains in deep
storage.\n",
+ "\n",
+ "Before permanently deleting the segments, let's observe how this can
change in deep storage. This step is optional, you can move onto the next set
of cells without completing this step."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "943b36cc",
+ "metadata": {},
+ "source": [
+ "[OPTIONAL] If you are running Druid externally from the Docker Compose
environment, follow these instructions to retrieve segments from deep
storage:\n",
Review Comment:
We're not really retrieving segments from deep storage here. We're just
ls'ing the filesystem where the segments are stored.
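For readers who want to see that for themselves, a rough sketch of the
filesystem listing described above. The base path is an assumption taken from
the quickstart default `druid.storage.storageDirectory=var/druid/segments`
(relative to the Druid installation) and only applies to local deep storage;
adjust it for your deployment:

```python
import os

# Walk the local deep storage directory and print the segment files for the
# wikipedia_hour datasource. Segments marked "unused" still appear here;
# only a kill task removes the files.
deep_storage = "var/druid/segments/wikipedia_hour"  # assumed quickstart path

for root, _dirs, files in os.walk(deep_storage):
    for name in files:
        print(os.path.join(root, name))
```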
##########
examples/quickstart/jupyter-notebooks/notebooks/04-api/01-delete-api-tutorial.ipynb:
##########
@@ -0,0 +1,938 @@
+ {
+ "cell_type": "markdown",
+ "id": "747bd12c",
+ "metadata": {},
+ "source": [
+ "Observe the response above. There should now be only 22 segments, and the
\"unused\" segments have been soft deleted. \n",
+ "\n",
+ "However, as you've only soft deleted the segments, it remains in deep
storage.\n",
+ "\n",
+ "Before permanently deleting the segments, let's observe how this can
change in deep storage. This step is optional, you can move onto the next set
of cells without completing this step."
Review Comment:
```suggestion
"Before permanently deleting the segments, you can verify that they've
only been soft deleted by inspecting your deep storage. The soft deleted
segments are still there. This step is optional, you can move onto the next set
of cells without completing this step."
```
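For completeness, the hard-delete step that follows the soft delete: a sketch
of the kill call against the Coordinator API, routed through the same
`druid_host` as the tutorial. Per the API docs, the interval's `/` separator
is replaced with `_` in the URL path:

```python
import requests

druid_host = "http://localhost:8888"  # Router endpoint, as in the tutorial

# Permanently delete (kill) the "unused" segments in the interval marked
# unused above. This is a hard delete: the segment files are removed from
# deep storage and their records from the metadata store.
interval = "2015-09-12T18:00:00.000Z_2015-09-12T20:00:00.000Z"
endpoint = (druid_host +
            "/druid/coordinator/v1/datasources/wikipedia_hour/intervals/" +
            interval)

response = requests.delete(endpoint)
print(response.status_code)  # 200 means the kill task was submitted
```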
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]