Re: [PR] 202307-notebook-unionall (druid)

via GitHub Tue, 15 Aug 2023 16:07:31 -0700


techdocsmith commented on code in PR #14726:
URL: https://github.com/apache/druid/pull/14726#discussion_r1295219171



##########
examples/quickstart/jupyter-notebooks/notebooks/03-query/04-UnionOperations.ipynb:
##########
@@ -0,0 +1,541 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "557e06e8-9b35-4b34-8322-8a8ede6de709",
+   "metadata": {},
+   "source": [
+    "# Using `UNION ALL` to address multiple `TABLE`s in the same query\n",
+    "\n",
+    "While working with Druid, you may need to bring together two different 
tables of results together into a single result list, or to treat multiple 
tables as a single input to a query. This notebook introduces the `UNION ALL` 
operator, walking through two ways in which this operator can be used to 
achieve this result: top-level and table-level `UNION ALL`."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "cf4554ae-6516-4e76-b202-d6e2fdf31603",
+   "metadata": {},
+   "source": [
+    "## Prerequisites\n",
+    "\n",
+    "This tutorial works with Druid 26.0.0 or later.\n",
+    "\n",
+    "#### Run using Docker\n",
+    "\n",
+    "Launch this tutorial and all prerequisites using the `druid-jupyter` 
profile of the Docker Compose file for Jupyter-based Druid tutorials. For more 
information, see [Docker for Jupyter Notebook 
tutorials](https://druid.apache.org/docs/latest/tutorials/tutorial-jupyter-docker.html).\n",
+    "   \n",
+    "#### Run Druid without Docker\n",
+    "\n",
+    "If you do not use the Docker Compose environment, you need the 
following:\n",
+    "\n",
+    "* A running Druid instance, with a `DRUID_HOST` local environment 
variable containing the servername of your Druid router\n",
+    "* 
[druidapi](https://github.com/apache/druid/blob/master/examples/quickstart/jupyter-notebooks/druidapi/README.md),
 a Python client for Apache Druid. Follow the instructions in the Install 
section of the README file."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ee0c3171-def8-4ad9-9c56-d3a67f309631",
+   "metadata": {},
+   "source": [
+    "### Initialization\n",
+    "\n",
+    "Run the next cell to attempt a connection to Druid services. If 
successful, the Druid version number will be shown in the output."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "9fa4abfe-f878-4031-88f2-94c13e922279",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import druidapi\n",
+    "import os\n",
+    "\n",
+    "if 'DRUID_HOST' not in os.environ.keys():\n",
+    "    druid_host=f\"http://localhost:8888\"\n";,
+    "else:\n",
+    "    druid_host=f\"http://{os.environ['DRUID_HOST']}:8888\"\n",
+    "    \n",
+    "print(f\"Opening a connection to {druid_host}.\")\n",
+    "druid = druidapi.jupyter_client(druid_host)\n",
+    "\n",
+    "display = druid.display\n",
+    "sql_client = druid.sql\n",
+    "status_client = druid.status\n",
+    "\n",
+    "status_client.version"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "fc3001a0-27e5-4f41-876a-ce6eab2acd6a",
+   "metadata": {},
+   "source": [
+    "Finally, run the following cell to import additional Python modules that 
you will use to X, Y, Z."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "6b058d8b-2bae-4929-ab0c-5a6df1850387",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import json"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f388633f-195b-4381-98cc-7a2f80f48690",
+   "metadata": {},
+   "source": [
+    "## Using Top-level `UNION ALL` to concatenate result sets\n",
+    "\n",
+    "Run the following cell to ingest the wikipedia data example. Once 
completed, you will see a description of the new table.\n",
+    "\n",
+    "Monitor the ingestion in the Druid console while it runs."

Review Comment:
   ```suggestion
       "You can optionally monitor the ingestion in the Druid console while it 
runs."
   ```
   nit



##########
examples/quickstart/jupyter-notebooks/notebooks/03-query/04-UnionOperations.ipynb:
##########
@@ -0,0 +1,541 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "557e06e8-9b35-4b34-8322-8a8ede6de709",
+   "metadata": {},
+   "source": [
+    "# Using `UNION ALL` to address multiple `TABLE`s in the same query\n",
+    "\n",
+    "While working with Druid, you may need to bring together two different 
tables of results together into a single result list, or to treat multiple 
tables as a single input to a query. This notebook introduces the `UNION ALL` 
operator, walking through two ways in which this operator can be used to 
achieve this result: top-level and table-level `UNION ALL`."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "cf4554ae-6516-4e76-b202-d6e2fdf31603",
+   "metadata": {},
+   "source": [
+    "## Prerequisites\n",
+    "\n",
+    "This tutorial works with Druid 26.0.0 or later.\n",
+    "\n",
+    "#### Run using Docker\n",
+    "\n",
+    "Launch this tutorial and all prerequisites using the `druid-jupyter` 
profile of the Docker Compose file for Jupyter-based Druid tutorials. For more 
information, see [Docker for Jupyter Notebook 
tutorials](https://druid.apache.org/docs/latest/tutorials/tutorial-jupyter-docker.html).\n",
+    "   \n",
+    "#### Run Druid without Docker\n",
+    "\n",
+    "If you do not use the Docker Compose environment, you need the 
following:\n",
+    "\n",
+    "* A running Druid instance, with a `DRUID_HOST` local environment 
variable containing the servername of your Druid router\n",
+    "* 
[druidapi](https://github.com/apache/druid/blob/master/examples/quickstart/jupyter-notebooks/druidapi/README.md),
 a Python client for Apache Druid. Follow the instructions in the Install 
section of the README file."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ee0c3171-def8-4ad9-9c56-d3a67f309631",
+   "metadata": {},
+   "source": [
+    "### Initialization\n",
+    "\n",
+    "Run the next cell to attempt a connection to Druid services. If 
successful, the Druid version number will be shown in the output."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "9fa4abfe-f878-4031-88f2-94c13e922279",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import druidapi\n",
+    "import os\n",
+    "\n",
+    "if 'DRUID_HOST' not in os.environ.keys():\n",
+    "    druid_host=f\"http://localhost:8888\"\n";,
+    "else:\n",
+    "    druid_host=f\"http://{os.environ['DRUID_HOST']}:8888\"\n",
+    "    \n",
+    "print(f\"Opening a connection to {druid_host}.\")\n",
+    "druid = druidapi.jupyter_client(druid_host)\n",
+    "\n",
+    "display = druid.display\n",
+    "sql_client = druid.sql\n",
+    "status_client = druid.status\n",
+    "\n",
+    "status_client.version"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "fc3001a0-27e5-4f41-876a-ce6eab2acd6a",
+   "metadata": {},
+   "source": [
+    "Finally, run the following cell to import additional Python modules that 
you will use to X, Y, Z."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "6b058d8b-2bae-4929-ab0c-5a6df1850387",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import json"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f388633f-195b-4381-98cc-7a2f80f48690",
+   "metadata": {},
+   "source": [
+    "## Using Top-level `UNION ALL` to concatenate result sets\n",
+    "\n",
+    "Run the following cell to ingest the wikipedia data example. Once 
completed, you will see a description of the new table.\n",
+    "\n",
+    "Monitor the ingestion in the Druid console while it runs."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "a399196b-12db-42ff-ae24-c7232f150aba",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "sql='''\n",
+    "REPLACE INTO \"example-wikipedia-unionall\" OVERWRITE ALL\n",
+    "WITH \"ext\" AS (SELECT *\n",
+    "FROM TABLE(\n",
+    "  EXTERN(\n",
+    "    
'{\"type\":\"http\",\"uris\":[\"https://druid.apache.org/data/wikipedia.json.gz\"]}',\n",
+    "    '{\"type\":\"json\"}'\n",
+    "  )\n",
+    ") EXTEND (\"isRobot\" VARCHAR, \"channel\" VARCHAR, \"timestamp\" 
VARCHAR, \"flags\" VARCHAR, \"isUnpatrolled\" VARCHAR, \"page\" VARCHAR, 
\"diffUrl\" VARCHAR, \"added\" BIGINT, \"comment\" VARCHAR, \"commentLength\" 
BIGINT, \"isNew\" VARCHAR, \"isMinor\" VARCHAR, \"delta\" BIGINT, 
\"isAnonymous\" VARCHAR, \"user\" VARCHAR, \"deltaBucket\" BIGINT, \"deleted\" 
BIGINT, \"namespace\" VARCHAR, \"cityName\" VARCHAR, \"countryName\" VARCHAR, 
\"regionIsoCode\" VARCHAR, \"metroCode\" BIGINT, \"countryIsoCode\" VARCHAR, 
\"regionName\" VARCHAR))\n",
+    "SELECT\n",
+    "  TIME_PARSE(\"timestamp\") AS \"__time\",\n",
+    "  \"isRobot\",\n",
+    "  \"channel\",\n",
+    "  \"flags\",\n",
+    "  \"isUnpatrolled\",\n",
+    "  \"page\",\n",
+    "  \"diffUrl\",\n",
+    "  \"added\",\n",
+    "  \"comment\",\n",
+    "  \"commentLength\",\n",
+    "  \"isNew\",\n",
+    "  \"isMinor\",\n",
+    "  \"delta\",\n",
+    "  \"isAnonymous\",\n",
+    "  \"user\",\n",
+    "  \"deltaBucket\",\n",
+    "  \"deleted\",\n",
+    "  \"namespace\",\n",
+    "  \"cityName\",\n",
+    "  \"countryName\",\n",
+    "  \"regionIsoCode\",\n",
+    "  \"metroCode\",\n",
+    "  \"countryIsoCode\",\n",
+    "  \"regionName\"\n",
+    "FROM \"ext\"\n",
+    "PARTITIONED BY DAY\n",
+    "'''\n",
+    "\n",
+    "sql_client.run_task(sql)\n",
+    "sql_client.wait_until_ready('example-wikipedia-unionall')\n",
+    "display.table('example-wikipedia-unionall')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "24b47cc3-68f5-4a73-b374-94bbfa32d91d",
+   "metadata": {},
+   "source": [
+    "With `UNION ALL`, we can append the results of one query with another.\n",
+    "\n",
+    "The first query in the cell below, `set1`, returns the ten first edits to 
any \"fr\"-like `channel` between midday and 1pm on the 27th June 2016. The 
second query repeats this but for any \"en\"-like `channel`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "b76e5184-9fe4-4f21-a471-4e15d16515c8",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "sql = '''\n",
+    "WITH\n",
+    "set1 AS (\n",
+    "  SELECT\n",
+    "    __time,\n",
+    "    \"channel\",\n",
+    "    \"page\",\n",
+    "    \"isRobot\"\n",
+    "  FROM \"example-wikipedia-unionall\"\n",
+    "  WHERE DATE_TRUNC('HOUR', __time) = TIMESTAMP '2016-06-27 12:00:00'\n",
+    "    AND channel LIKE '#fr%'\n",
+    "  ORDER BY __time\n",
+    "  LIMIT 10\n",
+    "  ),\n",
+    "set2 AS (\n",
+    "  SELECT\n",
+    "    __time,\n",
+    "    \"channel\",\n",
+    "    \"page\",\n",
+    "    \"isRobot\"\n",
+    "  FROM \"example-wikipedia-unionall\"\n",
+    "  WHERE DATE_TRUNC('HOUR', __time) = TIMESTAMP '2016-06-27 12:00:00'\n",
+    "    AND channel LIKE '#en%'\n",
+    "  ORDER BY __time\n",
+    "  LIMIT 10\n",
+    "  )\n",
+    "  \n",
+    "SELECT * from set1\n",
+    "UNION ALL\n",
+    "SELECT * from set2\n",
+    "'''\n",
+    "\n",
+    "display.sql(sql)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f5e77fa9-a60c-4955-b763-58d970d7326d",
+   "metadata": {},
+   "source": [
+    "This is what's known as a 
[top-level](https://druid.apache.org/docs/latest/querying/sql.html#top-level) 
`UNION` operation. First, `set1` was calculated, and the results of subsequent 
sets were then appended.\n",

Review Comment:
   ```suggestion
       "This is a 
[top-level](https://druid.apache.org/docs/latest/querying/sql.html#top-level) 
`UNION` operation. First, Druid calculated `set1` and appended subsequent 
results sets.\n",
   ```
   nit. extra words. passive



##########
examples/quickstart/jupyter-notebooks/notebooks/03-query/04-UnionOperations.ipynb:
##########
@@ -0,0 +1,541 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "557e06e8-9b35-4b34-8322-8a8ede6de709",
+   "metadata": {},
+   "source": [
+    "# Using `UNION ALL` to address multiple `TABLE`s in the same query\n",
+    "\n",
+    "While working with Druid, you may need to bring together two different 
tables of results together into a single result list, or to treat multiple 
tables as a single input to a query. This notebook introduces the `UNION ALL` 
operator, walking through two ways in which this operator can be used to 
achieve this result: top-level and table-level `UNION ALL`."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "cf4554ae-6516-4e76-b202-d6e2fdf31603",
+   "metadata": {},
+   "source": [
+    "## Prerequisites\n",
+    "\n",
+    "This tutorial works with Druid 26.0.0 or later.\n",
+    "\n",
+    "#### Run using Docker\n",
+    "\n",
+    "Launch this tutorial and all prerequisites using the `druid-jupyter` 
profile of the Docker Compose file for Jupyter-based Druid tutorials. For more 
information, see [Docker for Jupyter Notebook 
tutorials](https://druid.apache.org/docs/latest/tutorials/tutorial-jupyter-docker.html).\n",
+    "   \n",
+    "#### Run Druid without Docker\n",
+    "\n",
+    "If you do not use the Docker Compose environment, you need the 
following:\n",
+    "\n",
+    "* A running Druid instance, with a `DRUID_HOST` local environment 
variable containing the servername of your Druid router\n",
+    "* 
[druidapi](https://github.com/apache/druid/blob/master/examples/quickstart/jupyter-notebooks/druidapi/README.md),
 a Python client for Apache Druid. Follow the instructions in the Install 
section of the README file."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ee0c3171-def8-4ad9-9c56-d3a67f309631",
+   "metadata": {},
+   "source": [
+    "### Initialization\n",
+    "\n",
+    "Run the next cell to attempt a connection to Druid services. If 
successful, the Druid version number will be shown in the output."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "9fa4abfe-f878-4031-88f2-94c13e922279",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import druidapi\n",
+    "import os\n",
+    "\n",
+    "if 'DRUID_HOST' not in os.environ.keys():\n",
+    "    druid_host=f\"http://localhost:8888\"\n";,
+    "else:\n",
+    "    druid_host=f\"http://{os.environ['DRUID_HOST']}:8888\"\n",
+    "    \n",
+    "print(f\"Opening a connection to {druid_host}.\")\n",
+    "druid = druidapi.jupyter_client(druid_host)\n",
+    "\n",
+    "display = druid.display\n",
+    "sql_client = druid.sql\n",
+    "status_client = druid.status\n",
+    "\n",
+    "status_client.version"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "fc3001a0-27e5-4f41-876a-ce6eab2acd6a",
+   "metadata": {},
+   "source": [
+    "Finally, run the following cell to import additional Python modules that 
you will use to X, Y, Z."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "6b058d8b-2bae-4929-ab0c-5a6df1850387",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import json"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f388633f-195b-4381-98cc-7a2f80f48690",
+   "metadata": {},
+   "source": [
+    "## Using Top-level `UNION ALL` to concatenate result sets\n",
+    "\n",
+    "Run the following cell to ingest the wikipedia data example. Once 
completed, you will see a description of the new table.\n",
+    "\n",
+    "Monitor the ingestion in the Druid console while it runs."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "a399196b-12db-42ff-ae24-c7232f150aba",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "sql='''\n",
+    "REPLACE INTO \"example-wikipedia-unionall\" OVERWRITE ALL\n",
+    "WITH \"ext\" AS (SELECT *\n",
+    "FROM TABLE(\n",
+    "  EXTERN(\n",
+    "    
'{\"type\":\"http\",\"uris\":[\"https://druid.apache.org/data/wikipedia.json.gz\"]}',\n",
+    "    '{\"type\":\"json\"}'\n",
+    "  )\n",
+    ") EXTEND (\"isRobot\" VARCHAR, \"channel\" VARCHAR, \"timestamp\" 
VARCHAR, \"flags\" VARCHAR, \"isUnpatrolled\" VARCHAR, \"page\" VARCHAR, 
\"diffUrl\" VARCHAR, \"added\" BIGINT, \"comment\" VARCHAR, \"commentLength\" 
BIGINT, \"isNew\" VARCHAR, \"isMinor\" VARCHAR, \"delta\" BIGINT, 
\"isAnonymous\" VARCHAR, \"user\" VARCHAR, \"deltaBucket\" BIGINT, \"deleted\" 
BIGINT, \"namespace\" VARCHAR, \"cityName\" VARCHAR, \"countryName\" VARCHAR, 
\"regionIsoCode\" VARCHAR, \"metroCode\" BIGINT, \"countryIsoCode\" VARCHAR, 
\"regionName\" VARCHAR))\n",
+    "SELECT\n",
+    "  TIME_PARSE(\"timestamp\") AS \"__time\",\n",
+    "  \"isRobot\",\n",
+    "  \"channel\",\n",
+    "  \"flags\",\n",
+    "  \"isUnpatrolled\",\n",
+    "  \"page\",\n",
+    "  \"diffUrl\",\n",
+    "  \"added\",\n",
+    "  \"comment\",\n",
+    "  \"commentLength\",\n",
+    "  \"isNew\",\n",
+    "  \"isMinor\",\n",
+    "  \"delta\",\n",
+    "  \"isAnonymous\",\n",
+    "  \"user\",\n",
+    "  \"deltaBucket\",\n",
+    "  \"deleted\",\n",
+    "  \"namespace\",\n",
+    "  \"cityName\",\n",
+    "  \"countryName\",\n",
+    "  \"regionIsoCode\",\n",
+    "  \"metroCode\",\n",
+    "  \"countryIsoCode\",\n",
+    "  \"regionName\"\n",
+    "FROM \"ext\"\n",
+    "PARTITIONED BY DAY\n",
+    "'''\n",
+    "\n",
+    "sql_client.run_task(sql)\n",
+    "sql_client.wait_until_ready('example-wikipedia-unionall')\n",
+    "display.table('example-wikipedia-unionall')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "24b47cc3-68f5-4a73-b374-94bbfa32d91d",
+   "metadata": {},
+   "source": [
+    "With `UNION ALL`, we can append the results of one query with another.\n",
+    "\n",
+    "The first query in the cell below, `set1`, returns the ten first edits to 
any \"fr\"-like `channel` between midday and 1pm on the 27th June 2016. The 
second query repeats this but for any \"en\"-like `channel`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "b76e5184-9fe4-4f21-a471-4e15d16515c8",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "sql = '''\n",
+    "WITH\n",
+    "set1 AS (\n",
+    "  SELECT\n",
+    "    __time,\n",
+    "    \"channel\",\n",
+    "    \"page\",\n",
+    "    \"isRobot\"\n",
+    "  FROM \"example-wikipedia-unionall\"\n",
+    "  WHERE DATE_TRUNC('HOUR', __time) = TIMESTAMP '2016-06-27 12:00:00'\n",
+    "    AND channel LIKE '#fr%'\n",
+    "  ORDER BY __time\n",
+    "  LIMIT 10\n",
+    "  ),\n",
+    "set2 AS (\n",
+    "  SELECT\n",
+    "    __time,\n",
+    "    \"channel\",\n",
+    "    \"page\",\n",
+    "    \"isRobot\"\n",
+    "  FROM \"example-wikipedia-unionall\"\n",
+    "  WHERE DATE_TRUNC('HOUR', __time) = TIMESTAMP '2016-06-27 12:00:00'\n",
+    "    AND channel LIKE '#en%'\n",
+    "  ORDER BY __time\n",
+    "  LIMIT 10\n",
+    "  )\n",
+    "  \n",
+    "SELECT * from set1\n",
+    "UNION ALL\n",
+    "SELECT * from set2\n",
+    "'''\n",
+    "\n",
+    "display.sql(sql)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f5e77fa9-a60c-4955-b763-58d970d7326d",
+   "metadata": {},
+   "source": [
+    "This is what's known as a 
[top-level](https://druid.apache.org/docs/latest/querying/sql.html#top-level) 
`UNION` operation. First, `set1` was calculated, and the results of subsequent 
sets were then appended.\n",
+    "\n",
+    "Notice that these results are not in order by time – even though the 
individual sets did `ORDER BY` time. Druid simply concatenated the two result 
sets together.\n",
+    "\n",
+    "Optionally, run the next cell – it shows the precise `EXPLAIN PLAN` for 
the query. You can see there are two `query` execution plans, one for each 
query, and that Druid's planning process has taken time to optimize how the 
query above will actually execute."

Review Comment:
   ```suggestion
       "Optionally, run the next cell to show the precise `EXPLAIN PLAN` for 
the query. You can see there are two `query` execution plans, one for each 
subquery. Also, Druid's planning process optimized execution of the outer 
query."
   ```
   nit: wordy
   also: is there a resource to link to to help with reading explain plans? 
some of us might not be expert 😳 



##########
examples/quickstart/jupyter-notebooks/notebooks/03-query/04-UnionOperations.ipynb:
##########
@@ -0,0 +1,541 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "557e06e8-9b35-4b34-8322-8a8ede6de709",
+   "metadata": {},
+   "source": [
+    "# Using `UNION ALL` to address multiple `TABLE`s in the same query\n",
+    "\n",
+    "While working with Druid, you may need to bring together two different 
tables of results together into a single result list, or to treat multiple 
tables as a single input to a query. This notebook introduces the `UNION ALL` 
operator, walking through two ways in which this operator can be used to 
achieve this result: top-level and table-level `UNION ALL`."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "cf4554ae-6516-4e76-b202-d6e2fdf31603",
+   "metadata": {},
+   "source": [
+    "## Prerequisites\n",
+    "\n",
+    "This tutorial works with Druid 26.0.0 or later.\n",
+    "\n",
+    "#### Run using Docker\n",
+    "\n",
+    "Launch this tutorial and all prerequisites using the `druid-jupyter` 
profile of the Docker Compose file for Jupyter-based Druid tutorials. For more 
information, see [Docker for Jupyter Notebook 
tutorials](https://druid.apache.org/docs/latest/tutorials/tutorial-jupyter-docker.html).\n",
+    "   \n",
+    "#### Run Druid without Docker\n",
+    "\n",
+    "If you do not use the Docker Compose environment, you need the 
following:\n",
+    "\n",
+    "* A running Druid instance, with a `DRUID_HOST` local environment 
variable containing the servername of your Druid router\n",
+    "* 
[druidapi](https://github.com/apache/druid/blob/master/examples/quickstart/jupyter-notebooks/druidapi/README.md),
 a Python client for Apache Druid. Follow the instructions in the Install 
section of the README file."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ee0c3171-def8-4ad9-9c56-d3a67f309631",
+   "metadata": {},
+   "source": [
+    "### Initialization\n",
+    "\n",
+    "Run the next cell to attempt a connection to Druid services. If 
successful, the Druid version number will be shown in the output."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "9fa4abfe-f878-4031-88f2-94c13e922279",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import druidapi\n",
+    "import os\n",
+    "\n",
+    "if 'DRUID_HOST' not in os.environ.keys():\n",
+    "    druid_host=f\"http://localhost:8888\"\n";,
+    "else:\n",
+    "    druid_host=f\"http://{os.environ['DRUID_HOST']}:8888\"\n",
+    "    \n",
+    "print(f\"Opening a connection to {druid_host}.\")\n",
+    "druid = druidapi.jupyter_client(druid_host)\n",
+    "\n",
+    "display = druid.display\n",
+    "sql_client = druid.sql\n",
+    "status_client = druid.status\n",
+    "\n",
+    "status_client.version"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "fc3001a0-27e5-4f41-876a-ce6eab2acd6a",
+   "metadata": {},
+   "source": [
+    "Finally, run the following cell to import additional Python modules that 
you will use to X, Y, Z."

Review Comment:
   Address placeholder text.
   ```suggestion
       "Finally, run the following cell to import the Python JSON module."
   ```
   I think JSON is pretty standard so you don't need a big description of how 
you'll use it. Also @sergioferragut suggested placing imports near where 
they're used, but for this PR, it is enough to clean up the placeholder.



##########
examples/quickstart/jupyter-notebooks/notebooks/03-query/04-UnionOperations.ipynb:
##########
@@ -0,0 +1,541 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "557e06e8-9b35-4b34-8322-8a8ede6de709",
+   "metadata": {},
+   "source": [
+    "# Using `UNION ALL` to address multiple `TABLE`s in the same query\n",
+    "\n",
+    "While working with Druid, you may need to bring together two different 
tables of results together into a single result list, or to treat multiple 
tables as a single input to a query. This notebook introduces the `UNION ALL` 
operator, walking through two ways in which this operator can be used to 
achieve this result: top-level and table-level `UNION ALL`."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "cf4554ae-6516-4e76-b202-d6e2fdf31603",
+   "metadata": {},
+   "source": [
+    "## Prerequisites\n",
+    "\n",
+    "This tutorial works with Druid 26.0.0 or later.\n",
+    "\n",
+    "#### Run using Docker\n",
+    "\n",
+    "Launch this tutorial and all prerequisites using the `druid-jupyter` 
profile of the Docker Compose file for Jupyter-based Druid tutorials. For more 
information, see [Docker for Jupyter Notebook 
tutorials](https://druid.apache.org/docs/latest/tutorials/tutorial-jupyter-docker.html).\n",
+    "   \n",
+    "#### Run Druid without Docker\n",
+    "\n",
+    "If you do not use the Docker Compose environment, you need the 
following:\n",
+    "\n",
+    "* A running Druid instance, with a `DRUID_HOST` local environment 
variable containing the servername of your Druid router\n",
+    "* 
[druidapi](https://github.com/apache/druid/blob/master/examples/quickstart/jupyter-notebooks/druidapi/README.md),
 a Python client for Apache Druid. Follow the instructions in the Install 
section of the README file."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ee0c3171-def8-4ad9-9c56-d3a67f309631",
+   "metadata": {},
+   "source": [
+    "### Initialization\n",
+    "\n",
+    "Run the next cell to attempt a connection to Druid services. If 
successful, the Druid version number will be shown in the output."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "9fa4abfe-f878-4031-88f2-94c13e922279",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import druidapi\n",
+    "import os\n",
+    "\n",
+    "if 'DRUID_HOST' not in os.environ.keys():\n",
+    "    druid_host=f\"http://localhost:8888\"\n";,
+    "else:\n",
+    "    druid_host=f\"http://{os.environ['DRUID_HOST']}:8888\"\n",
+    "    \n",
+    "print(f\"Opening a connection to {druid_host}.\")\n",
+    "druid = druidapi.jupyter_client(druid_host)\n",
+    "\n",
+    "display = druid.display\n",
+    "sql_client = druid.sql\n",
+    "status_client = druid.status\n",
+    "\n",
+    "status_client.version"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "fc3001a0-27e5-4f41-876a-ce6eab2acd6a",
+   "metadata": {},
+   "source": [
+    "Finally, run the following cell to import additional Python modules that 
you will use to X, Y, Z."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "6b058d8b-2bae-4929-ab0c-5a6df1850387",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import json"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f388633f-195b-4381-98cc-7a2f80f48690",
+   "metadata": {},
+   "source": [
+    "## Using Top-level `UNION ALL` to concatenate result sets\n",
+    "\n",
+    "Run the following cell to ingest the wikipedia data example. Once 
completed, you will see a description of the new table.\n",
+    "\n",
+    "Monitor the ingestion in the Druid console while it runs."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "a399196b-12db-42ff-ae24-c7232f150aba",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "sql='''\n",
+    "REPLACE INTO \"example-wikipedia-unionall\" OVERWRITE ALL\n",
+    "WITH \"ext\" AS (SELECT *\n",
+    "FROM TABLE(\n",
+    "  EXTERN(\n",
+    "    
'{\"type\":\"http\",\"uris\":[\"https://druid.apache.org/data/wikipedia.json.gz\"]}',\n",
+    "    '{\"type\":\"json\"}'\n",
+    "  )\n",
+    ") EXTEND (\"isRobot\" VARCHAR, \"channel\" VARCHAR, \"timestamp\" 
VARCHAR, \"flags\" VARCHAR, \"isUnpatrolled\" VARCHAR, \"page\" VARCHAR, 
\"diffUrl\" VARCHAR, \"added\" BIGINT, \"comment\" VARCHAR, \"commentLength\" 
BIGINT, \"isNew\" VARCHAR, \"isMinor\" VARCHAR, \"delta\" BIGINT, 
\"isAnonymous\" VARCHAR, \"user\" VARCHAR, \"deltaBucket\" BIGINT, \"deleted\" 
BIGINT, \"namespace\" VARCHAR, \"cityName\" VARCHAR, \"countryName\" VARCHAR, 
\"regionIsoCode\" VARCHAR, \"metroCode\" BIGINT, \"countryIsoCode\" VARCHAR, 
\"regionName\" VARCHAR))\n",
+    "SELECT\n",
+    "  TIME_PARSE(\"timestamp\") AS \"__time\",\n",
+    "  \"isRobot\",\n",
+    "  \"channel\",\n",
+    "  \"flags\",\n",
+    "  \"isUnpatrolled\",\n",
+    "  \"page\",\n",
+    "  \"diffUrl\",\n",
+    "  \"added\",\n",
+    "  \"comment\",\n",
+    "  \"commentLength\",\n",
+    "  \"isNew\",\n",
+    "  \"isMinor\",\n",
+    "  \"delta\",\n",
+    "  \"isAnonymous\",\n",
+    "  \"user\",\n",
+    "  \"deltaBucket\",\n",
+    "  \"deleted\",\n",
+    "  \"namespace\",\n",
+    "  \"cityName\",\n",
+    "  \"countryName\",\n",
+    "  \"regionIsoCode\",\n",
+    "  \"metroCode\",\n",
+    "  \"countryIsoCode\",\n",
+    "  \"regionName\"\n",
+    "FROM \"ext\"\n",
+    "PARTITIONED BY DAY\n",
+    "'''\n",
+    "\n",
+    "sql_client.run_task(sql)\n",
+    "sql_client.wait_until_ready('example-wikipedia-unionall')\n",
+    "display.table('example-wikipedia-unionall')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "24b47cc3-68f5-4a73-b374-94bbfa32d91d",
+   "metadata": {},
+   "source": [
+    "With `UNION ALL`, we can append the results of one query with another.\n",
+    "\n",
+    "The first query in the cell below, `set1`, returns the ten first edits to 
any \"fr\"-like `channel` between midday and 1pm on the 27th June 2016. The 
second query repeats this but for any \"en\"-like `channel`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "b76e5184-9fe4-4f21-a471-4e15d16515c8",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "sql = '''\n",
+    "WITH\n",
+    "set1 AS (\n",
+    "  SELECT\n",
+    "    __time,\n",
+    "    \"channel\",\n",
+    "    \"page\",\n",
+    "    \"isRobot\"\n",
+    "  FROM \"example-wikipedia-unionall\"\n",
+    "  WHERE DATE_TRUNC('HOUR', __time) = TIMESTAMP '2016-06-27 12:00:00'\n",
+    "    AND channel LIKE '#fr%'\n",
+    "  ORDER BY __time\n",
+    "  LIMIT 10\n",
+    "  ),\n",
+    "set2 AS (\n",
+    "  SELECT\n",
+    "    __time,\n",
+    "    \"channel\",\n",
+    "    \"page\",\n",
+    "    \"isRobot\"\n",
+    "  FROM \"example-wikipedia-unionall\"\n",
+    "  WHERE DATE_TRUNC('HOUR', __time) = TIMESTAMP '2016-06-27 12:00:00'\n",
+    "    AND channel LIKE '#en%'\n",
+    "  ORDER BY __time\n",
+    "  LIMIT 10\n",
+    "  )\n",
+    "  \n",
+    "SELECT * from set1\n",
+    "UNION ALL\n",
+    "SELECT * from set2\n",
+    "'''\n",
+    "\n",
+    "display.sql(sql)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f5e77fa9-a60c-4955-b763-58d970d7326d",
+   "metadata": {},
+   "source": [
+    "This is what's known as a 
[top-level](https://druid.apache.org/docs/latest/querying/sql.html#top-level) 
`UNION` operation. First, `set1` was calculated, and the results of subsequent 
sets were then appended.\n",
+    "\n",
+    "Notice that these results are not in order by time – even though the 
individual sets did `ORDER BY` time. Druid simply concatenated the two result 
sets together.\n",
+    "\n",
+    "Optionally, run the next cell – it shows the precise `EXPLAIN PLAN` for 
the query. You can see there are two `query` execution plans, one for each 
query, and that Druid's planning process has taken time to optimize how the 
query above will actually execute."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "97934da2-17d1-4c91-8ae3-926cc89185c1",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "print(json.dumps(json.loads(sql_client.explain_sql(sql)['PLAN']), 
indent=2))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "800add1a-d459-4796-b974-b2f094db417f",
+   "metadata": {},
+   "source": [
+    "Run next cell to perform another top-level UNION ALL, this time where the 
sets use `GROUP BY`.\n",
+    "\n",
+    "Notice that the aggregates have `AS` to set specific field names."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "8e687466-74bb-4cc0-ba17-913d1807fc60",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "sql='''\n",
+    "WITH\n",
+    "set1 AS (\n",
+    "  SELECT\n",
+    "    TIME_FLOOR(__time, 'PT1H') AS \"Period\",\n",
+    "    countryName,\n",
+    "    AVG(commentLength) AS \"Average Comment Size\",\n",
+    "    COUNT(DISTINCT \"page\") AS \"Pages\"\n",
+    "  FROM \"example-wikipedia-unionall\"\n",
+    "  WHERE countryName='China'\n",
+    "  GROUP BY 1, 2\n",
+    "  LIMIT 10\n",
+    "  ),\n",
+    "set2 AS (\n",
+    "  SELECT\n",
+    "    TIME_FLOOR(__time, 'PT1H') AS \"Episode\",\n",
+    "    countryName,\n",
+    "    COUNT(DISTINCT \"page\") AS \"Pages\",\n",
+    "    AVG(commentLength) AS \"Average Comment Length\"\n",
+    "  FROM \"example-wikipedia-unionall\"\n",
+    "  WHERE countryName='Austria'\n",
+    "  GROUP BY 1, 2\n",
+    "  LIMIT 10\n",
+    "  )\n",
+    "\n",
+    "SELECT * from set1\n",
+    "UNION ALL\n",
+    "SELECT * from set2\n",
+    "'''\n",
+    "\n",
+    "display.sql(sql)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f2c95ffc-b260-4671-bacc-c8cc3137e9c2",
+   "metadata": {},
+   "source": [
+    "Look carefully at these results - results from `set2` were appended very 
simply.\n",

Review Comment:
   ```suggestion
       "Look carefully at these results - Druid has simply appended the results 
from `set2` to `set1` without introducing redundant columns.\n",
   ```
   nit. passive + "very"
   Great demo of this behavior!



##########
examples/quickstart/jupyter-notebooks/notebooks/03-query/04-UnionOperations.ipynb:
##########
@@ -0,0 +1,541 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "557e06e8-9b35-4b34-8322-8a8ede6de709",
+   "metadata": {},
+   "source": [
+    "# Using `UNION ALL` to address multiple `TABLE`s in the same query\n",
+    "\n",
+    "While working with Druid, you may need to bring together two different 
tables of results together into a single result list, or to treat multiple 
tables as a single input to a query. This notebook introduces the `UNION ALL` 
operator, walking through two ways in which this operator can be used to 
achieve this result: top-level and table-level `UNION ALL`."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "cf4554ae-6516-4e76-b202-d6e2fdf31603",
+   "metadata": {},
+   "source": [
+    "## Prerequisites\n",
+    "\n",
+    "This tutorial works with Druid 26.0.0 or later.\n",
+    "\n",
+    "#### Run using Docker\n",
+    "\n",
+    "Launch this tutorial and all prerequisites using the `druid-jupyter` 
profile of the Docker Compose file for Jupyter-based Druid tutorials. For more 
information, see [Docker for Jupyter Notebook 
tutorials](https://druid.apache.org/docs/latest/tutorials/tutorial-jupyter-docker.html).\n",
+    "   \n",
+    "#### Run Druid without Docker\n",
+    "\n",
+    "If you do not use the Docker Compose environment, you need the 
following:\n",
+    "\n",
+    "* A running Druid instance, with a `DRUID_HOST` local environment 
variable containing the servername of your Druid router\n",
+    "* 
[druidapi](https://github.com/apache/druid/blob/master/examples/quickstart/jupyter-notebooks/druidapi/README.md),
 a Python client for Apache Druid. Follow the instructions in the Install 
section of the README file."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ee0c3171-def8-4ad9-9c56-d3a67f309631",
+   "metadata": {},
+   "source": [
+    "### Initialization\n",
+    "\n",
+    "Run the next cell to attempt a connection to Druid services. If 
successful, the Druid version number will be shown in the output."

Review Comment:
   ```suggestion
       "Run the next cell to attempt a connection to Druid services. If 
successful, the output shows the Druid version number."
   ```
   nit:passive



##########
examples/quickstart/jupyter-notebooks/notebooks/03-query/04-UnionOperations.ipynb:
##########
@@ -0,0 +1,541 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "557e06e8-9b35-4b34-8322-8a8ede6de709",
+   "metadata": {},
+   "source": [
+    "# Using `UNION ALL` to address multiple `TABLE`s in the same query\n",
+    "\n",
+    "While working with Druid, you may need to bring together two different 
tables of results together into a single result list, or to treat multiple 
tables as a single input to a query. This notebook introduces the `UNION ALL` 
operator, walking through two ways in which this operator can be used to 
achieve this result: top-level and table-level `UNION ALL`."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "cf4554ae-6516-4e76-b202-d6e2fdf31603",
+   "metadata": {},
+   "source": [
+    "## Prerequisites\n",
+    "\n",
+    "This tutorial works with Druid 26.0.0 or later.\n",
+    "\n",
+    "#### Run using Docker\n",
+    "\n",
+    "Launch this tutorial and all prerequisites using the `druid-jupyter` 
profile of the Docker Compose file for Jupyter-based Druid tutorials. For more 
information, see [Docker for Jupyter Notebook 
tutorials](https://druid.apache.org/docs/latest/tutorials/tutorial-jupyter-docker.html).\n",
+    "   \n",
+    "#### Run Druid without Docker\n",
+    "\n",
+    "If you do not use the Docker Compose environment, you need the 
following:\n",
+    "\n",
+    "* A running Druid instance, with a `DRUID_HOST` local environment 
variable containing the servername of your Druid router\n",
+    "* 
[druidapi](https://github.com/apache/druid/blob/master/examples/quickstart/jupyter-notebooks/druidapi/README.md),
 a Python client for Apache Druid. Follow the instructions in the Install 
section of the README file."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ee0c3171-def8-4ad9-9c56-d3a67f309631",
+   "metadata": {},
+   "source": [
+    "### Initialization\n",
+    "\n",
+    "Run the next cell to attempt a connection to Druid services. If 
successful, the Druid version number will be shown in the output."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "9fa4abfe-f878-4031-88f2-94c13e922279",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import druidapi\n",
+    "import os\n",
+    "\n",
+    "if 'DRUID_HOST' not in os.environ.keys():\n",
+    "    druid_host=f\"http://localhost:8888\"\n";,
+    "else:\n",
+    "    druid_host=f\"http://{os.environ['DRUID_HOST']}:8888\"\n",
+    "    \n",
+    "print(f\"Opening a connection to {druid_host}.\")\n",
+    "druid = druidapi.jupyter_client(druid_host)\n",
+    "\n",
+    "display = druid.display\n",
+    "sql_client = druid.sql\n",
+    "status_client = druid.status\n",
+    "\n",
+    "status_client.version"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "fc3001a0-27e5-4f41-876a-ce6eab2acd6a",
+   "metadata": {},
+   "source": [
+    "Finally, run the following cell to import additional Python modules that 
you will use to X, Y, Z."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "6b058d8b-2bae-4929-ab0c-5a6df1850387",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import json"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f388633f-195b-4381-98cc-7a2f80f48690",
+   "metadata": {},
+   "source": [
+    "## Using Top-level `UNION ALL` to concatenate result sets\n",
+    "\n",
+    "Run the following cell to ingest the wikipedia data example. Once 
completed, you will see a description of the new table.\n",
+    "\n",
+    "Monitor the ingestion in the Druid console while it runs."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "a399196b-12db-42ff-ae24-c7232f150aba",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "sql='''\n",
+    "REPLACE INTO \"example-wikipedia-unionall\" OVERWRITE ALL\n",
+    "WITH \"ext\" AS (SELECT *\n",
+    "FROM TABLE(\n",
+    "  EXTERN(\n",
+    "    
'{\"type\":\"http\",\"uris\":[\"https://druid.apache.org/data/wikipedia.json.gz\"]}',\n",
+    "    '{\"type\":\"json\"}'\n",
+    "  )\n",
+    ") EXTEND (\"isRobot\" VARCHAR, \"channel\" VARCHAR, \"timestamp\" 
VARCHAR, \"flags\" VARCHAR, \"isUnpatrolled\" VARCHAR, \"page\" VARCHAR, 
\"diffUrl\" VARCHAR, \"added\" BIGINT, \"comment\" VARCHAR, \"commentLength\" 
BIGINT, \"isNew\" VARCHAR, \"isMinor\" VARCHAR, \"delta\" BIGINT, 
\"isAnonymous\" VARCHAR, \"user\" VARCHAR, \"deltaBucket\" BIGINT, \"deleted\" 
BIGINT, \"namespace\" VARCHAR, \"cityName\" VARCHAR, \"countryName\" VARCHAR, 
\"regionIsoCode\" VARCHAR, \"metroCode\" BIGINT, \"countryIsoCode\" VARCHAR, 
\"regionName\" VARCHAR))\n",
+    "SELECT\n",
+    "  TIME_PARSE(\"timestamp\") AS \"__time\",\n",
+    "  \"isRobot\",\n",
+    "  \"channel\",\n",
+    "  \"flags\",\n",
+    "  \"isUnpatrolled\",\n",
+    "  \"page\",\n",
+    "  \"diffUrl\",\n",
+    "  \"added\",\n",
+    "  \"comment\",\n",
+    "  \"commentLength\",\n",
+    "  \"isNew\",\n",
+    "  \"isMinor\",\n",
+    "  \"delta\",\n",
+    "  \"isAnonymous\",\n",
+    "  \"user\",\n",
+    "  \"deltaBucket\",\n",
+    "  \"deleted\",\n",
+    "  \"namespace\",\n",
+    "  \"cityName\",\n",
+    "  \"countryName\",\n",
+    "  \"regionIsoCode\",\n",
+    "  \"metroCode\",\n",
+    "  \"countryIsoCode\",\n",
+    "  \"regionName\"\n",
+    "FROM \"ext\"\n",
+    "PARTITIONED BY DAY\n",
+    "'''\n",
+    "\n",
+    "sql_client.run_task(sql)\n",
+    "sql_client.wait_until_ready('example-wikipedia-unionall')\n",
+    "display.table('example-wikipedia-unionall')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "24b47cc3-68f5-4a73-b374-94bbfa32d91d",
+   "metadata": {},
+   "source": [
+    "With `UNION ALL`, we can append the results of one query with another.\n",
+    "\n",
+    "The first query in the cell below, `set1`, returns the ten first edits to 
any \"fr\"-like `channel` between midday and 1pm on the 27th June 2016. The 
second query repeats this but for any \"en\"-like `channel`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "b76e5184-9fe4-4f21-a471-4e15d16515c8",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "sql = '''\n",
+    "WITH\n",
+    "set1 AS (\n",
+    "  SELECT\n",
+    "    __time,\n",
+    "    \"channel\",\n",
+    "    \"page\",\n",
+    "    \"isRobot\"\n",
+    "  FROM \"example-wikipedia-unionall\"\n",
+    "  WHERE DATE_TRUNC('HOUR', __time) = TIMESTAMP '2016-06-27 12:00:00'\n",
+    "    AND channel LIKE '#fr%'\n",
+    "  ORDER BY __time\n",
+    "  LIMIT 10\n",
+    "  ),\n",
+    "set2 AS (\n",
+    "  SELECT\n",
+    "    __time,\n",
+    "    \"channel\",\n",
+    "    \"page\",\n",
+    "    \"isRobot\"\n",
+    "  FROM \"example-wikipedia-unionall\"\n",
+    "  WHERE DATE_TRUNC('HOUR', __time) = TIMESTAMP '2016-06-27 12:00:00'\n",
+    "    AND channel LIKE '#en%'\n",
+    "  ORDER BY __time\n",
+    "  LIMIT 10\n",
+    "  )\n",
+    "  \n",
+    "SELECT * from set1\n",
+    "UNION ALL\n",
+    "SELECT * from set2\n",
+    "'''\n",
+    "\n",
+    "display.sql(sql)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f5e77fa9-a60c-4955-b763-58d970d7326d",
+   "metadata": {},
+   "source": [
+    "This is what's known as a 
[top-level](https://druid.apache.org/docs/latest/querying/sql.html#top-level) 
`UNION` operation. First, `set1` was calculated, and the results of subsequent 
sets were then appended.\n",
+    "\n",
+    "Notice that these results are not in order by time – even though the 
individual sets did `ORDER BY` time. Druid simply concatenated the two result 
sets together.\n",
+    "\n",
+    "Optionally, run the next cell – it shows the precise `EXPLAIN PLAN` for 
the query. You can see there are two `query` execution plans, one for each 
query, and that Druid's planning process has taken time to optimize how the 
query above will actually execute."

Review Comment:
   ```suggestion
       "Optionally, run the next cell to show the precise `EXPLAIN PLAN` for 
the query. You can see there are two `query` execution plans, one for each 
subquery. Also, Druid's planning process optimized execution of the outer 
query."
   ```
   nit: wordy
   also: is there a resource to link to to help with reading explain plans? 
some of us might not be expert 😳 



##########
examples/quickstart/jupyter-notebooks/notebooks/03-query/04-UnionOperations.ipynb:
##########
@@ -0,0 +1,541 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "557e06e8-9b35-4b34-8322-8a8ede6de709",
+   "metadata": {},
+   "source": [
+    "# Using `UNION ALL` to address multiple `TABLE`s in the same query\n",
+    "\n",
+    "While working with Druid, you may need to bring together two different 
tables of results together into a single result list, or to treat multiple 
tables as a single input to a query. This notebook introduces the `UNION ALL` 
operator, walking through two ways in which this operator can be used to 
achieve this result: top-level and table-level `UNION ALL`."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "cf4554ae-6516-4e76-b202-d6e2fdf31603",
+   "metadata": {},
+   "source": [
+    "## Prerequisites\n",
+    "\n",
+    "This tutorial works with Druid 26.0.0 or later.\n",
+    "\n",
+    "#### Run using Docker\n",
+    "\n",
+    "Launch this tutorial and all prerequisites using the `druid-jupyter` 
profile of the Docker Compose file for Jupyter-based Druid tutorials. For more 
information, see [Docker for Jupyter Notebook 
tutorials](https://druid.apache.org/docs/latest/tutorials/tutorial-jupyter-docker.html).\n",
+    "   \n",
+    "#### Run Druid without Docker\n",
+    "\n",
+    "If you do not use the Docker Compose environment, you need the 
following:\n",
+    "\n",
+    "* A running Druid instance, with a `DRUID_HOST` local environment 
variable containing the servername of your Druid router\n",
+    "* 
[druidapi](https://github.com/apache/druid/blob/master/examples/quickstart/jupyter-notebooks/druidapi/README.md),
 a Python client for Apache Druid. Follow the instructions in the Install 
section of the README file."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ee0c3171-def8-4ad9-9c56-d3a67f309631",
+   "metadata": {},
+   "source": [
+    "### Initialization\n",
+    "\n",
+    "Run the next cell to attempt a connection to Druid services. If 
successful, the Druid version number will be shown in the output."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "9fa4abfe-f878-4031-88f2-94c13e922279",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import druidapi\n",
+    "import os\n",
+    "\n",
+    "if 'DRUID_HOST' not in os.environ.keys():\n",
+    "    druid_host=f\"http://localhost:8888\"\n";,
+    "else:\n",
+    "    druid_host=f\"http://{os.environ['DRUID_HOST']}:8888\"\n",
+    "    \n",
+    "print(f\"Opening a connection to {druid_host}.\")\n",
+    "druid = druidapi.jupyter_client(druid_host)\n",
+    "\n",
+    "display = druid.display\n",
+    "sql_client = druid.sql\n",
+    "status_client = druid.status\n",
+    "\n",
+    "status_client.version"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "fc3001a0-27e5-4f41-876a-ce6eab2acd6a",
+   "metadata": {},
+   "source": [
+    "Finally, run the following cell to import additional Python modules that 
you will use to X, Y, Z."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "6b058d8b-2bae-4929-ab0c-5a6df1850387",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import json"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f388633f-195b-4381-98cc-7a2f80f48690",
+   "metadata": {},
+   "source": [
+    "## Using Top-level `UNION ALL` to concatenate result sets\n",
+    "\n",
+    "Run the following cell to ingest the wikipedia data example. Once 
completed, you will see a description of the new table.\n",
+    "\n",
+    "Monitor the ingestion in the Druid console while it runs."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "a399196b-12db-42ff-ae24-c7232f150aba",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "sql='''\n",
+    "REPLACE INTO \"example-wikipedia-unionall\" OVERWRITE ALL\n",
+    "WITH \"ext\" AS (SELECT *\n",
+    "FROM TABLE(\n",
+    "  EXTERN(\n",
+    "    
'{\"type\":\"http\",\"uris\":[\"https://druid.apache.org/data/wikipedia.json.gz\"]}',\n",
+    "    '{\"type\":\"json\"}'\n",
+    "  )\n",
+    ") EXTEND (\"isRobot\" VARCHAR, \"channel\" VARCHAR, \"timestamp\" 
VARCHAR, \"flags\" VARCHAR, \"isUnpatrolled\" VARCHAR, \"page\" VARCHAR, 
\"diffUrl\" VARCHAR, \"added\" BIGINT, \"comment\" VARCHAR, \"commentLength\" 
BIGINT, \"isNew\" VARCHAR, \"isMinor\" VARCHAR, \"delta\" BIGINT, 
\"isAnonymous\" VARCHAR, \"user\" VARCHAR, \"deltaBucket\" BIGINT, \"deleted\" 
BIGINT, \"namespace\" VARCHAR, \"cityName\" VARCHAR, \"countryName\" VARCHAR, 
\"regionIsoCode\" VARCHAR, \"metroCode\" BIGINT, \"countryIsoCode\" VARCHAR, 
\"regionName\" VARCHAR))\n",
+    "SELECT\n",
+    "  TIME_PARSE(\"timestamp\") AS \"__time\",\n",
+    "  \"isRobot\",\n",
+    "  \"channel\",\n",
+    "  \"flags\",\n",
+    "  \"isUnpatrolled\",\n",
+    "  \"page\",\n",
+    "  \"diffUrl\",\n",
+    "  \"added\",\n",
+    "  \"comment\",\n",
+    "  \"commentLength\",\n",
+    "  \"isNew\",\n",
+    "  \"isMinor\",\n",
+    "  \"delta\",\n",
+    "  \"isAnonymous\",\n",
+    "  \"user\",\n",
+    "  \"deltaBucket\",\n",
+    "  \"deleted\",\n",
+    "  \"namespace\",\n",
+    "  \"cityName\",\n",
+    "  \"countryName\",\n",
+    "  \"regionIsoCode\",\n",
+    "  \"metroCode\",\n",
+    "  \"countryIsoCode\",\n",
+    "  \"regionName\"\n",
+    "FROM \"ext\"\n",
+    "PARTITIONED BY DAY\n",
+    "'''\n",
+    "\n",
+    "sql_client.run_task(sql)\n",
+    "sql_client.wait_until_ready('example-wikipedia-unionall')\n",
+    "display.table('example-wikipedia-unionall')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "24b47cc3-68f5-4a73-b374-94bbfa32d91d",
+   "metadata": {},
+   "source": [
+    "With `UNION ALL`, we can append the results of one query with another.\n",

Review Comment:
   ```suggestion
       "You can use `UNION ALL` to append the results of one query with 
another.\n",
   ```
   nit: favor 2nd persion "you" over 1st person "we"



##########
examples/quickstart/jupyter-notebooks/notebooks/03-query/04-UnionOperations.ipynb:
##########
@@ -0,0 +1,541 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "557e06e8-9b35-4b34-8322-8a8ede6de709",
+   "metadata": {},
+   "source": [
+    "# Using `UNION ALL` to address multiple `TABLE`s in the same query\n",
+    "\n",
+    "While working with Druid, you may need to bring together two different 
tables of results together into a single result list, or to treat multiple 
tables as a single input to a query. This notebook introduces the `UNION ALL` 
operator, walking through two ways in which this operator can be used to 
achieve this result: top-level and table-level `UNION ALL`."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "cf4554ae-6516-4e76-b202-d6e2fdf31603",
+   "metadata": {},
+   "source": [
+    "## Prerequisites\n",
+    "\n",
+    "This tutorial works with Druid 26.0.0 or later.\n",
+    "\n",
+    "#### Run using Docker\n",
+    "\n",
+    "Launch this tutorial and all prerequisites using the `druid-jupyter` 
profile of the Docker Compose file for Jupyter-based Druid tutorials. For more 
information, see [Docker for Jupyter Notebook 
tutorials](https://druid.apache.org/docs/latest/tutorials/tutorial-jupyter-docker.html).\n",
+    "   \n",
+    "#### Run Druid without Docker\n",
+    "\n",
+    "If you do not use the Docker Compose environment, you need the 
following:\n",
+    "\n",
+    "* A running Druid instance, with a `DRUID_HOST` local environment 
variable containing the servername of your Druid router\n",
+    "* 
[druidapi](https://github.com/apache/druid/blob/master/examples/quickstart/jupyter-notebooks/druidapi/README.md),
 a Python client for Apache Druid. Follow the instructions in the Install 
section of the README file."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ee0c3171-def8-4ad9-9c56-d3a67f309631",
+   "metadata": {},
+   "source": [
+    "### Initialization\n",
+    "\n",
+    "Run the next cell to attempt a connection to Druid services. If 
successful, the Druid version number will be shown in the output."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "9fa4abfe-f878-4031-88f2-94c13e922279",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import druidapi\n",
+    "import os\n",
+    "\n",
+    "if 'DRUID_HOST' not in os.environ.keys():\n",
+    "    druid_host=f\"http://localhost:8888\"\n";,
+    "else:\n",
+    "    druid_host=f\"http://{os.environ['DRUID_HOST']}:8888\"\n",
+    "    \n",
+    "print(f\"Opening a connection to {druid_host}.\")\n",
+    "druid = druidapi.jupyter_client(druid_host)\n",
+    "\n",
+    "display = druid.display\n",
+    "sql_client = druid.sql\n",
+    "status_client = druid.status\n",
+    "\n",
+    "status_client.version"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "fc3001a0-27e5-4f41-876a-ce6eab2acd6a",
+   "metadata": {},
+   "source": [
+    "Finally, run the following cell to import additional Python modules that 
you will use to X, Y, Z."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "6b058d8b-2bae-4929-ab0c-5a6df1850387",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import json"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f388633f-195b-4381-98cc-7a2f80f48690",
+   "metadata": {},
+   "source": [
+    "## Using Top-level `UNION ALL` to concatenate result sets\n",
+    "\n",
+    "Run the following cell to ingest the wikipedia data example. Once 
completed, you will see a description of the new table.\n",
+    "\n",
+    "Monitor the ingestion in the Druid console while it runs."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "a399196b-12db-42ff-ae24-c7232f150aba",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "sql='''\n",
+    "REPLACE INTO \"example-wikipedia-unionall\" OVERWRITE ALL\n",
+    "WITH \"ext\" AS (SELECT *\n",
+    "FROM TABLE(\n",
+    "  EXTERN(\n",
+    "    
'{\"type\":\"http\",\"uris\":[\"https://druid.apache.org/data/wikipedia.json.gz\"]}',\n",
+    "    '{\"type\":\"json\"}'\n",
+    "  )\n",
+    ") EXTEND (\"isRobot\" VARCHAR, \"channel\" VARCHAR, \"timestamp\" 
VARCHAR, \"flags\" VARCHAR, \"isUnpatrolled\" VARCHAR, \"page\" VARCHAR, 
\"diffUrl\" VARCHAR, \"added\" BIGINT, \"comment\" VARCHAR, \"commentLength\" 
BIGINT, \"isNew\" VARCHAR, \"isMinor\" VARCHAR, \"delta\" BIGINT, 
\"isAnonymous\" VARCHAR, \"user\" VARCHAR, \"deltaBucket\" BIGINT, \"deleted\" 
BIGINT, \"namespace\" VARCHAR, \"cityName\" VARCHAR, \"countryName\" VARCHAR, 
\"regionIsoCode\" VARCHAR, \"metroCode\" BIGINT, \"countryIsoCode\" VARCHAR, 
\"regionName\" VARCHAR))\n",
+    "SELECT\n",
+    "  TIME_PARSE(\"timestamp\") AS \"__time\",\n",
+    "  \"isRobot\",\n",
+    "  \"channel\",\n",
+    "  \"flags\",\n",
+    "  \"isUnpatrolled\",\n",
+    "  \"page\",\n",
+    "  \"diffUrl\",\n",
+    "  \"added\",\n",
+    "  \"comment\",\n",
+    "  \"commentLength\",\n",
+    "  \"isNew\",\n",
+    "  \"isMinor\",\n",
+    "  \"delta\",\n",
+    "  \"isAnonymous\",\n",
+    "  \"user\",\n",
+    "  \"deltaBucket\",\n",
+    "  \"deleted\",\n",
+    "  \"namespace\",\n",
+    "  \"cityName\",\n",
+    "  \"countryName\",\n",
+    "  \"regionIsoCode\",\n",
+    "  \"metroCode\",\n",
+    "  \"countryIsoCode\",\n",
+    "  \"regionName\"\n",
+    "FROM \"ext\"\n",
+    "PARTITIONED BY DAY\n",
+    "'''\n",
+    "\n",
+    "sql_client.run_task(sql)\n",
+    "sql_client.wait_until_ready('example-wikipedia-unionall')\n",
+    "display.table('example-wikipedia-unionall')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "24b47cc3-68f5-4a73-b374-94bbfa32d91d",
+   "metadata": {},
+   "source": [
+    "With `UNION ALL`, we can append the results of one query with another.\n",
+    "\n",
+    "The first query in the cell below, `set1`, returns the ten first edits to 
any \"fr\"-like `channel` between midday and 1pm on the 27th June 2016. The 
second query repeats this but for any \"en\"-like `channel`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "b76e5184-9fe4-4f21-a471-4e15d16515c8",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "sql = '''\n",
+    "WITH\n",
+    "set1 AS (\n",
+    "  SELECT\n",
+    "    __time,\n",
+    "    \"channel\",\n",
+    "    \"page\",\n",
+    "    \"isRobot\"\n",
+    "  FROM \"example-wikipedia-unionall\"\n",
+    "  WHERE DATE_TRUNC('HOUR', __time) = TIMESTAMP '2016-06-27 12:00:00'\n",
+    "    AND channel LIKE '#fr%'\n",
+    "  ORDER BY __time\n",
+    "  LIMIT 10\n",
+    "  ),\n",
+    "set2 AS (\n",
+    "  SELECT\n",
+    "    __time,\n",
+    "    \"channel\",\n",
+    "    \"page\",\n",
+    "    \"isRobot\"\n",
+    "  FROM \"example-wikipedia-unionall\"\n",
+    "  WHERE DATE_TRUNC('HOUR', __time) = TIMESTAMP '2016-06-27 12:00:00'\n",
+    "    AND channel LIKE '#en%'\n",
+    "  ORDER BY __time\n",
+    "  LIMIT 10\n",
+    "  )\n",
+    "  \n",
+    "SELECT * from set1\n",
+    "UNION ALL\n",
+    "SELECT * from set2\n",
+    "'''\n",
+    "\n",
+    "display.sql(sql)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f5e77fa9-a60c-4955-b763-58d970d7326d",
+   "metadata": {},
+   "source": [
+    "This is what's known as a 
[top-level](https://druid.apache.org/docs/latest/querying/sql.html#top-level) 
`UNION` operation. First, `set1` was calculated, and the results of subsequent 
sets were then appended.\n",
+    "\n",
+    "Notice that these results are not in order by time – even though the 
individual sets did `ORDER BY` time. Druid simply concatenated the two result 
sets together.\n",
+    "\n",
+    "Optionally, run the next cell – it shows the precise `EXPLAIN PLAN` for 
the query. You can see there are two `query` execution plans, one for each 
query, and that Druid's planning process has taken time to optimize how the 
query above will actually execute."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "97934da2-17d1-4c91-8ae3-926cc89185c1",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "print(json.dumps(json.loads(sql_client.explain_sql(sql)['PLAN']), 
indent=2))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "800add1a-d459-4796-b974-b2f094db417f",
+   "metadata": {},
+   "source": [
+    "Run next cell to perform another top-level UNION ALL, this time where the 
sets use `GROUP BY`.\n",
+    "\n",
+    "Notice that the aggregates have `AS` to set specific field names."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "8e687466-74bb-4cc0-ba17-913d1807fc60",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "sql='''\n",
+    "WITH\n",
+    "set1 AS (\n",
+    "  SELECT\n",
+    "    TIME_FLOOR(__time, 'PT1H') AS \"Period\",\n",
+    "    countryName,\n",
+    "    AVG(commentLength) AS \"Average Comment Size\",\n",
+    "    COUNT(DISTINCT \"page\") AS \"Pages\"\n",
+    "  FROM \"example-wikipedia-unionall\"\n",
+    "  WHERE countryName='China'\n",
+    "  GROUP BY 1, 2\n",
+    "  LIMIT 10\n",
+    "  ),\n",
+    "set2 AS (\n",
+    "  SELECT\n",
+    "    TIME_FLOOR(__time, 'PT1H') AS \"Episode\",\n",
+    "    countryName,\n",
+    "    COUNT(DISTINCT \"page\") AS \"Pages\",\n",
+    "    AVG(commentLength) AS \"Average Comment Length\"\n",
+    "  FROM \"example-wikipedia-unionall\"\n",
+    "  WHERE countryName='Austria'\n",
+    "  GROUP BY 1, 2\n",
+    "  LIMIT 10\n",
+    "  )\n",
+    "\n",
+    "SELECT * from set1\n",
+    "UNION ALL\n",
+    "SELECT * from set2\n",
+    "'''\n",
+    "\n",
+    "display.sql(sql)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f2c95ffc-b260-4671-bacc-c8cc3137e9c2",
+   "metadata": {},
+   "source": [
+    "Look carefully at these results - results from `set2` were appended very 
simply.\n",
+    "\n",
+    "* Column name in `set2` (`Period` versus `Episode` and `Average Comment 
Size` versus `Average Comment Length`) did not result in new columns\n",
+    "* Columns with the same name (`Pages`) did not result in that aggregate 
being put into same column - Austria's values are simply appended `Average 
Comment Size`\n",
+    "\n",
+    "Run the next cell, which uses explicit column names at the top-level, 
rather than `*`, to ensure the calculations appear in the right columns in the 
final result. It also explicitly names the column names for the results by 
using `AS`."

Review Comment:
   ```suggestion
       "Run the next cell, which uses explicit column names at the top-level, 
rather than `*`, to ensure the calculations appear in the right columns in the 
final result. It also aliases the columns for the results by using `AS`."
   ```
   I think this is called "aliasing", but be sure



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] 202307-notebook-unionall (druid)

Reply via email to