[GitHub] [druid] techdocsmith commented on a diff in pull request #13465: docs: notebook only for SQL tutorial

GitBox Wed, 04 Jan 2023 13:15:31 -0800


techdocsmith commented on code in PR #13465:
URL: https://github.com/apache/druid/pull/13465#discussion_r1061860734



##########
examples/quickstart/jupyter-notebooks/sql-tutorial.ipynb:
##########
@@ -0,0 +1,765 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "ad4e60b6",
+   "metadata": {
+    "deletable": true,
+    "editable": true,
+    "tags": []
+   },
+   "source": [
+    "# Tutorial: Learn the basics of Druid SQL\n",
+    "\n",
+    "<!--\n",
+    "  ~ Licensed to the Apache Software Foundation (ASF) under one\n",
+    "  ~ or more contributor license agreements.  See the NOTICE file\n",
+    "  ~ distributed with this work for additional information\n",
+    "  ~ regarding copyright ownership.  The ASF licenses this file\n",
+    "  ~ to you under the Apache License, Version 2.0 (the\n",
+    "  ~ \"License\"); you may not use this file except in compliance\n",
+    "  ~ with the License.  You may obtain a copy of the License at\n",
+    "  ~\n",
+    "  ~   http://www.apache.org/licenses/LICENSE-2.0\n";,
+    "  ~\n",
+    "  ~ Unless required by applicable law or agreed to in writing,\n",
+    "  ~ software distributed under the License is distributed on an\n",
+    "  ~ \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+    "  ~ KIND, either express or implied.  See the License for the\n",
+    "  ~ specific language governing permissions and limitations\n",
+    "  ~ under the License.\n",
+    "  -->\n",
+    "  \n",
+    "Apache Druid supports two query languages: Druid SQL and native 
queries.\n",
+    "Druid SQL is a Structured Query Language (SQL) dialect that enables you 
to query datasources in Apache Druid using SQL statements.\n",
+    "SQL and Druid SQL use similar syntax, with some notable differences.\n",
+    "Not all SQL functions are supported in Druid SQL. Instead, Druid includes 
Druid-specific SQL functions for optimized query performance.\n",
+    "\n",
+    "This interactive tutorial introduces you to the unique aspects of Druid 
SQL.\n",
+    "To learn about native queries, see [Native 
queries](https://druid.apache.org/docs/latest/querying/querying.html)."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8d6bbbcb",
+   "metadata": {
+    "deletable": true,
+    "tags": []
+   },
+   "source": [
+    "## Prerequisites\n",
+    "\n",
+    "Make sure that you meet the requirements outlined in the README.md file 
of the [apache/druid 
repo](https://github.com/apache/druid/tree/master/examples/quickstart/jupyter-notebooks/).\n",
+    "Specifically, you need the following:\n",
+    "- Knowledge of SQL\n",
+    "- [Python3](https://www.python.org/downloads/)\n",
+    "- [The `requests` package for 
Python](https://requests.readthedocs.io/en/latest/user/install/)\n",
+    "- [JupyterLab](https://jupyter.org/install#jupyterlab) (recommended) or 
[Jupyter Notebook](https://jupyter.org/install#jupyter-notebook) running on a 
non-default port. Druid and Jupyter both default to port `8888`, so you need to 
start Jupyter on a different port. \n",
+    "- An available Druid instance. This tutorial uses the `micro-quickstart` 
configuration described in the [Druid 
quickstart](https://druid.apache.org/docs/latest/tutorials/index.html), so no 
authentication or authorization is required unless explicitly mentioned. If you 
haven’t already, download Druid version 24.0 or higher and start Druid services 
as described in the quickstart."

Review Comment:
   The startup script has changed for Druid 25.0 and I think `micro-quickstart` 
is going away. We should align this with the new method. If this is in other 
tutorials/notebooks, we'll need to correct there.
   
   cc: @317brian 



##########
examples/quickstart/jupyter-notebooks/sql-tutorial.ipynb:
##########
@@ -0,0 +1,765 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "ad4e60b6",
+   "metadata": {
+    "deletable": true,
+    "editable": true,
+    "tags": []
+   },
+   "source": [
+    "# Tutorial: Learn the basics of Druid SQL\n",
+    "\n",
+    "<!--\n",
+    "  ~ Licensed to the Apache Software Foundation (ASF) under one\n",
+    "  ~ or more contributor license agreements.  See the NOTICE file\n",
+    "  ~ distributed with this work for additional information\n",
+    "  ~ regarding copyright ownership.  The ASF licenses this file\n",
+    "  ~ to you under the Apache License, Version 2.0 (the\n",
+    "  ~ \"License\"); you may not use this file except in compliance\n",
+    "  ~ with the License.  You may obtain a copy of the License at\n",
+    "  ~\n",
+    "  ~   http://www.apache.org/licenses/LICENSE-2.0\n";,
+    "  ~\n",
+    "  ~ Unless required by applicable law or agreed to in writing,\n",
+    "  ~ software distributed under the License is distributed on an\n",
+    "  ~ \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+    "  ~ KIND, either express or implied.  See the License for the\n",
+    "  ~ specific language governing permissions and limitations\n",
+    "  ~ under the License.\n",
+    "  -->\n",
+    "  \n",
+    "Apache Druid supports two query languages: Druid SQL and native 
queries.\n",
+    "Druid SQL is a Structured Query Language (SQL) dialect that enables you 
to query datasources in Apache Druid using SQL statements.\n",
+    "SQL and Druid SQL use similar syntax, with some notable differences.\n",
+    "Not all SQL functions are supported in Druid SQL. Instead, Druid includes 
Druid-specific SQL functions for optimized query performance.\n",
+    "\n",
+    "This interactive tutorial introduces you to the unique aspects of Druid 
SQL.\n",
+    "To learn about native queries, see [Native 
queries](https://druid.apache.org/docs/latest/querying/querying.html)."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8d6bbbcb",
+   "metadata": {
+    "deletable": true,
+    "tags": []
+   },
+   "source": [
+    "## Prerequisites\n",
+    "\n",
+    "Make sure that you meet the requirements outlined in the README.md file 
of the [apache/druid 
repo](https://github.com/apache/druid/tree/master/examples/quickstart/jupyter-notebooks/).\n",
+    "Specifically, you need the following:\n",
+    "- Knowledge of SQL\n",
+    "- [Python3](https://www.python.org/downloads/)\n",
+    "- [The `requests` package for 
Python](https://requests.readthedocs.io/en/latest/user/install/)\n",
+    "- [JupyterLab](https://jupyter.org/install#jupyterlab) (recommended) or 
[Jupyter Notebook](https://jupyter.org/install#jupyter-notebook) running on a 
non-default port. Druid and Jupyter both default to port `8888`, so you need to 
start Jupyter on a different port. \n",
+    "- An available Druid instance. This tutorial uses the `micro-quickstart` 
configuration described in the [Druid 
quickstart](https://druid.apache.org/docs/latest/tutorials/index.html), so no 
authentication or authorization is required unless explicitly mentioned. If you 
haven’t already, download Druid version 24.0 or higher and start Druid services 
as described in the quickstart."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8f8e64f0-c29a-473c-8783-a2ff8648acd7",
+   "metadata": {},
+   "source": [
+    "## Prepare your environment\n",

Review Comment:
   This is not exactly "preparing the environment". We are just loading the 
required packages into the current kernel. I think the way @317brian handled it 
-- keeping it inside prerequisites and using the instruction:
   
   > To start this tutorial, run the next cell. It imports the Python packages 
you'll need and defines a variable for the the Druid host, where the Router 
service listens.



##########
examples/quickstart/jupyter-notebooks/sql-tutorial.ipynb:
##########
@@ -0,0 +1,765 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "ad4e60b6",
+   "metadata": {
+    "deletable": true,
+    "editable": true,
+    "tags": []
+   },
+   "source": [
+    "# Tutorial: Learn the basics of Druid SQL\n",
+    "\n",
+    "<!--\n",
+    "  ~ Licensed to the Apache Software Foundation (ASF) under one\n",
+    "  ~ or more contributor license agreements.  See the NOTICE file\n",
+    "  ~ distributed with this work for additional information\n",
+    "  ~ regarding copyright ownership.  The ASF licenses this file\n",
+    "  ~ to you under the Apache License, Version 2.0 (the\n",
+    "  ~ \"License\"); you may not use this file except in compliance\n",
+    "  ~ with the License.  You may obtain a copy of the License at\n",
+    "  ~\n",
+    "  ~   http://www.apache.org/licenses/LICENSE-2.0\n";,
+    "  ~\n",
+    "  ~ Unless required by applicable law or agreed to in writing,\n",
+    "  ~ software distributed under the License is distributed on an\n",
+    "  ~ \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+    "  ~ KIND, either express or implied.  See the License for the\n",
+    "  ~ specific language governing permissions and limitations\n",
+    "  ~ under the License.\n",
+    "  -->\n",
+    "  \n",
+    "Apache Druid supports two query languages: Druid SQL and native 
queries.\n",
+    "Druid SQL is a Structured Query Language (SQL) dialect that enables you 
to query datasources in Apache Druid using SQL statements.\n",
+    "SQL and Druid SQL use similar syntax, with some notable differences.\n",
+    "Not all SQL functions are supported in Druid SQL. Instead, Druid includes 
Druid-specific SQL functions for optimized query performance.\n",
+    "\n",
+    "This interactive tutorial introduces you to the unique aspects of Druid 
SQL.\n",
+    "To learn about native queries, see [Native 
queries](https://druid.apache.org/docs/latest/querying/querying.html)."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8d6bbbcb",
+   "metadata": {
+    "deletable": true,
+    "tags": []
+   },
+   "source": [
+    "## Prerequisites\n",
+    "\n",
+    "Make sure that you meet the requirements outlined in the README.md file 
of the [apache/druid 
repo](https://github.com/apache/druid/tree/master/examples/quickstart/jupyter-notebooks/).\n",
+    "Specifically, you need the following:\n",
+    "- Knowledge of SQL\n",
+    "- [Python3](https://www.python.org/downloads/)\n",
+    "- [The `requests` package for 
Python](https://requests.readthedocs.io/en/latest/user/install/)\n",
+    "- [JupyterLab](https://jupyter.org/install#jupyterlab) (recommended) or 
[Jupyter Notebook](https://jupyter.org/install#jupyter-notebook) running on a 
non-default port. Druid and Jupyter both default to port `8888`, so you need to 
start Jupyter on a different port. \n",
+    "- An available Druid instance. This tutorial uses the `micro-quickstart` 
configuration described in the [Druid 
quickstart](https://druid.apache.org/docs/latest/tutorials/index.html), so no 
authentication or authorization is required unless explicitly mentioned. If you 
haven’t already, download Druid version 24.0 or higher and start Druid services 
as described in the quickstart."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8f8e64f0-c29a-473c-8783-a2ff8648acd7",
+   "metadata": {},
+   "source": [
+    "## Prepare your environment\n",
+    "\n",
+    "This section contains the steps required to prepare your environment to 
follow along with this tutorial.\n",
+    "\n",
+    "Start by running the following cell. It imports the required Python 
packages and defines a variable for the Druid host."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "b7f08a52",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "import requests\n",
+    "import json\n",
+    "\n",
+    "# druid_host is the hostname and port for your Druid deployment. \n",
+    "# In a distributed environment, use the Router service  as the 
`druid_host`. \n",
+    "druid_host = \"http://localhost:8888\"\n";,
+    "dataSourceName = \"wikipedia-sql-tutorial\"\n",
+    "print(f\"\\033[1mDruid host\\033[0m: {druid_host}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e893ef7d-7136-442f-8bd9-31b5a5276518",
+   "metadata": {},
+   "source": [
+    "In the rest of the tutorial, the `endpoint`, `http_method`, and `payload` 
variables are updated to accomplish different tasks.\n",
+    "\n",
+    "Run the following cell to ingest data from an external source into a 
table named `wikipedia-sql-tutorial` using the [multi-stage query (MSQ) task 
engine](https://druid.apache.org/docs/latest/multi-stage-query/index.html)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "045f782c-74d8-4447-9487-529071812b51",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "endpoint = \"/druid/v2/sql/task\"\n",
+    "print(f\"\\033[1mQuery endpoint\\033[0m: {druid_host+endpoint}\")\n",
+    "http_method = \"POST\"\n",
+    "\n",
+    "\n",
+    "payload = json.dumps({\n",
+    "\"query\": \"INSERT INTO \\\"wikipedia-sql-tutorial\\\" SELECT 
TIME_PARSE(\\\"timestamp\\\") \\\n",
+    "          AS __time, * FROM TABLE \\\n",
+    "          (EXTERN('{\\\"type\\\": \\\"http\\\", \\\"uris\\\": 
[\\\"https://druid.apache.org/data/wikipedia.json.gz\\\"]}', '{\\\"type\\\": 
\\\"json\\\"}', '[{\\\"name\\\": \\\"added\\\", \\\"type\\\": \\\"long\\\"}, 
{\\\"name\\\": \\\"channel\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": 
\\\"cityName\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": 
\\\"comment\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": 
\\\"commentLength\\\", \\\"type\\\": \\\"long\\\"}, {\\\"name\\\": 
\\\"countryIsoCode\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": 
\\\"countryName\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": 
\\\"deleted\\\", \\\"type\\\": \\\"long\\\"}, {\\\"name\\\": \\\"delta\\\", 
\\\"type\\\": \\\"long\\\"}, {\\\"name\\\": \\\"deltaBucket\\\", \\\"type\\\": 
\\\"string\\\"}, {\\\"name\\\": \\\"diffUrl\\\", \\\"type\\\": \\\"string\\\"}, 
{\\\"name\\\": \\\"flags\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": 
\\\"isAnonymous\\\", \\\"type\\\": \\\"string\\\"}, {\\\
 "name\\\": \\\"isMinor\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": 
\\\"isNew\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"isRobot\\\", 
\\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"isUnpatrolled\\\", 
\\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"metroCode\\\", \\\"type\\\": 
\\\"string\\\"}, {\\\"name\\\": \\\"namespace\\\", \\\"type\\\": 
\\\"string\\\"}, {\\\"name\\\": \\\"page\\\", \\\"type\\\": \\\"string\\\"}, 
{\\\"name\\\": \\\"regionIsoCode\\\", \\\"type\\\": \\\"string\\\"}, 
{\\\"name\\\": \\\"regionName\\\", \\\"type\\\": \\\"string\\\"}, 
{\\\"name\\\": \\\"timestamp\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": 
\\\"user\\\", \\\"type\\\": \\\"string\\\"}]')) \\\n",
+    "          PARTITIONED BY DAY\",\n",
+    "  \"context\": {\n",
+    "    \"maxNumTasks\": 3\n",
+    "  }\n",
+    "})\n",
+    "\n",
+    "headers = {'Content-Type': 'application/json'}\n",
+    "\n",
+    "response = requests.request(http_method, druid_host+endpoint, 
headers=headers, data=payload)\n",
+    "ingestion_taskId_response = response\n",
+    "ingestion_taskId = 
json.loads(ingestion_taskId_response.text)['taskId']\n",
+    "print(f\"\\033[1mQuery\\033[0m:\\n\" + payload)\n",
+    "print(f\"\\nInserting data into the table named {dataSourceName}\")\n",
+    "print(\"\\nThe response includes the task ID and the status: \" + 
response.text + \".\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ceb86ce0-85f6-4c63-8fd6-883033ee96e9",
+   "metadata": {},
+   "source": [
+    "Wait for ingestion to complete before proceeding.\n",
+    "To check on the status of your ingestion task, run the following cell. "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "df12d12c-a067-4759-bae0-0410c24b6205",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "import time\n",
+    "\n",
+    "endpoint = f\"/druid/indexer/v1/task/{ingestion_taskId}/status\"\n",
+    "print(f\"\\033[1mQuery endpoint\\033[0m: {druid_host+endpoint}\")\n",
+    "http_method = \"GET\"\n",
+    "\n",
+    "payload = {}\n",
+    "headers = {}\n",
+    "\n",
+    "response = requests.request(http_method, druid_host+endpoint, 
headers=headers, data=payload)\n",
+    "ingestion_status = json.loads(response.text)['status']['status']\n",
+    "# If you only want to fetch the status once and print it, \n",
+    "# uncomment the print statement and comment out the if and while loops\n",
+    "# print(json.dumps(response.json(), indent=4))\n",
+    "\n",
+    "if ingestion_status == \"RUNNING\":\n",
+    "  print(\"The ingestion is running...\")\n",
+    "\n",
+    "while ingestion_status != \"SUCCESS\":\n",
+    "  response = requests.request(http_method, druid_host+endpoint, 
headers=headers, data=payload)\n",
+    "  ingestion_status = json.loads(response.text)['status']['status']\n",
+    "  time.sleep(15)  \n",
+    "  \n",
+    "if ingestion_status == \"SUCCESS\": \n",
+    "  print(\"The ingestion is complete:\")\n",
+    "  print(json.dumps(response.json(), indent=4))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "240b0ad5-48f2-4737-b12b-5fd5f98da300",
+   "metadata": {},
+   "source": [
+    "## Datasources\n",
+    "\n",
+    "Druid supports a variety of datasources, with the table datasource being 
the most common. In Druid documentation, the word \"datasource\" often 
implicitly refers to the table datasource.\n",
+    "The 
[Datasources](https://druid.apache.org/docs/latest/querying/datasource.html) 
topic provides a comprehensive overview of datasources supported by Druid 
SQL.\n",
+    "\n",
+    "In Druid SQL, table datasources reside in the `druid` schema. This is the 
default schema, so table datasources can be referenced as either 
`druid.dataSourceName` or `dataSourceName`.\n",
+    "\n",
+    "For example, run the next cell to return the rows of the column named 
`channel` from the `wikipedia-sql-tutorial` table. Because this tutorial is 
running in Jupyter, make sure to limit the size of your query results for 
display purposes using the SQL LIMIT clause."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "91dd255a-4d55-493e-a067-4cef5c659657",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "endpoint = \"/druid/v2/sql\"\n",
+    "print(f\"\\033[1mQuery endpoint\\033[0m: {druid_host+endpoint}\")\n",
+    "http_method = \"POST\"\n",
+    "\n",
+    "payload = json.dumps({\n",
+    "  \"query\":\"SELECT \\\"channel\\\" FROM \\\"wikipedia-sql-tutorial\\\" 
LIMIT 7\"})\n",
+    "headers = {'Content-Type': 'application/json'}\n",
+    "\n",
+    "response = requests.request(http_method, druid_host+endpoint, 
headers=headers, data=payload)\n",
+    "\n",
+    "print(\"\\033[1mQuery\\033[0m:\\n\" + payload)\n",
+    "print(f\"\\nEach JSON object in the response represents a row in the 
{dataSourceName} datasource.\") \n",
+    "print(\"\\n\\033[1mResponse\\033[0m: \\n\" + json.dumps(response.json(), 
indent=4))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "cbeb5a63",
+   "metadata": {
+    "deletable": true,
+    "tags": []
+   },
+   "source": [
+    "## Data types\n",
+    "\n",
+    "Druid maps SQL data types onto native types at query runtime.\n",
+    "The following native types are supported for Druid columns:\n",
+    "\n",
+    "* STRING: UTF-8 encoded strings and string arrays\n",
+    "* LONG: 64-bit signed int\n",
+    "* FLOAT: 32-bit float\n",
+    "* DOUBLE: 64-bit float\n",
+    "* COMPLEX: represents non-standard data types, such as nested JSON, 
hyperUnique and approxHistogram aggregators, and DataSketches aggregators\n",
+    "\n",
+    "Druid exposes table and column metadata through 
[INFORMATION_SCHEMA](https://druid.apache.org/docs/latest/querying/sql-metadata-tables.html#information-schema)
 tables. Run the following query to retrieve metadata for the 
`wikipedia-sql-tutorial` datasource. In the response body, each JSON object 
correlates to a column in the table.\n",
+    "Check the objects' `DATA_TYPE` property for SQL data types. You should 
see TIMESTAMP, BIGINT, and VARCHAR SQL data types. "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "b9227d6c-1d8c-4169-b13b-a08625c4011f",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "endpoint = \"/druid/v2/sql\"\n",
+    "print(f\"\\033[1mQuery endpoint\\033[0m: {druid_host+endpoint}\")\n",
+    "http_method = \"POST\"\n",
+    "\n",
+    "payload = json.dumps({\n",
+    "  \"query\":\"SELECT * FROM INFORMATION_SCHEMA.COLUMNS WHERE 
\\\"TABLE_SCHEMA\\\" = 'druid' AND \\\"TABLE_NAME\\\" = 
'wikipedia-sql-tutorial' LIMIT 7\"\n",
+    "})\n",
+    "headers = {'Content-Type': 'application/json'}\n",
+    "\n",
+    "response = requests.request(http_method, druid_host+endpoint, 
headers=headers, data=payload)\n",
+    "\n",
+    "print(\"\\033[1mQuery\\033[0m:\\n\" + payload)\n",
+    "print(\"\\n\\033[1mResponse\\033[0m: \\n\" + json.dumps(response.json(), 
indent=4))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c59ca797-dd91-442b-8d02-67b711b3fcc6",
+   "metadata": {},
+   "source": [
+    "Druid natively interprets VARCHAR as STRING and BIGINT and TIMESTAMP SQL 
data types as LONG. For reference on how SQL data types map onto Druid native 
types, see [Standard 
types](https://druid.apache.org/docs/latest/querying/sql-data-types.html#standard-types).\n",
+    "\n",
+    "### Timestamp values\n",
+    "\n",
+    "Druid stores timestamp values as the number of milliseconds since the 
Unix epoch.\n",
+    "Primary timestamps are stored in a column named `__time`.\n",
+    "If a dataset doesn't have a timestamp, Druid uses the default value of 
`1970-01-01 00:00:00`.\n",
+    "\n",
+    "Druid time functions perform best when used with the `__time` column.\n",
+    "By default, time functions use the UTC time zone.\n",
+    "For more information about timestamp handling, see [Date and time 
functions](https://druid.apache.org/docs/latest/querying/sql-scalar.html#date-and-time-functions).\n",
+    "\n",
+    "Run the following cell to see Druid SQL `TIME_IN_INTERVAL` scalar 
function at work. This query checks whether a timestamp is contained within a 
specified interval. The results are grouped by date."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "f7e3d62a-1325-4992-8bcd-c0f1925704bc",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "\n",
+    "endpoint = \"/druid/v2/sql\"\n",
+    "print(f\"\\033[1mQuery endpoint\\033[0m: {druid_host+endpoint}\")\n",
+    "http_method = \"POST\"\n",
+    "\n",
+    "payload = json.dumps({\n",
+    "  \"query\":\"SELECT TIME_IN_INTERVAL(__time, 
'2016-06-27T12:00:00/2016-06-27T13:00:00'), COUNT(*) FROM 
\\\"wikipedia-sql-tutorial\\\" \\\n",
+    "  GROUP BY TIME_IN_INTERVAL(__time, 
'2016-06-27T12:00:00/2016-06-27T13:00:00')\"\n",
+    "})\n",
+    "headers = {'Content-Type': 'application/json'}\n",
+    "\n",
+    "response = requests.request(http_method, druid_host+endpoint, 
headers=headers, data=payload)\n",
+    "\n",
+    "print(\"\\033[1mQuery\\033[0m:\\n\" + payload)\n",
+    "print(\"\\n\\033[1mResponse\\033[0m: \\n\" + json.dumps(response.json(), 
indent=4))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f7cfdfae-ccba-49ba-a70f-63d0bd3527b2",
+   "metadata": {},
+   "source": [
+    "### NULL values\n",
+    "\n",
+    "Druid supports SQL compatible NULL handling, allowing string columns to 
distinguish empty strings from NULL and numeric columns to contain NULL rows. 
To store and query data in SQL compatible mode, explicitly set the 
`useDefaultValueForNull` property to `false` in 
`_common/common.runtime.properties`. See [Configuration 
reference](https://druid.apache.org/docs/latest/configuration/index.html) for 
common configuration properties.\n",
+    "\n",
+    "When `useDefaultValueForNull` is set to `true` (default behavior), Druid 
stores NULL values as `0` for numeric columns and as `''` for string columns."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "29c24856",
+   "metadata": {
+    "deletable": true,
+    "tags": []
+   },
+   "source": [
+    "## SELECT statement syntax\n",
+    "\n",
+    "Druid SQL supports SELECT statements with the following structure:\n",
+    "\n",
+    "``` mysql\n",
+    "[ EXPLAIN PLAN FOR ]\n",
+    "[ WITH tableName [ ( column1, column2, ... ) ] AS ( query ) ]\n",
+    "SELECT [ ALL | DISTINCT ] { * | exprs }\n",
+    "FROM { <table> | (<subquery>) | <o1> [ INNER | LEFT ] JOIN <o2> ON 
condition }\n",
+    "[ WHERE expr ]\n",
+    "[ GROUP BY [ exprs | GROUPING SETS ( (exprs), ... ) | ROLLUP (exprs) | 
CUBE (exprs) ] ]\n",
+    "[ HAVING expr ]\n",
+    "[ ORDER BY expr [ ASC | DESC ], expr [ ASC | DESC ], ... ]\n",
+    "[ LIMIT limit ]\n",
+    "[ OFFSET offset ]\n",
+    "[ UNION ALL <another query> ]\n",
+    "```\n",
+    "\n",
+    "As a general rule, use the LIMIT clause with `SELECT *` to limit the 
number of rows returned. "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8cf212e5-fb3f-4206-acdd-46ef1da327ab",
+   "metadata": {
+    "tags": []
+   },
+   "source": [
+    "## WHERE clause\n",
+    "\n",
+    "Druid SQL uses the [SQL WHERE 
clause](https://druid.apache.org/docs/latest/querying/sql.html#where) of a 
SELECT statement to fetch data based on a particular condition.\n",
+    "\n",
+    "In most cases, specifying a time range when using the WHERE clause 
improves query performance.\n",
+    "This is because Druid partitions data by time and having a time range 
allows Druid to skip over unrelated data.\n",
+    "\n",
+    "Druid supports range filtering on columns that contain long millisecond 
values, with the boundaries specified as ISO 8601 time intervals. This is 
suitable for the `__time` column, long metric columns, and dimensions with 
values that can be parsed as long milliseconds.\n",
+    "    \n",
+    "For example, the following cell uses a comparison operator on the 
`__time` field to filter results from a certain time range. The cell limits the 
results to seven rows for display purposes."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "ffca2962-da31-4b9c-adbc-882e35386916",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "endpoint = \"/druid/v2/sql\"\n",
+    "print(f\"\\033[1mQuery endpoint\\033[0m: {druid_host+endpoint}\")\n",
+    "http_method = \"POST\"\n",
+    "\n",
+    "payload = json.dumps({\n",
+    "  \"query\":\"SELECT * FROM \\\"wikipedia-sql-tutorial\\\" WHERE __time 
>= TIMESTAMP '2015-09-12 23:33:55' LIMIT 7\"\n",

Review Comment:
   I'd like to avoid `SELECT *` if possible. If you use fewer columns the 
output is easier to read. Also selecting fewer columns is more performant with 
Druid since it only needs to fetch the columns you specify. Consider paring 
these down.



##########
examples/quickstart/jupyter-notebooks/sql-tutorial.ipynb:
##########
@@ -0,0 +1,765 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "ad4e60b6",
+   "metadata": {
+    "deletable": true,
+    "editable": true,
+    "tags": []
+   },
+   "source": [
+    "# Tutorial: Learn the basics of Druid SQL\n",
+    "\n",
+    "<!--\n",
+    "  ~ Licensed to the Apache Software Foundation (ASF) under one\n",
+    "  ~ or more contributor license agreements.  See the NOTICE file\n",
+    "  ~ distributed with this work for additional information\n",
+    "  ~ regarding copyright ownership.  The ASF licenses this file\n",
+    "  ~ to you under the Apache License, Version 2.0 (the\n",
+    "  ~ \"License\"); you may not use this file except in compliance\n",
+    "  ~ with the License.  You may obtain a copy of the License at\n",
+    "  ~\n",
+    "  ~   http://www.apache.org/licenses/LICENSE-2.0\n";,
+    "  ~\n",
+    "  ~ Unless required by applicable law or agreed to in writing,\n",
+    "  ~ software distributed under the License is distributed on an\n",
+    "  ~ \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+    "  ~ KIND, either express or implied.  See the License for the\n",
+    "  ~ specific language governing permissions and limitations\n",
+    "  ~ under the License.\n",
+    "  -->\n",
+    "  \n",
+    "Apache Druid supports two query languages: Druid SQL and native 
queries.\n",
+    "Druid SQL is a Structured Query Language (SQL) dialect that enables you 
to query datasources in Apache Druid using SQL statements.\n",
+    "SQL and Druid SQL use similar syntax, with some notable differences.\n",
+    "Not all SQL functions are supported in Druid SQL. Instead, Druid includes 
Druid-specific SQL functions for optimized query performance.\n",
+    "\n",
+    "This interactive tutorial introduces you to the unique aspects of Druid 
SQL.\n",
+    "To learn about native queries, see [Native 
queries](https://druid.apache.org/docs/latest/querying/querying.html)."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8d6bbbcb",
+   "metadata": {
+    "deletable": true,
+    "tags": []
+   },
+   "source": [
+    "## Prerequisites\n",
+    "\n",
+    "Make sure that you meet the requirements outlined in the README.md file 
of the [apache/druid 
repo](https://github.com/apache/druid/tree/master/examples/quickstart/jupyter-notebooks/).\n",
+    "Specifically, you need the following:\n",
+    "- Knowledge of SQL\n",
+    "- [Python3](https://www.python.org/downloads/)\n",
+    "- [The `requests` package for 
Python](https://requests.readthedocs.io/en/latest/user/install/)\n",
+    "- [JupyterLab](https://jupyter.org/install#jupyterlab) (recommended) or 
[Jupyter Notebook](https://jupyter.org/install#jupyter-notebook) running on a 
non-default port. Druid and Jupyter both default to port `8888`, so you need to 
start Jupyter on a different port. \n",
+    "- An available Druid instance. This tutorial uses the `micro-quickstart` 
configuration described in the [Druid 
quickstart](https://druid.apache.org/docs/latest/tutorials/index.html), so no 
authentication or authorization is required unless explicitly mentioned. If you 
haven’t already, download Druid version 24.0 or higher and start Druid services 
as described in the quickstart."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8f8e64f0-c29a-473c-8783-a2ff8648acd7",
+   "metadata": {},
+   "source": [
+    "## Prepare your environment\n",
+    "\n",
+    "This section contains the steps required to prepare your environment to 
follow along with this tutorial.\n",
+    "\n",
+    "Start by running the following cell. It imports the required Python 
packages and defines a variable for the Druid host."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "b7f08a52",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "import requests\n",
+    "import json\n",
+    "\n",
+    "# druid_host is the hostname and port for your Druid deployment. \n",
+    "# In a distributed environment, use the Router service  as the 
`druid_host`. \n",
+    "druid_host = \"http://localhost:8888\"\n";,
+    "dataSourceName = \"wikipedia-sql-tutorial\"\n",
+    "print(f\"\\033[1mDruid host\\033[0m: {druid_host}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e893ef7d-7136-442f-8bd9-31b5a5276518",
+   "metadata": {},
+   "source": [
+    "In the rest of the tutorial, the `endpoint`, `http_method`, and `payload` 
variables are updated to accomplish different tasks.\n",
+    "\n",
+    "Run the following cell to ingest data from an external source into a 
table named `wikipedia-sql-tutorial` using the [multi-stage query (MSQ) task 
engine](https://druid.apache.org/docs/latest/multi-stage-query/index.html)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "045f782c-74d8-4447-9487-529071812b51",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "endpoint = \"/druid/v2/sql/task\"\n",
+    "print(f\"\\033[1mQuery endpoint\\033[0m: {druid_host+endpoint}\")\n",
+    "http_method = \"POST\"\n",
+    "\n",
+    "\n",
+    "payload = json.dumps({\n",
+    "\"query\": \"INSERT INTO \\\"wikipedia-sql-tutorial\\\" SELECT 
TIME_PARSE(\\\"timestamp\\\") \\\n",
+    "          AS __time, * FROM TABLE \\\n",
+    "          (EXTERN('{\\\"type\\\": \\\"http\\\", \\\"uris\\\": 
[\\\"https://druid.apache.org/data/wikipedia.json.gz\\\"]}', '{\\\"type\\\": 
\\\"json\\\"}', '[{\\\"name\\\": \\\"added\\\", \\\"type\\\": \\\"long\\\"}, 
{\\\"name\\\": \\\"channel\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": 
\\\"cityName\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": 
\\\"comment\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": 
\\\"commentLength\\\", \\\"type\\\": \\\"long\\\"}, {\\\"name\\\": 
\\\"countryIsoCode\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": 
\\\"countryName\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": 
\\\"deleted\\\", \\\"type\\\": \\\"long\\\"}, {\\\"name\\\": \\\"delta\\\", 
\\\"type\\\": \\\"long\\\"}, {\\\"name\\\": \\\"deltaBucket\\\", \\\"type\\\": 
\\\"string\\\"}, {\\\"name\\\": \\\"diffUrl\\\", \\\"type\\\": \\\"string\\\"}, 
{\\\"name\\\": \\\"flags\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": 
\\\"isAnonymous\\\", \\\"type\\\": \\\"string\\\"}, {\\\
 "name\\\": \\\"isMinor\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": 
\\\"isNew\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"isRobot\\\", 
\\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"isUnpatrolled\\\", 
\\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"metroCode\\\", \\\"type\\\": 
\\\"string\\\"}, {\\\"name\\\": \\\"namespace\\\", \\\"type\\\": 
\\\"string\\\"}, {\\\"name\\\": \\\"page\\\", \\\"type\\\": \\\"string\\\"}, 
{\\\"name\\\": \\\"regionIsoCode\\\", \\\"type\\\": \\\"string\\\"}, 
{\\\"name\\\": \\\"regionName\\\", \\\"type\\\": \\\"string\\\"}, 
{\\\"name\\\": \\\"timestamp\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": 
\\\"user\\\", \\\"type\\\": \\\"string\\\"}]')) \\\n",
+    "          PARTITIONED BY DAY\",\n",
+    "  \"context\": {\n",
+    "    \"maxNumTasks\": 3\n",
+    "  }\n",
+    "})\n",
+    "\n",
+    "headers = {'Content-Type': 'application/json'}\n",
+    "\n",
+    "response = requests.request(http_method, druid_host+endpoint, 
headers=headers, data=payload)\n",
+    "ingestion_taskId_response = response\n",
+    "ingestion_taskId = 
json.loads(ingestion_taskId_response.text)['taskId']\n",
+    "print(f\"\\033[1mQuery\\033[0m:\\n\" + payload)\n",
+    "print(f\"\\nInserting data into the table named {dataSourceName}\")\n",
+    "print(\"\\nThe response includes the task ID and the status: \" + 
response.text + \".\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ceb86ce0-85f6-4c63-8fd6-883033ee96e9",
+   "metadata": {},
+   "source": [
+    "Wait for ingestion to complete before proceeding.\n",
+    "To check on the status of your ingestion task, run the following cell. "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "df12d12c-a067-4759-bae0-0410c24b6205",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "import time\n",
+    "\n",
+    "endpoint = f\"/druid/indexer/v1/task/{ingestion_taskId}/status\"\n",
+    "print(f\"\\033[1mQuery endpoint\\033[0m: {druid_host+endpoint}\")\n",
+    "http_method = \"GET\"\n",
+    "\n",
+    "payload = {}\n",
+    "headers = {}\n",
+    "\n",
+    "response = requests.request(http_method, druid_host+endpoint, 
headers=headers, data=payload)\n",
+    "ingestion_status = json.loads(response.text)['status']['status']\n",
+    "# If you only want to fetch the status once and print it, \n",
+    "# uncomment the print statement and comment out the if and while loops\n",
+    "# print(json.dumps(response.json(), indent=4))\n",
+    "\n",
+    "if ingestion_status == \"RUNNING\":\n",
+    "  print(\"The ingestion is running...\")\n",
+    "\n",
+    "while ingestion_status != \"SUCCESS\":\n",
+    "  response = requests.request(http_method, druid_host+endpoint, 
headers=headers, data=payload)\n",
+    "  ingestion_status = json.loads(response.text)['status']['status']\n",
+    "  time.sleep(15)  \n",
+    "  \n",
+    "if ingestion_status == \"SUCCESS\": \n",
+    "  print(\"The ingestion is complete:\")\n",
+    "  print(json.dumps(response.json(), indent=4))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "240b0ad5-48f2-4737-b12b-5fd5f98da300",
+   "metadata": {},
+   "source": [
+    "## Datasources\n",
+    "\n",
+    "Druid supports a variety of datasources, with the table datasource being 
the most common. In Druid documentation, the word \"datasource\" often 
implicitly refers to the table datasource.\n",
+    "The 
[Datasources](https://druid.apache.org/docs/latest/querying/datasource.html) 
topic provides a comprehensive overview of datasources supported by Druid 
SQL.\n",
+    "\n",
+    "In Druid SQL, table datasources reside in the `druid` schema. This is the 
default schema, so table datasources can be referenced as either 
`druid.dataSourceName` or `dataSourceName`.\n",
+    "\n",
+    "For example, run the next cell to return the rows of the column named 
`channel` from the `wikipedia-sql-tutorial` table. Because this tutorial is 
running in Jupyter, make sure to limit the size of your query results for 
display purposes using the SQL LIMIT clause."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "91dd255a-4d55-493e-a067-4cef5c659657",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "endpoint = \"/druid/v2/sql\"\n",
+    "print(f\"\\033[1mQuery endpoint\\033[0m: {druid_host+endpoint}\")\n",
+    "http_method = \"POST\"\n",
+    "\n",
+    "payload = json.dumps({\n",
+    "  \"query\":\"SELECT \\\"channel\\\" FROM \\\"wikipedia-sql-tutorial\\\" 
LIMIT 7\"})\n",
+    "headers = {'Content-Type': 'application/json'}\n",
+    "\n",
+    "response = requests.request(http_method, druid_host+endpoint, 
headers=headers, data=payload)\n",
+    "\n",
+    "print(\"\\033[1mQuery\\033[0m:\\n\" + payload)\n",
+    "print(f\"\\nEach JSON object in the response represents a row in the 
{dataSourceName} datasource.\") \n",
+    "print(\"\\n\\033[1mResponse\\033[0m: \\n\" + json.dumps(response.json(), 
indent=4))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "cbeb5a63",
+   "metadata": {
+    "deletable": true,
+    "tags": []
+   },
+   "source": [
+    "## Data types\n",
+    "\n",
+    "Druid maps SQL data types onto native types at query runtime.\n",
+    "The following native types are supported for Druid columns:\n",
+    "\n",
+    "* STRING: UTF-8 encoded strings and string arrays\n",
+    "* LONG: 64-bit signed int\n",
+    "* FLOAT: 32-bit float\n",
+    "* DOUBLE: 64-bit float\n",
+    "* COMPLEX: represents non-standard data types, such as nested JSON, 
hyperUnique and approxHistogram aggregators, and DataSketches aggregators\n",
+    "\n",
+    "Druid exposes table and column metadata through 
[INFORMATION_SCHEMA](https://druid.apache.org/docs/latest/querying/sql-metadata-tables.html#information-schema)
 tables. Run the following query to retrieve metadata for the 
`wikipedia-sql-tutorial` datasource. In the response body, each JSON object 
correlates to a column in the table.\n",
+    "Check the objects' `DATA_TYPE` property for SQL data types. You should 
see TIMESTAMP, BIGINT, and VARCHAR SQL data types. "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "b9227d6c-1d8c-4169-b13b-a08625c4011f",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "endpoint = \"/druid/v2/sql\"\n",
+    "print(f\"\\033[1mQuery endpoint\\033[0m: {druid_host+endpoint}\")\n",
+    "http_method = \"POST\"\n",
+    "\n",
+    "payload = json.dumps({\n",
+    "  \"query\":\"SELECT * FROM INFORMATION_SCHEMA.COLUMNS WHERE 
\\\"TABLE_SCHEMA\\\" = 'druid' AND \\\"TABLE_NAME\\\" = 
'wikipedia-sql-tutorial' LIMIT 7\"\n",
+    "})\n",
+    "headers = {'Content-Type': 'application/json'}\n",
+    "\n",
+    "response = requests.request(http_method, druid_host+endpoint, 
headers=headers, data=payload)\n",
+    "\n",
+    "print(\"\\033[1mQuery\\033[0m:\\n\" + payload)\n",
+    "print(\"\\n\\033[1mResponse\\033[0m: \\n\" + json.dumps(response.json(), 
indent=4))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c59ca797-dd91-442b-8d02-67b711b3fcc6",
+   "metadata": {},
+   "source": [
+    "Druid natively interprets VARCHAR as STRING and BIGINT and TIMESTAMP SQL 
data types as LONG. For reference on how SQL data types map onto Druid native 
types, see [Standard 
types](https://druid.apache.org/docs/latest/querying/sql-data-types.html#standard-types).\n",
+    "\n",
+    "### Timestamp values\n",
+    "\n",
+    "Druid stores timestamp values as the number of milliseconds since the 
Unix epoch.\n",
+    "Primary timestamps are stored in a column named `__time`.\n",
+    "If a dataset doesn't have a timestamp, Druid uses the default value of 
`1970-01-01 00:00:00`.\n",
+    "\n",
+    "Druid time functions perform best when used with the `__time` column.\n",
+    "By default, time functions use the UTC time zone.\n",
+    "For more information about timestamp handling, see [Date and time 
functions](https://druid.apache.org/docs/latest/querying/sql-scalar.html#date-and-time-functions).\n",
+    "\n",
+    "Run the following cell to see Druid SQL `TIME_IN_INTERVAL` scalar 
function at work. This query checks whether a timestamp is contained within a 
specified interval. The results are grouped by date."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "f7e3d62a-1325-4992-8bcd-c0f1925704bc",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "\n",
+    "endpoint = \"/druid/v2/sql\"\n",
+    "print(f\"\\033[1mQuery endpoint\\033[0m: {druid_host+endpoint}\")\n",
+    "http_method = \"POST\"\n",
+    "\n",
+    "payload = json.dumps({\n",
+    "  \"query\":\"SELECT TIME_IN_INTERVAL(__time, 
'2016-06-27T12:00:00/2016-06-27T13:00:00'), COUNT(*) FROM 
\\\"wikipedia-sql-tutorial\\\" \\\n",
+    "  GROUP BY TIME_IN_INTERVAL(__time, 
'2016-06-27T12:00:00/2016-06-27T13:00:00')\"\n",
+    "})\n",
+    "headers = {'Content-Type': 'application/json'}\n",
+    "\n",
+    "response = requests.request(http_method, druid_host+endpoint, 
headers=headers, data=payload)\n",
+    "\n",
+    "print(\"\\033[1mQuery\\033[0m:\\n\" + payload)\n",
+    "print(\"\\n\\033[1mResponse\\033[0m: \\n\" + json.dumps(response.json(), 
indent=4))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f7cfdfae-ccba-49ba-a70f-63d0bd3527b2",
+   "metadata": {},
+   "source": [
+    "### NULL values\n",
+    "\n",
+    "Druid supports SQL compatible NULL handling, allowing string columns to 
distinguish empty strings from NULL and numeric columns to contain NULL rows. 
To store and query data in SQL compatible mode, explicitly set the 
`useDefaultValueForNull` property to `false` in 
`_common/common.runtime.properties`. See [Configuration 
reference](https://druid.apache.org/docs/latest/configuration/index.html) for 
common configuration properties.\n",
+    "\n",
+    "When `useDefaultValueForNull` is set to `true` (default behavior), Druid 
stores NULL values as `0` for numeric columns and as `''` for string columns."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "29c24856",
+   "metadata": {
+    "deletable": true,
+    "tags": []
+   },
+   "source": [
+    "## SELECT statement syntax\n",
+    "\n",
+    "Druid SQL supports SELECT statements with the following structure:\n",
+    "\n",
+    "``` mysql\n",
+    "[ EXPLAIN PLAN FOR ]\n",
+    "[ WITH tableName [ ( column1, column2, ... ) ] AS ( query ) ]\n",
+    "SELECT [ ALL | DISTINCT ] { * | exprs }\n",
+    "FROM { <table> | (<subquery>) | <o1> [ INNER | LEFT ] JOIN <o2> ON 
condition }\n",
+    "[ WHERE expr ]\n",
+    "[ GROUP BY [ exprs | GROUPING SETS ( (exprs), ... ) | ROLLUP (exprs) | 
CUBE (exprs) ] ]\n",
+    "[ HAVING expr ]\n",
+    "[ ORDER BY expr [ ASC | DESC ], expr [ ASC | DESC ], ... ]\n",
+    "[ LIMIT limit ]\n",
+    "[ OFFSET offset ]\n",
+    "[ UNION ALL <another query> ]\n",
+    "```\n",
+    "\n",
+    "As a general rule, use the LIMIT clause with `SELECT *` to limit the 
number of rows returned. "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8cf212e5-fb3f-4206-acdd-46ef1da327ab",
+   "metadata": {
+    "tags": []
+   },
+   "source": [
+    "## WHERE clause\n",
+    "\n",
+    "Druid SQL uses the [SQL WHERE 
clause](https://druid.apache.org/docs/latest/querying/sql.html#where) of a 
SELECT statement to fetch data based on a particular condition.\n",
+    "\n",
+    "In most cases, specifying a time range when using the WHERE clause 
improves query performance.\n",

Review Comment:
   Is it worth mentioning that it is also useful to include the dimensions used 
in partitioning? You can say that it is covered later.



##########
examples/quickstart/jupyter-notebooks/sql-tutorial.ipynb:
##########
@@ -0,0 +1,765 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "ad4e60b6",
+   "metadata": {
+    "deletable": true,
+    "editable": true,
+    "tags": []
+   },
+   "source": [
+    "# Tutorial: Learn the basics of Druid SQL\n",
+    "\n",
+    "<!--\n",
+    "  ~ Licensed to the Apache Software Foundation (ASF) under one\n",
+    "  ~ or more contributor license agreements.  See the NOTICE file\n",
+    "  ~ distributed with this work for additional information\n",
+    "  ~ regarding copyright ownership.  The ASF licenses this file\n",
+    "  ~ to you under the Apache License, Version 2.0 (the\n",
+    "  ~ \"License\"); you may not use this file except in compliance\n",
+    "  ~ with the License.  You may obtain a copy of the License at\n",
+    "  ~\n",
+    "  ~   http://www.apache.org/licenses/LICENSE-2.0\n";,
+    "  ~\n",
+    "  ~ Unless required by applicable law or agreed to in writing,\n",
+    "  ~ software distributed under the License is distributed on an\n",
+    "  ~ \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+    "  ~ KIND, either express or implied.  See the License for the\n",
+    "  ~ specific language governing permissions and limitations\n",
+    "  ~ under the License.\n",
+    "  -->\n",
+    "  \n",
+    "Apache Druid supports two query languages: Druid SQL and native 
queries.\n",
+    "Druid SQL is a Structured Query Language (SQL) dialect that enables you 
to query datasources in Apache Druid using SQL statements.\n",
+    "SQL and Druid SQL use similar syntax, with some notable differences.\n",
+    "Not all SQL functions are supported in Druid SQL. Instead, Druid includes 
Druid-specific SQL functions for optimized query performance.\n",
+    "\n",
+    "This interactive tutorial introduces you to the unique aspects of Druid 
SQL.\n",
+    "To learn about native queries, see [Native 
queries](https://druid.apache.org/docs/latest/querying/querying.html)."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8d6bbbcb",
+   "metadata": {
+    "deletable": true,
+    "tags": []
+   },
+   "source": [
+    "## Prerequisites\n",
+    "\n",
+    "Make sure that you meet the requirements outlined in the README.md file 
of the [apache/druid 
repo](https://github.com/apache/druid/tree/master/examples/quickstart/jupyter-notebooks/).\n",
+    "Specifically, you need the following:\n",
+    "- Knowledge of SQL\n",
+    "- [Python3](https://www.python.org/downloads/)\n",
+    "- [The `requests` package for 
Python](https://requests.readthedocs.io/en/latest/user/install/)\n",
+    "- [JupyterLab](https://jupyter.org/install#jupyterlab) (recommended) or 
[Jupyter Notebook](https://jupyter.org/install#jupyter-notebook) running on a 
non-default port. Druid and Jupyter both default to port `8888`, so you need to 
start Jupyter on a different port. \n",
+    "- An available Druid instance. This tutorial uses the `micro-quickstart` 
configuration described in the [Druid 
quickstart](https://druid.apache.org/docs/latest/tutorials/index.html), so no 
authentication or authorization is required unless explicitly mentioned. If you 
haven’t already, download Druid version 24.0 or higher and start Druid services 
as described in the quickstart."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8f8e64f0-c29a-473c-8783-a2ff8648acd7",
+   "metadata": {},
+   "source": [
+    "## Prepare your environment\n",
+    "\n",
+    "This section contains the steps required to prepare your environment to 
follow along with this tutorial.\n",
+    "\n",
+    "Start by running the following cell. It imports the required Python 
packages and defines a variable for the Druid host."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "b7f08a52",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "import requests\n",
+    "import json\n",
+    "\n",
+    "# druid_host is the hostname and port for your Druid deployment. \n",
+    "# In a distributed environment, use the Router service  as the 
`druid_host`. \n",
+    "druid_host = \"http://localhost:8888\"\n";,
+    "dataSourceName = \"wikipedia-sql-tutorial\"\n",
+    "print(f\"\\033[1mDruid host\\033[0m: {druid_host}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e893ef7d-7136-442f-8bd9-31b5a5276518",
+   "metadata": {},
+   "source": [
+    "In the rest of the tutorial, the `endpoint`, `http_method`, and `payload` 
variables are updated to accomplish different tasks.\n",
+    "\n",
+    "Run the following cell to ingest data from an external source into a 
table named `wikipedia-sql-tutorial` using the [multi-stage query (MSQ) task 
engine](https://druid.apache.org/docs/latest/multi-stage-query/index.html)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "045f782c-74d8-4447-9487-529071812b51",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "endpoint = \"/druid/v2/sql/task\"\n",
+    "print(f\"\\033[1mQuery endpoint\\033[0m: {druid_host+endpoint}\")\n",
+    "http_method = \"POST\"\n",
+    "\n",
+    "\n",
+    "payload = json.dumps({\n",
+    "\"query\": \"INSERT INTO \\\"wikipedia-sql-tutorial\\\" SELECT 
TIME_PARSE(\\\"timestamp\\\") \\\n",
+    "          AS __time, * FROM TABLE \\\n",
+    "          (EXTERN('{\\\"type\\\": \\\"http\\\", \\\"uris\\\": 
[\\\"https://druid.apache.org/data/wikipedia.json.gz\\\"]}', '{\\\"type\\\": 
\\\"json\\\"}', '[{\\\"name\\\": \\\"added\\\", \\\"type\\\": \\\"long\\\"}, 
{\\\"name\\\": \\\"channel\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": 
\\\"cityName\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": 
\\\"comment\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": 
\\\"commentLength\\\", \\\"type\\\": \\\"long\\\"}, {\\\"name\\\": 
\\\"countryIsoCode\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": 
\\\"countryName\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": 
\\\"deleted\\\", \\\"type\\\": \\\"long\\\"}, {\\\"name\\\": \\\"delta\\\", 
\\\"type\\\": \\\"long\\\"}, {\\\"name\\\": \\\"deltaBucket\\\", \\\"type\\\": 
\\\"string\\\"}, {\\\"name\\\": \\\"diffUrl\\\", \\\"type\\\": \\\"string\\\"}, 
{\\\"name\\\": \\\"flags\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": 
\\\"isAnonymous\\\", \\\"type\\\": \\\"string\\\"}, {\\\
 "name\\\": \\\"isMinor\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": 
\\\"isNew\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"isRobot\\\", 
\\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"isUnpatrolled\\\", 
\\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"metroCode\\\", \\\"type\\\": 
\\\"string\\\"}, {\\\"name\\\": \\\"namespace\\\", \\\"type\\\": 
\\\"string\\\"}, {\\\"name\\\": \\\"page\\\", \\\"type\\\": \\\"string\\\"}, 
{\\\"name\\\": \\\"regionIsoCode\\\", \\\"type\\\": \\\"string\\\"}, 
{\\\"name\\\": \\\"regionName\\\", \\\"type\\\": \\\"string\\\"}, 
{\\\"name\\\": \\\"timestamp\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": 
\\\"user\\\", \\\"type\\\": \\\"string\\\"}]')) \\\n",
+    "          PARTITIONED BY DAY\",\n",
+    "  \"context\": {\n",
+    "    \"maxNumTasks\": 3\n",
+    "  }\n",
+    "})\n",
+    "\n",
+    "headers = {'Content-Type': 'application/json'}\n",
+    "\n",
+    "response = requests.request(http_method, druid_host+endpoint, 
headers=headers, data=payload)\n",
+    "ingestion_taskId_response = response\n",
+    "ingestion_taskId = 
json.loads(ingestion_taskId_response.text)['taskId']\n",
+    "print(f\"\\033[1mQuery\\033[0m:\\n\" + payload)\n",
+    "print(f\"\\nInserting data into the table named {dataSourceName}\")\n",
+    "print(\"\\nThe response includes the task ID and the status: \" + 
response.text + \".\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ceb86ce0-85f6-4c63-8fd6-883033ee96e9",
+   "metadata": {},
+   "source": [
+    "Wait for ingestion to complete before proceeding.\n",
+    "To check on the status of your ingestion task, run the following cell. "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "df12d12c-a067-4759-bae0-0410c24b6205",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "import time\n",
+    "\n",
+    "endpoint = f\"/druid/indexer/v1/task/{ingestion_taskId}/status\"\n",
+    "print(f\"\\033[1mQuery endpoint\\033[0m: {druid_host+endpoint}\")\n",
+    "http_method = \"GET\"\n",
+    "\n",
+    "payload = {}\n",
+    "headers = {}\n",
+    "\n",
+    "response = requests.request(http_method, druid_host+endpoint, 
headers=headers, data=payload)\n",
+    "ingestion_status = json.loads(response.text)['status']['status']\n",
+    "# If you only want to fetch the status once and print it, \n",
+    "# uncomment the print statement and comment out the if and while loops\n",
+    "# print(json.dumps(response.json(), indent=4))\n",
+    "\n",
+    "if ingestion_status == \"RUNNING\":\n",
+    "  print(\"The ingestion is running...\")\n",
+    "\n",
+    "while ingestion_status != \"SUCCESS\":\n",
+    "  response = requests.request(http_method, druid_host+endpoint, 
headers=headers, data=payload)\n",
+    "  ingestion_status = json.loads(response.text)['status']['status']\n",
+    "  time.sleep(15)  \n",
+    "  \n",
+    "if ingestion_status == \"SUCCESS\": \n",
+    "  print(\"The ingestion is complete:\")\n",
+    "  print(json.dumps(response.json(), indent=4))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "240b0ad5-48f2-4737-b12b-5fd5f98da300",
+   "metadata": {},
+   "source": [
+    "## Datasources\n",
+    "\n",
+    "Druid supports a variety of datasources, with the table datasource being 
the most common. In Druid documentation, the word \"datasource\" often 
implicitly refers to the table datasource.\n",
+    "The 
[Datasources](https://druid.apache.org/docs/latest/querying/datasource.html) 
topic provides a comprehensive overview of datasources supported by Druid 
SQL.\n",
+    "\n",
+    "In Druid SQL, table datasources reside in the `druid` schema. This is the 
default schema, so table datasources can be referenced as either 
`druid.dataSourceName` or `dataSourceName`.\n",
+    "\n",
+    "For example, run the next cell to return the rows of the column named 
`channel` from the `wikipedia-sql-tutorial` table. Because this tutorial is 
running in Jupyter, make sure to limit the size of your query results for 
display purposes using the SQL LIMIT clause."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "91dd255a-4d55-493e-a067-4cef5c659657",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "endpoint = \"/druid/v2/sql\"\n",
+    "print(f\"\\033[1mQuery endpoint\\033[0m: {druid_host+endpoint}\")\n",
+    "http_method = \"POST\"\n",
+    "\n",
+    "payload = json.dumps({\n",
+    "  \"query\":\"SELECT \\\"channel\\\" FROM \\\"wikipedia-sql-tutorial\\\" 
LIMIT 7\"})\n",
+    "headers = {'Content-Type': 'application/json'}\n",
+    "\n",
+    "response = requests.request(http_method, druid_host+endpoint, 
headers=headers, data=payload)\n",
+    "\n",
+    "print(\"\\033[1mQuery\\033[0m:\\n\" + payload)\n",
+    "print(f\"\\nEach JSON object in the response represents a row in the 
{dataSourceName} datasource.\") \n",
+    "print(\"\\n\\033[1mResponse\\033[0m: \\n\" + json.dumps(response.json(), 
indent=4))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "cbeb5a63",
+   "metadata": {
+    "deletable": true,
+    "tags": []
+   },
+   "source": [
+    "## Data types\n",
+    "\n",
+    "Druid maps SQL data types onto native types at query runtime.\n",
+    "The following native types are supported for Druid columns:\n",
+    "\n",
+    "* STRING: UTF-8 encoded strings and string arrays\n",
+    "* LONG: 64-bit signed int\n",
+    "* FLOAT: 32-bit float\n",
+    "* DOUBLE: 64-bit float\n",
+    "* COMPLEX: represents non-standard data types, such as nested JSON, 
hyperUnique and approxHistogram aggregators, and DataSketches aggregators\n",
+    "\n",
+    "Druid exposes table and column metadata through 
[INFORMATION_SCHEMA](https://druid.apache.org/docs/latest/querying/sql-metadata-tables.html#information-schema)
 tables. Run the following query to retrieve metadata for the 
`wikipedia-sql-tutorial` datasource. In the response body, each JSON object 
correlates to a column in the table.\n",
+    "Check the objects' `DATA_TYPE` property for SQL data types. You should 
see TIMESTAMP, BIGINT, and VARCHAR SQL data types. "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "b9227d6c-1d8c-4169-b13b-a08625c4011f",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "endpoint = \"/druid/v2/sql\"\n",
+    "print(f\"\\033[1mQuery endpoint\\033[0m: {druid_host+endpoint}\")\n",
+    "http_method = \"POST\"\n",
+    "\n",
+    "payload = json.dumps({\n",
+    "  \"query\":\"SELECT * FROM INFORMATION_SCHEMA.COLUMNS WHERE 
\\\"TABLE_SCHEMA\\\" = 'druid' AND \\\"TABLE_NAME\\\" = 
'wikipedia-sql-tutorial' LIMIT 7\"\n",
+    "})\n",
+    "headers = {'Content-Type': 'application/json'}\n",
+    "\n",
+    "response = requests.request(http_method, druid_host+endpoint, 
headers=headers, data=payload)\n",
+    "\n",
+    "print(\"\\033[1mQuery\\033[0m:\\n\" + payload)\n",
+    "print(\"\\n\\033[1mResponse\\033[0m: \\n\" + json.dumps(response.json(), 
indent=4))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c59ca797-dd91-442b-8d02-67b711b3fcc6",
+   "metadata": {},
+   "source": [
+    "Druid natively interprets VARCHAR as STRING and BIGINT and TIMESTAMP SQL 
data types as LONG. For reference on how SQL data types map onto Druid native 
types, see [Standard 
types](https://druid.apache.org/docs/latest/querying/sql-data-types.html#standard-types).\n",
+    "\n",
+    "### Timestamp values\n",
+    "\n",
+    "Druid stores timestamp values as the number of milliseconds since the 
Unix epoch.\n",
+    "Primary timestamps are stored in a column named `__time`.\n",
+    "If a dataset doesn't have a timestamp, Druid uses the default value of 
`1970-01-01 00:00:00`.\n",
+    "\n",
+    "Druid time functions perform best when used with the `__time` column.\n",
+    "By default, time functions use the UTC time zone.\n",
+    "For more information about timestamp handling, see [Date and time 
functions](https://druid.apache.org/docs/latest/querying/sql-scalar.html#date-and-time-functions).\n",
+    "\n",
+    "Run the following cell to see Druid SQL `TIME_IN_INTERVAL` scalar 
function at work. This query checks whether a timestamp is contained within a 
specified interval. The results are grouped by date."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "f7e3d62a-1325-4992-8bcd-c0f1925704bc",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "\n",
+    "endpoint = \"/druid/v2/sql\"\n",
+    "print(f\"\\033[1mQuery endpoint\\033[0m: {druid_host+endpoint}\")\n",
+    "http_method = \"POST\"\n",
+    "\n",
+    "payload = json.dumps({\n",
+    "  \"query\":\"SELECT TIME_IN_INTERVAL(__time, 
'2016-06-27T12:00:00/2016-06-27T13:00:00'), COUNT(*) FROM 
\\\"wikipedia-sql-tutorial\\\" \\\n",
+    "  GROUP BY TIME_IN_INTERVAL(__time, 
'2016-06-27T12:00:00/2016-06-27T13:00:00')\"\n",
+    "})\n",
+    "headers = {'Content-Type': 'application/json'}\n",
+    "\n",
+    "response = requests.request(http_method, druid_host+endpoint, 
headers=headers, data=payload)\n",
+    "\n",
+    "print(\"\\033[1mQuery\\033[0m:\\n\" + payload)\n",
+    "print(\"\\n\\033[1mResponse\\033[0m: \\n\" + json.dumps(response.json(), 
indent=4))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f7cfdfae-ccba-49ba-a70f-63d0bd3527b2",
+   "metadata": {},
+   "source": [
+    "### NULL values\n",
+    "\n",
+    "Druid supports SQL compatible NULL handling, allowing string columns to 
distinguish empty strings from NULL and numeric columns to contain NULL rows. 
To store and query data in SQL compatible mode, explicitly set the 
`useDefaultValueForNull` property to `false` in 
`_common/common.runtime.properties`. See [Configuration 
reference](https://druid.apache.org/docs/latest/configuration/index.html) for 
common configuration properties.\n",
+    "\n",
+    "When `useDefaultValueForNull` is set to `true` (default behavior), Druid 
stores NULL values as `0` for numeric columns and as `''` for string columns."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "29c24856",
+   "metadata": {
+    "deletable": true,
+    "tags": []
+   },
+   "source": [
+    "## SELECT statement syntax\n",
+    "\n",
+    "Druid SQL supports SELECT statements with the following structure:\n",
+    "\n",
+    "``` mysql\n",
+    "[ EXPLAIN PLAN FOR ]\n",
+    "[ WITH tableName [ ( column1, column2, ... ) ] AS ( query ) ]\n",
+    "SELECT [ ALL | DISTINCT ] { * | exprs }\n",
+    "FROM { <table> | (<subquery>) | <o1> [ INNER | LEFT ] JOIN <o2> ON 
condition }\n",
+    "[ WHERE expr ]\n",
+    "[ GROUP BY [ exprs | GROUPING SETS ( (exprs), ... ) | ROLLUP (exprs) | 
CUBE (exprs) ] ]\n",
+    "[ HAVING expr ]\n",
+    "[ ORDER BY expr [ ASC | DESC ], expr [ ASC | DESC ], ... ]\n",
+    "[ LIMIT limit ]\n",
+    "[ OFFSET offset ]\n",
+    "[ UNION ALL <another query> ]\n",
+    "```\n",
+    "\n",
+    "As a general rule, use the LIMIT clause with `SELECT *` to limit the 
number of rows returned. "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8cf212e5-fb3f-4206-acdd-46ef1da327ab",
+   "metadata": {
+    "tags": []
+   },
+   "source": [
+    "## WHERE clause\n",
+    "\n",
+    "Druid SQL uses the [SQL WHERE 
clause](https://druid.apache.org/docs/latest/querying/sql.html#where) of a 
SELECT statement to fetch data based on a particular condition.\n",
+    "\n",
+    "In most cases, specifying a time range when using the WHERE clause 
improves query performance.\n",
+    "This is because Druid partitions data by time and having a time range 
allows Druid to skip over unrelated data.\n",
+    "\n",
+    "Druid supports range filtering on columns that contain long millisecond 
values, with the boundaries specified as ISO 8601 time intervals. This is 
suitable for the `__time` column, long metric columns, and dimensions with 
values that can be parsed as long milliseconds.\n",
+    "    \n",
+    "For example, the following cell uses a comparison operator on the 
`__time` field to filter results from a certain time range. The cell limits the 
results to seven rows for display purposes."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "ffca2962-da31-4b9c-adbc-882e35386916",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "endpoint = \"/druid/v2/sql\"\n",
+    "print(f\"\\033[1mQuery endpoint\\033[0m: {druid_host+endpoint}\")\n",
+    "http_method = \"POST\"\n",
+    "\n",
+    "payload = json.dumps({\n",
+    "  \"query\":\"SELECT * FROM \\\"wikipedia-sql-tutorial\\\" WHERE __time 
>= TIMESTAMP '2015-09-12 23:33:55' LIMIT 7\"\n",
+    "})\n",
+    "headers = {'Content-Type': 'application/json'}\n",
+    "\n",
+    "response = requests.request(http_method, druid_host+endpoint, 
headers=headers, data=payload)\n",
+    "\n",
+    "print(\"\\033[1mQuery\\033[0m:\\n\" + payload)\n",
+    "print(\"\\n\\033[1mResponse\\033[0m: \\n\" + json.dumps(response.json(), 
indent=4))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "5160db26-7e8d-40f7-8588-b7eabfc08355",
+   "metadata": {},
+   "source": [
+    "### Comparison operators\n",
+    "\n",
+    "Druid SQL supports the following comparison operators. You can use these 
operators in conjunction with the WHERE clause to compare expressions.\n",
+    "\n",
+    "- equal to (=)\n",
+    "- greater than(>)\n",
+    "- less than (<)\n",
+    "- greater than or equal (>=)\n",
+    "- less than or equal (<=)\n",
+    "- not equal to( <>)\n",
+    "\n",
+    "For example, the next cell returns the first seven records that match the 
following criteria:\n",
+    "- `cityName` is not an empty string\n",
+    "- `countryIsoCode` value equals to `US`"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "9afa74b9-ef9f-4b36-a7bb-88e4498a48ef",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "endpoint = \"/druid/v2/sql\"\n",
+    "print(f\"\\033[1mQuery endpoint\\033[0m: {druid_host+endpoint}\")\n",
+    "http_method = \"POST\"\n",
+    "\n",
+    "payload = json.dumps({\n",
+    "  \"query\":\"SELECT * FROM \\\"wikipedia-sql-tutorial\\\" WHERE 
\\\"cityName\\\" <> '' AND \\\"countryIsoCode\\\" = 'US' LIMIT 7\"\n",
+    "})\n",
+    "headers = {'Content-Type': 'application/json'}\n",
+    "\n",
+    "response = requests.request(http_method, druid_host+endpoint, 
headers=headers, data=payload)\n",
+    "\n",
+    "print(\"\\033[1mQuery\\033[0m:\\n\" + payload)\n",
+    "print(\"\\n\\033[1mResponse\\033[0m: \\n\" + json.dumps(response.json(), 
indent=4))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "dd24470d-25c2-4031-a711-8477d69c9e94",
+   "metadata": {
+    "deletable": true,
+    "editable": true,
+    "tags": []
+   },
+   "source": [
+    "### Logical operators\n",
+    "\n",
+    "Druid's handling of logical operators is comparable to SQL with a few 
exceptions. For example, if an IN list contains NULL, the IN operator matches 
NULL values. This behavior is different from the SQL IN operator, which does 
not match NULL values. For a complete list of logical SQL operators supported 
by Druid SQL, see [Logical 
operators](https://druid.apache.org/docs/latest/querying/sql-operators.html#logical-operators)."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "2baf21b9-74d1-4df6-862f-afbaeef1812b",
+   "metadata": {},
+   "source": [
+    "## GROUP BY clause\n",
+    "\n",
+    "Druid SQL uses the [SQL GROUP 
BY](https://druid.apache.org/docs/latest/querying/sql.html#group-by) clause to 
separate items into groups, where each group is composed of rows with identical 
values. \n",
+    "The GROUP BY clause is often used with [aggregation 
functions](https://druid.apache.org/docs/latest/querying/sql-aggregations.html),
 such as COUNT or SUM, to produce summary values for each group.\n",
+    "\n",
+    "For example, the following cell counts all of the entries separated by 
the field `channel`. The output is limited to seven rows and has two fields: 
`channel` and `counts`. For each unique value of `channel`, Druid aggregates 
all rows having that value, counts the number of entries in the group, and 
assigns the results to a field called `counts`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "228ae0e4-355e-4b4d-8253-fb2e46715559",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "endpoint = \"/druid/v2/sql\"\n",
+    "print(f\"\\033[1mQuery endpoint\\033[0m: {druid_host+endpoint}\")\n",
+    "http_method = \"POST\"\n",
+    "\n",
+    "payload = json.dumps({\n",
+    "  \"query\":\"SELECT channel, COUNT(*) AS counts FROM 
\\\"wikipedia-sql-tutorial\\\" GROUP BY channel LIMIT 7\"\n",
+    "})\n",
+    "headers = {'Content-Type': 'application/json'}\n",
+    "\n",
+    "response = requests.request(http_method, druid_host+endpoint, 
headers=headers, data=payload)\n",
+    "\n",
+    "print(\"\\033[1mQuery\\033[0m:\\n\" + payload)\n",
+    "print(\"\\n\\033[1mResponse\\033[0m: \\n\" + json.dumps(response.json(), 
indent=4))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "eab67db4-a0f3-4177-b5be-1fef355bf33f",
+   "metadata": {},
+   "source": [
+    "You can further define the groups by specifying multiple dimensions.\n",
+    "Druid SQL supports using numbers in GROUP BY and ORDER BY clauses to 
refer to column positions in the SELECT clause.\n",
+    "Similar to SQL, Druid SQL uses one-based indexing to reference elements 
in SQL statements.\n",
+    "\n",
+    "For example, the next cell aggregates entries grouped by fields 
`cityName` and `countryName`.\n",
+    "The output has three fields: `cityName`, `countryName`, and `counts`. For 
each unique combination of `cityName` and `countryName`, Druid aggregates all 
rows and averages the entries in the group.\n",
+    "The output is limited to seven rows for display purposes."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "ca4b3bae-2b02-4c90-98a5-cb7806b6e649",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "endpoint = \"/druid/v2/sql\"\n",
+    "print(f\"\\033[1mQuery endpoint\\033[0m: {druid_host+endpoint}\")\n",
+    "http_method = \"POST\"\n",
+    "\n",
+    "payload = json.dumps({\n",
+    "  \"query\":\"SELECT cityName, countryName, COUNT(*) AS counts FROM 
\\\"wikipedia-sql-tutorial\\\" GROUP BY 1, 2 LIMIT 7\"\n",
+    "})\n",
+    "headers = {'Content-Type': 'application/json'}\n",
+    "\n",
+    "response = requests.request(http_method, druid_host+endpoint, 
headers=headers, data=payload)\n",
+    "\n",
+    "print(\"\\033[1mQuery\\033[0m:\\n\" + payload)\n",
+    "print(\"\\n\\033[1mResponse\\033[0m: \\n\" + json.dumps(response.json(), 
indent=4))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "9c2e56af-3fdf-40f1-869b-822ac8aafbc8",
+   "metadata": {
+    "tags": []
+   },
+   "source": [
+    "## Query types\n",
+    "\n",
+    "Druid uses the [Apache Calcite](https://calcite.apache.org/) data 
management framework to translate SQL statements into Druid’s [native 
JSON-based 
queries](https://druid.apache.org/docs/latest/querying/querying.html) on the 
Broker. The queries are then executed by the Druid cluster.\n",
+    "Native queries are low level, designed to be lightweight and complete 
very quickly.\n",
+    "\n",
+    "Druid SQL uses the following native query types:\n",
+    "- Scan\n",
+    "- Timeseries\n",
+    "- TopN\n",
+    "- GroupBy\n",
+    "\n",
+    "To get information about how a Druid SQL query is translated into a 
native query type, add [EXPLAIN PLAN 
FOR](https://druid.apache.org/docs/latest/querying/sql.html#explain-plan) to 
the beginning of the query.\n",
+    "Alternatively, you can set up [request 
logging](https://druid.apache.org/docs/latest/configuration/index.html#request-logging)."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "2de6e6ba-473c-4ef0-9739-a9472a4c7065",
+   "metadata": {},
+   "source": [
+    "### Scan\n",
+    "\n",
+    "The Scan native query type returns raw Druid rows in streaming mode.\n",
+    "Druid SQL uses the Scan query type for queries that do not 
aggregate&mdash;queries that do not have GROUP BY or DISTINCT clauses.\n",
+    "\n",
+    "For example, run the next cell to scan the `wikipedia-sql-tutorial` table 
for comments from Mexico City. Calcite translates this Druid SQL query into the 
Scan native query type."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "83300261-b0c6-4104-b42b-8b1f9d15aa56",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "endpoint = \"/druid/v2/sql\"\n",
+    "print(f\"\\033[1mQuery endpoint\\033[0m: {druid_host+endpoint}\")\n",
+    "http_method = \"POST\"\n",
+    "\n",
+    "payload = json.dumps({\n",
+    "  \"query\":\"SELECT comment AS \\\"Entry\\\" FROM 
\\\"wikipedia-sql-tutorial\\\" WHERE cityName = 'Mexico City' LIMIT 7\"\n",
+    "})\n",
+    "headers = {'Content-Type': 'application/json'}\n",
+    "\n",
+    "response = requests.request(http_method, druid_host+endpoint, 
headers=headers, data=payload)\n",
+    "\n",
+    "print(\"\\033[1mQuery\\033[0m:\\n\" + payload)\n",
+    "print(\"\\n\\033[1mResponse\\033[0m: \\n\" + json.dumps(response.json(), 
indent=4))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "407cf489-3947-4326-9d81-18b38abaee58",
+   "metadata": {},
+   "source": [
+    "### TopN\n",
+    "\n",
+    "The TopN native query type returns a sorted set of results for the values 
in a given dimension according to some criteria. TopN results are always 
computed in memory. In some cases, the TopN query type delivers approximate 
ranking and results. To prevent this, set `useApproximateTopN` to `false`.\n",
+    "\n",
+    "Druid SQL uses TopN for queries that meet the following criteria:\n",
+    "- queries that GROUP BY a single expression\n",
+    "- queries that have ORDER BY and LIMIT clauses\n",
+    "- queries that do not contain HAVING\n",
+    "- queries that are not nested\n",
+    "\n",
+    "For example, the next cell returns the channels based on the number of 
events for each one in ascending order. Calcite translates this Druid SQL query 
into the TopN native query type."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "fdb94a64-09d1-4f3d-981c-ef805f34b175",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "endpoint = \"/druid/v2/sql\"\n",
+    "print(f\"\\033[1mQuery endpoint\\033[0m: {druid_host+endpoint}\")\n",
+    "http_method = \"POST\"\n",
+    "\n",
+    "payload = json.dumps({\n",
+    "  \"query\":\"SELECT channel, count(*) as \\\"Number of events\\\" FROM 
\\\"wikipedia-sql-tutorial\\\" GROUP BY channel ORDER BY 2 ASC LIMIT 5\"\n",
+    "})\n",
+    "headers = {'Content-Type': 'application/json'}\n",
+    "\n",
+    "response = requests.request(http_method, druid_host+endpoint, 
headers=headers, data=payload)\n",
+    "\n",
+    "print(\"\\033[1mQuery\\033[0m:\\n\" + payload)\n",
+    "print(\"\\n\\033[1mResponse\\033[0m: \\n\" + json.dumps(response.json(), 
indent=4))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "38ee7d1d-2d47-4d36-b8c6-b0868c36c871",
+   "metadata": {},
+   "source": [
+    "### Timeseries\n",
+    "\n",
+    "The Timeseries native query type returns an array of JSON objects, where 
each object represents a value asked for by the Timeseries query.\n",
+    "\n",
+    "Druid SQL uses Timeseries for queries that meet the following 
criteria:\n",
+    "- queries that GROUP BY `FLOOR(__time TO unit)` or `TIME_FLOOR(__time, 
period)`\n",
+    "- queries that do not contain other grouping expressions\n",
+    "- queries that do not contain HAVING\n",
+    "- queries that are not nested\n",
+    "- queries that either have no ORDER BY clause or an ORDER BY clause that 
orders by the same expression as present in GROUP BY\n",
+    "\n",
+    "For example, Calcite translates the following Druid SQL query into the 
Timeseries native query type:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "19a6ab44-6349-4620-82c7-00997287378a",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "endpoint = \"/druid/v2/sql\"\n",
+    "print(f\"\\033[1mQuery endpoint\\033[0m: {druid_host+endpoint}\")\n",
+    "http_method = \"POST\"\n",
+    "\n",
+    "payload = json.dumps({\n",
+    "  \"query\":\"SELECT countryName AS \\\"Country\\\", SUM(deleted), 
SUM(added) FROM \\\"wikipedia-sql-tutorial\\\" \\\n",

Review Comment:
    Alias `SUM(deleted)` and  `SUM(added)` for example `SUM(deleted) AS deleted`



##########
examples/quickstart/jupyter-notebooks/sql-tutorial.ipynb:
##########
@@ -0,0 +1,765 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "ad4e60b6",
+   "metadata": {
+    "deletable": true,
+    "editable": true,
+    "tags": []
+   },
+   "source": [
+    "# Tutorial: Learn the basics of Druid SQL\n",
+    "\n",
+    "<!--\n",
+    "  ~ Licensed to the Apache Software Foundation (ASF) under one\n",
+    "  ~ or more contributor license agreements.  See the NOTICE file\n",
+    "  ~ distributed with this work for additional information\n",
+    "  ~ regarding copyright ownership.  The ASF licenses this file\n",
+    "  ~ to you under the Apache License, Version 2.0 (the\n",
+    "  ~ \"License\"); you may not use this file except in compliance\n",
+    "  ~ with the License.  You may obtain a copy of the License at\n",
+    "  ~\n",
+    "  ~   http://www.apache.org/licenses/LICENSE-2.0\n";,
+    "  ~\n",
+    "  ~ Unless required by applicable law or agreed to in writing,\n",
+    "  ~ software distributed under the License is distributed on an\n",
+    "  ~ \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+    "  ~ KIND, either express or implied.  See the License for the\n",
+    "  ~ specific language governing permissions and limitations\n",
+    "  ~ under the License.\n",
+    "  -->\n",
+    "  \n",
+    "Apache Druid supports two query languages: Druid SQL and native 
queries.\n",
+    "Druid SQL is a Structured Query Language (SQL) dialect that enables you 
to query datasources in Apache Druid using SQL statements.\n",
+    "SQL and Druid SQL use similar syntax, with some notable differences.\n",
+    "Not all SQL functions are supported in Druid SQL. Instead, Druid includes 
Druid-specific SQL functions for optimized query performance.\n",
+    "\n",
+    "This interactive tutorial introduces you to the unique aspects of Druid 
SQL.\n",
+    "To learn about native queries, see [Native 
queries](https://druid.apache.org/docs/latest/querying/querying.html)."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8d6bbbcb",
+   "metadata": {
+    "deletable": true,
+    "tags": []
+   },
+   "source": [
+    "## Prerequisites\n",
+    "\n",
+    "Make sure that you meet the requirements outlined in the README.md file 
of the [apache/druid 
repo](https://github.com/apache/druid/tree/master/examples/quickstart/jupyter-notebooks/).\n",
+    "Specifically, you need the following:\n",
+    "- Knowledge of SQL\n",
+    "- [Python3](https://www.python.org/downloads/)\n",
+    "- [The `requests` package for 
Python](https://requests.readthedocs.io/en/latest/user/install/)\n",
+    "- [JupyterLab](https://jupyter.org/install#jupyterlab) (recommended) or 
[Jupyter Notebook](https://jupyter.org/install#jupyter-notebook) running on a 
non-default port. Druid and Jupyter both default to port `8888`, so you need to 
start Jupyter on a different port. \n",
+    "- An available Druid instance. This tutorial uses the `micro-quickstart` 
configuration described in the [Druid 
quickstart](https://druid.apache.org/docs/latest/tutorials/index.html), so no 
authentication or authorization is required unless explicitly mentioned. If you 
haven’t already, download Druid version 24.0 or higher and start Druid services 
as described in the quickstart."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8f8e64f0-c29a-473c-8783-a2ff8648acd7",
+   "metadata": {},
+   "source": [
+    "## Prepare your environment\n",
+    "\n",
+    "This section contains the steps required to prepare your environment to 
follow along with this tutorial.\n",
+    "\n",
+    "Start by running the following cell. It imports the required Python 
packages and defines a variable for the Druid host."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "b7f08a52",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "import requests\n",
+    "import json\n",
+    "\n",
+    "# druid_host is the hostname and port for your Druid deployment. \n",
+    "# In a distributed environment, use the Router service  as the 
`druid_host`. \n",
+    "druid_host = \"http://localhost:8888\"\n";,
+    "dataSourceName = \"wikipedia-sql-tutorial\"\n",
+    "print(f\"\\033[1mDruid host\\033[0m: {druid_host}\")"

Review Comment:
   I like the way that this bold's the text, but it makes the code a little 
hard to read. Also if you are going to reuse the format frequently, you can 
define it in a variable:
   ```
   bold = '\033[1m'
   standard = '\033[0m'
   print(f"{bold}Druid host{standard}: {druid_host}")
   ```



##########
examples/quickstart/jupyter-notebooks/sql-tutorial.ipynb:
##########
@@ -0,0 +1,765 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "ad4e60b6",
+   "metadata": {
+    "deletable": true,
+    "editable": true,
+    "tags": []
+   },
+   "source": [
+    "# Tutorial: Learn the basics of Druid SQL\n",
+    "\n",
+    "<!--\n",
+    "  ~ Licensed to the Apache Software Foundation (ASF) under one\n",
+    "  ~ or more contributor license agreements.  See the NOTICE file\n",
+    "  ~ distributed with this work for additional information\n",
+    "  ~ regarding copyright ownership.  The ASF licenses this file\n",
+    "  ~ to you under the Apache License, Version 2.0 (the\n",
+    "  ~ \"License\"); you may not use this file except in compliance\n",
+    "  ~ with the License.  You may obtain a copy of the License at\n",
+    "  ~\n",
+    "  ~   http://www.apache.org/licenses/LICENSE-2.0\n";,
+    "  ~\n",
+    "  ~ Unless required by applicable law or agreed to in writing,\n",
+    "  ~ software distributed under the License is distributed on an\n",
+    "  ~ \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+    "  ~ KIND, either express or implied.  See the License for the\n",
+    "  ~ specific language governing permissions and limitations\n",
+    "  ~ under the License.\n",
+    "  -->\n",
+    "  \n",
+    "Apache Druid supports two query languages: Druid SQL and native 
queries.\n",
+    "Druid SQL is a Structured Query Language (SQL) dialect that enables you 
to query datasources in Apache Druid using SQL statements.\n",
+    "SQL and Druid SQL use similar syntax, with some notable differences.\n",
+    "Not all SQL functions are supported in Druid SQL. Instead, Druid includes 
Druid-specific SQL functions for optimized query performance.\n",
+    "\n",
+    "This interactive tutorial introduces you to the unique aspects of Druid 
SQL.\n",
+    "To learn about native queries, see [Native 
queries](https://druid.apache.org/docs/latest/querying/querying.html)."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8d6bbbcb",
+   "metadata": {
+    "deletable": true,
+    "tags": []
+   },
+   "source": [
+    "## Prerequisites\n",
+    "\n",
+    "Make sure that you meet the requirements outlined in the README.md file 
of the [apache/druid 
repo](https://github.com/apache/druid/tree/master/examples/quickstart/jupyter-notebooks/).\n",
+    "Specifically, you need the following:\n",
+    "- Knowledge of SQL\n",
+    "- [Python3](https://www.python.org/downloads/)\n",
+    "- [The `requests` package for 
Python](https://requests.readthedocs.io/en/latest/user/install/)\n",
+    "- [JupyterLab](https://jupyter.org/install#jupyterlab) (recommended) or 
[Jupyter Notebook](https://jupyter.org/install#jupyter-notebook) running on a 
non-default port. Druid and Jupyter both default to port `8888`, so you need to 
start Jupyter on a different port. \n",
+    "- An available Druid instance. This tutorial uses the `micro-quickstart` 
configuration described in the [Druid 
quickstart](https://druid.apache.org/docs/latest/tutorials/index.html), so no 
authentication or authorization is required unless explicitly mentioned. If you 
haven’t already, download Druid version 24.0 or higher and start Druid services 
as described in the quickstart."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8f8e64f0-c29a-473c-8783-a2ff8648acd7",
+   "metadata": {},
+   "source": [
+    "## Prepare your environment\n",
+    "\n",
+    "This section contains the steps required to prepare your environment to 
follow along with this tutorial.\n",
+    "\n",
+    "Start by running the following cell. It imports the required Python 
packages and defines a variable for the Druid host."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "b7f08a52",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "import requests\n",
+    "import json\n",
+    "\n",
+    "# druid_host is the hostname and port for your Druid deployment. \n",
+    "# In a distributed environment, use the Router service  as the 
`druid_host`. \n",
+    "druid_host = \"http://localhost:8888\"\n";,
+    "dataSourceName = \"wikipedia-sql-tutorial\"\n",
+    "print(f\"\\033[1mDruid host\\033[0m: {druid_host}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e893ef7d-7136-442f-8bd9-31b5a5276518",
+   "metadata": {},
+   "source": [
+    "In the rest of the tutorial, the `endpoint`, `http_method`, and `payload` 
variables are updated to accomplish different tasks.\n",
+    "\n",
+    "Run the following cell to ingest data from an external source into a 
table named `wikipedia-sql-tutorial` using the [multi-stage query (MSQ) task 
engine](https://druid.apache.org/docs/latest/multi-stage-query/index.html)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "045f782c-74d8-4447-9487-529071812b51",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "endpoint = \"/druid/v2/sql/task\"\n",
+    "print(f\"\\033[1mQuery endpoint\\033[0m: {druid_host+endpoint}\")\n",
+    "http_method = \"POST\"\n",
+    "\n",
+    "\n",
+    "payload = json.dumps({\n",
+    "\"query\": \"INSERT INTO \\\"wikipedia-sql-tutorial\\\" SELECT 
TIME_PARSE(\\\"timestamp\\\") \\\n",
+    "          AS __time, * FROM TABLE \\\n",
+    "          (EXTERN('{\\\"type\\\": \\\"http\\\", \\\"uris\\\": 
[\\\"https://druid.apache.org/data/wikipedia.json.gz\\\"]}', '{\\\"type\\\": 
\\\"json\\\"}', '[{\\\"name\\\": \\\"added\\\", \\\"type\\\": \\\"long\\\"}, 
{\\\"name\\\": \\\"channel\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": 
\\\"cityName\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": 
\\\"comment\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": 
\\\"commentLength\\\", \\\"type\\\": \\\"long\\\"}, {\\\"name\\\": 
\\\"countryIsoCode\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": 
\\\"countryName\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": 
\\\"deleted\\\", \\\"type\\\": \\\"long\\\"}, {\\\"name\\\": \\\"delta\\\", 
\\\"type\\\": \\\"long\\\"}, {\\\"name\\\": \\\"deltaBucket\\\", \\\"type\\\": 
\\\"string\\\"}, {\\\"name\\\": \\\"diffUrl\\\", \\\"type\\\": \\\"string\\\"}, 
{\\\"name\\\": \\\"flags\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": 
\\\"isAnonymous\\\", \\\"type\\\": \\\"string\\\"}, {\\\
 "name\\\": \\\"isMinor\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": 
\\\"isNew\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"isRobot\\\", 
\\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"isUnpatrolled\\\", 
\\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"metroCode\\\", \\\"type\\\": 
\\\"string\\\"}, {\\\"name\\\": \\\"namespace\\\", \\\"type\\\": 
\\\"string\\\"}, {\\\"name\\\": \\\"page\\\", \\\"type\\\": \\\"string\\\"}, 
{\\\"name\\\": \\\"regionIsoCode\\\", \\\"type\\\": \\\"string\\\"}, 
{\\\"name\\\": \\\"regionName\\\", \\\"type\\\": \\\"string\\\"}, 
{\\\"name\\\": \\\"timestamp\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": 
\\\"user\\\", \\\"type\\\": \\\"string\\\"}]')) \\\n",
+    "          PARTITIONED BY DAY\",\n",
+    "  \"context\": {\n",
+    "    \"maxNumTasks\": 3\n",
+    "  }\n",
+    "})\n",
+    "\n",
+    "headers = {'Content-Type': 'application/json'}\n",
+    "\n",
+    "response = requests.request(http_method, druid_host+endpoint, 
headers=headers, data=payload)\n",
+    "ingestion_taskId_response = response\n",
+    "ingestion_taskId = 
json.loads(ingestion_taskId_response.text)['taskId']\n",
+    "print(f\"\\033[1mQuery\\033[0m:\\n\" + payload)\n",
+    "print(f\"\\nInserting data into the table named {dataSourceName}\")\n",
+    "print(\"\\nThe response includes the task ID and the status: \" + 
response.text + \".\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ceb86ce0-85f6-4c63-8fd6-883033ee96e9",
+   "metadata": {},
+   "source": [
+    "Wait for ingestion to complete before proceeding.\n",
+    "To check on the status of your ingestion task, run the following cell. "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "df12d12c-a067-4759-bae0-0410c24b6205",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "import time\n",
+    "\n",
+    "endpoint = f\"/druid/indexer/v1/task/{ingestion_taskId}/status\"\n",
+    "print(f\"\\033[1mQuery endpoint\\033[0m: {druid_host+endpoint}\")\n",
+    "http_method = \"GET\"\n",
+    "\n",
+    "payload = {}\n",
+    "headers = {}\n",
+    "\n",
+    "response = requests.request(http_method, druid_host+endpoint, 
headers=headers, data=payload)\n",
+    "ingestion_status = json.loads(response.text)['status']['status']\n",
+    "# If you only want to fetch the status once and print it, \n",
+    "# uncomment the print statement and comment out the if and while loops\n",
+    "# print(json.dumps(response.json(), indent=4))\n",
+    "\n",
+    "if ingestion_status == \"RUNNING\":\n",
+    "  print(\"The ingestion is running...\")\n",
+    "\n",
+    "while ingestion_status != \"SUCCESS\":\n",
+    "  response = requests.request(http_method, druid_host+endpoint, 
headers=headers, data=payload)\n",
+    "  ingestion_status = json.loads(response.text)['status']['status']\n",
+    "  time.sleep(15)  \n",
+    "  \n",
+    "if ingestion_status == \"SUCCESS\": \n",
+    "  print(\"The ingestion is complete:\")\n",
+    "  print(json.dumps(response.json(), indent=4))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "240b0ad5-48f2-4737-b12b-5fd5f98da300",
+   "metadata": {},
+   "source": [
+    "## Datasources\n",
+    "\n",
+    "Druid supports a variety of datasources, with the table datasource being 
the most common. In Druid documentation, the word \"datasource\" often 
implicitly refers to the table datasource.\n",
+    "The 
[Datasources](https://druid.apache.org/docs/latest/querying/datasource.html) 
topic provides a comprehensive overview of datasources supported by Druid 
SQL.\n",
+    "\n",
+    "In Druid SQL, table datasources reside in the `druid` schema. This is the 
default schema, so table datasources can be referenced as either 
`druid.dataSourceName` or `dataSourceName`.\n",
+    "\n",
+    "For example, run the next cell to return the rows of the column named 
`channel` from the `wikipedia-sql-tutorial` table. Because this tutorial is 
running in Jupyter, make sure to limit the size of your query results for 
display purposes using the SQL LIMIT clause."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "91dd255a-4d55-493e-a067-4cef5c659657",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "endpoint = \"/druid/v2/sql\"\n",
+    "print(f\"\\033[1mQuery endpoint\\033[0m: {druid_host+endpoint}\")\n",
+    "http_method = \"POST\"\n",
+    "\n",
+    "payload = json.dumps({\n",
+    "  \"query\":\"SELECT \\\"channel\\\" FROM \\\"wikipedia-sql-tutorial\\\" 
LIMIT 7\"})\n",
+    "headers = {'Content-Type': 'application/json'}\n",
+    "\n",
+    "response = requests.request(http_method, druid_host+endpoint, 
headers=headers, data=payload)\n",
+    "\n",
+    "print(\"\\033[1mQuery\\033[0m:\\n\" + payload)\n",
+    "print(f\"\\nEach JSON object in the response represents a row in the 
{dataSourceName} datasource.\") \n",
+    "print(\"\\n\\033[1mResponse\\033[0m: \\n\" + json.dumps(response.json(), 
indent=4))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "cbeb5a63",
+   "metadata": {
+    "deletable": true,
+    "tags": []
+   },
+   "source": [
+    "## Data types\n",
+    "\n",
+    "Druid maps SQL data types onto native types at query runtime.\n",
+    "The following native types are supported for Druid columns:\n",
+    "\n",
+    "* STRING: UTF-8 encoded strings and string arrays\n",
+    "* LONG: 64-bit signed int\n",
+    "* FLOAT: 32-bit float\n",
+    "* DOUBLE: 64-bit float\n",
+    "* COMPLEX: represents non-standard data types, such as nested JSON, 
hyperUnique and approxHistogram aggregators, and DataSketches aggregators\n",
+    "\n",
+    "Druid exposes table and column metadata through 
[INFORMATION_SCHEMA](https://druid.apache.org/docs/latest/querying/sql-metadata-tables.html#information-schema)
 tables. Run the following query to retrieve metadata for the 
`wikipedia-sql-tutorial` datasource. In the response body, each JSON object 
correlates to a column in the table.\n",
+    "Check the objects' `DATA_TYPE` property for SQL data types. You should 
see TIMESTAMP, BIGINT, and VARCHAR SQL data types. "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "b9227d6c-1d8c-4169-b13b-a08625c4011f",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "endpoint = \"/druid/v2/sql\"\n",
+    "print(f\"\\033[1mQuery endpoint\\033[0m: {druid_host+endpoint}\")\n",
+    "http_method = \"POST\"\n",
+    "\n",
+    "payload = json.dumps({\n",
+    "  \"query\":\"SELECT * FROM INFORMATION_SCHEMA.COLUMNS WHERE 
\\\"TABLE_SCHEMA\\\" = 'druid' AND \\\"TABLE_NAME\\\" = 
'wikipedia-sql-tutorial' LIMIT 7\"\n",
+    "})\n",
+    "headers = {'Content-Type': 'application/json'}\n",
+    "\n",
+    "response = requests.request(http_method, druid_host+endpoint, 
headers=headers, data=payload)\n",
+    "\n",
+    "print(\"\\033[1mQuery\\033[0m:\\n\" + payload)\n",
+    "print(\"\\n\\033[1mResponse\\033[0m: \\n\" + json.dumps(response.json(), 
indent=4))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c59ca797-dd91-442b-8d02-67b711b3fcc6",
+   "metadata": {},
+   "source": [
+    "Druid natively interprets VARCHAR as STRING and BIGINT and TIMESTAMP SQL 
data types as LONG. For reference on how SQL data types map onto Druid native 
types, see [Standard 
types](https://druid.apache.org/docs/latest/querying/sql-data-types.html#standard-types).\n",
+    "\n",
+    "### Timestamp values\n",
+    "\n",
+    "Druid stores timestamp values as the number of milliseconds since the 
Unix epoch.\n",
+    "Primary timestamps are stored in a column named `__time`.\n",
+    "If a dataset doesn't have a timestamp, Druid uses the default value of 
`1970-01-01 00:00:00`.\n",
+    "\n",
+    "Druid time functions perform best when used with the `__time` column.\n",
+    "By default, time functions use the UTC time zone.\n",
+    "For more information about timestamp handling, see [Date and time 
functions](https://druid.apache.org/docs/latest/querying/sql-scalar.html#date-and-time-functions).\n",
+    "\n",
+    "Run the following cell to see Druid SQL `TIME_IN_INTERVAL` scalar 
function at work. This query checks whether a timestamp is contained within a 
specified interval. The results are grouped by date."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "f7e3d62a-1325-4992-8bcd-c0f1925704bc",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "\n",
+    "endpoint = \"/druid/v2/sql\"\n",
+    "print(f\"\\033[1mQuery endpoint\\033[0m: {druid_host+endpoint}\")\n",
+    "http_method = \"POST\"\n",
+    "\n",
+    "payload = json.dumps({\n",
+    "  \"query\":\"SELECT TIME_IN_INTERVAL(__time, 
'2016-06-27T12:00:00/2016-06-27T13:00:00'), COUNT(*) FROM 
\\\"wikipedia-sql-tutorial\\\" \\\n",

Review Comment:
   as a general practice, let's alias our columns so that the output is more 
informative. The current output is this:
   ```
   Response: 
   [
       {
           "EXPR$0": false,
           "EXPR$1": 23214
       },
       {
           "EXPR$0": true,
           "EXPR$1": 1219
       }
   ]
   ```
   You can change the SELEXT to alias the columns:
   ```
   SELECT TIME_IN_INTERVAL(__time, '2016-06-27T12:00:00/2016-06-27T13:00:00') 
AS existing, COUNT(*) AS total 
   ```
   Then the output has column aliases like this:
   ```
   Response: 
   [
       {
           "existing": false,
           "total": 23214
       },
       {
           "existing": true,
           "total": 1219
       }
   ]
   ```
   
   



##########
examples/quickstart/jupyter-notebooks/sql-tutorial.ipynb:
##########
@@ -0,0 +1,765 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "ad4e60b6",
+   "metadata": {
+    "deletable": true,
+    "editable": true,
+    "tags": []
+   },
+   "source": [
+    "# Tutorial: Learn the basics of Druid SQL\n",
+    "\n",
+    "<!--\n",
+    "  ~ Licensed to the Apache Software Foundation (ASF) under one\n",
+    "  ~ or more contributor license agreements.  See the NOTICE file\n",
+    "  ~ distributed with this work for additional information\n",
+    "  ~ regarding copyright ownership.  The ASF licenses this file\n",
+    "  ~ to you under the Apache License, Version 2.0 (the\n",
+    "  ~ \"License\"); you may not use this file except in compliance\n",
+    "  ~ with the License.  You may obtain a copy of the License at\n",
+    "  ~\n",
+    "  ~   http://www.apache.org/licenses/LICENSE-2.0\n";,
+    "  ~\n",
+    "  ~ Unless required by applicable law or agreed to in writing,\n",
+    "  ~ software distributed under the License is distributed on an\n",
+    "  ~ \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+    "  ~ KIND, either express or implied.  See the License for the\n",
+    "  ~ specific language governing permissions and limitations\n",
+    "  ~ under the License.\n",
+    "  -->\n",
+    "  \n",
+    "Apache Druid supports two query languages: Druid SQL and native 
queries.\n",
+    "Druid SQL is a Structured Query Language (SQL) dialect that enables you 
to query datasources in Apache Druid using SQL statements.\n",
+    "SQL and Druid SQL use similar syntax, with some notable differences.\n",
+    "Not all SQL functions are supported in Druid SQL. Instead, Druid includes 
Druid-specific SQL functions for optimized query performance.\n",
+    "\n",
+    "This interactive tutorial introduces you to the unique aspects of Druid 
SQL.\n",
+    "To learn about native queries, see [Native 
queries](https://druid.apache.org/docs/latest/querying/querying.html)."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8d6bbbcb",
+   "metadata": {
+    "deletable": true,
+    "tags": []
+   },
+   "source": [
+    "## Prerequisites\n",
+    "\n",
+    "Make sure that you meet the requirements outlined in the README.md file 
of the [apache/druid 
repo](https://github.com/apache/druid/tree/master/examples/quickstart/jupyter-notebooks/).\n",
+    "Specifically, you need the following:\n",
+    "- Knowledge of SQL\n",
+    "- [Python3](https://www.python.org/downloads/)\n",
+    "- [The `requests` package for 
Python](https://requests.readthedocs.io/en/latest/user/install/)\n",
+    "- [JupyterLab](https://jupyter.org/install#jupyterlab) (recommended) or 
[Jupyter Notebook](https://jupyter.org/install#jupyter-notebook) running on a 
non-default port. Druid and Jupyter both default to port `8888`, so you need to 
start Jupyter on a different port. \n",
+    "- An available Druid instance. This tutorial uses the `micro-quickstart` 
configuration described in the [Druid 
quickstart](https://druid.apache.org/docs/latest/tutorials/index.html), so no 
authentication or authorization is required unless explicitly mentioned. If you 
haven’t already, download Druid version 24.0 or higher and start Druid services 
as described in the quickstart."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8f8e64f0-c29a-473c-8783-a2ff8648acd7",
+   "metadata": {},
+   "source": [
+    "## Prepare your environment\n",
+    "\n",
+    "This section contains the steps required to prepare your environment to 
follow along with this tutorial.\n",
+    "\n",
+    "Start by running the following cell. It imports the required Python 
packages and defines a variable for the Druid host."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "b7f08a52",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "import requests\n",
+    "import json\n",
+    "\n",
+    "# druid_host is the hostname and port for your Druid deployment. \n",
+    "# In a distributed environment, use the Router service  as the 
`druid_host`. \n",
+    "druid_host = \"http://localhost:8888\"\n";,
+    "dataSourceName = \"wikipedia-sql-tutorial\"\n",
+    "print(f\"\\033[1mDruid host\\033[0m: {druid_host}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e893ef7d-7136-442f-8bd9-31b5a5276518",
+   "metadata": {},
+   "source": [
+    "In the rest of the tutorial, the `endpoint`, `http_method`, and `payload` 
variables are updated to accomplish different tasks.\n",
+    "\n",
+    "Run the following cell to ingest data from an external source into a 
table named `wikipedia-sql-tutorial` using the [multi-stage query (MSQ) task 
engine](https://druid.apache.org/docs/latest/multi-stage-query/index.html)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "045f782c-74d8-4447-9487-529071812b51",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "endpoint = \"/druid/v2/sql/task\"\n",
+    "print(f\"\\033[1mQuery endpoint\\033[0m: {druid_host+endpoint}\")\n",
+    "http_method = \"POST\"\n",
+    "\n",
+    "\n",
+    "payload = json.dumps({\n",
+    "\"query\": \"INSERT INTO \\\"wikipedia-sql-tutorial\\\" SELECT 
TIME_PARSE(\\\"timestamp\\\") \\\n",
+    "          AS __time, * FROM TABLE \\\n",
+    "          (EXTERN('{\\\"type\\\": \\\"http\\\", \\\"uris\\\": 
[\\\"https://druid.apache.org/data/wikipedia.json.gz\\\"]}', '{\\\"type\\\": 
\\\"json\\\"}', '[{\\\"name\\\": \\\"added\\\", \\\"type\\\": \\\"long\\\"}, 
{\\\"name\\\": \\\"channel\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": 
\\\"cityName\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": 
\\\"comment\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": 
\\\"commentLength\\\", \\\"type\\\": \\\"long\\\"}, {\\\"name\\\": 
\\\"countryIsoCode\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": 
\\\"countryName\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": 
\\\"deleted\\\", \\\"type\\\": \\\"long\\\"}, {\\\"name\\\": \\\"delta\\\", 
\\\"type\\\": \\\"long\\\"}, {\\\"name\\\": \\\"deltaBucket\\\", \\\"type\\\": 
\\\"string\\\"}, {\\\"name\\\": \\\"diffUrl\\\", \\\"type\\\": \\\"string\\\"}, 
{\\\"name\\\": \\\"flags\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": 
\\\"isAnonymous\\\", \\\"type\\\": \\\"string\\\"}, {\\\
 "name\\\": \\\"isMinor\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": 
\\\"isNew\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"isRobot\\\", 
\\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"isUnpatrolled\\\", 
\\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"metroCode\\\", \\\"type\\\": 
\\\"string\\\"}, {\\\"name\\\": \\\"namespace\\\", \\\"type\\\": 
\\\"string\\\"}, {\\\"name\\\": \\\"page\\\", \\\"type\\\": \\\"string\\\"}, 
{\\\"name\\\": \\\"regionIsoCode\\\", \\\"type\\\": \\\"string\\\"}, 
{\\\"name\\\": \\\"regionName\\\", \\\"type\\\": \\\"string\\\"}, 
{\\\"name\\\": \\\"timestamp\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": 
\\\"user\\\", \\\"type\\\": \\\"string\\\"}]')) \\\n",
+    "          PARTITIONED BY DAY\",\n",
+    "  \"context\": {\n",
+    "    \"maxNumTasks\": 3\n",
+    "  }\n",
+    "})\n",
+    "\n",
+    "headers = {'Content-Type': 'application/json'}\n",
+    "\n",
+    "response = requests.request(http_method, druid_host+endpoint, 
headers=headers, data=payload)\n",
+    "ingestion_taskId_response = response\n",
+    "ingestion_taskId = 
json.loads(ingestion_taskId_response.text)['taskId']\n",
+    "print(f\"\\033[1mQuery\\033[0m:\\n\" + payload)\n",
+    "print(f\"\\nInserting data into the table named {dataSourceName}\")\n",
+    "print(\"\\nThe response includes the task ID and the status: \" + 
response.text + \".\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ceb86ce0-85f6-4c63-8fd6-883033ee96e9",
+   "metadata": {},
+   "source": [
+    "Wait for ingestion to complete before proceeding.\n",
+    "To check on the status of your ingestion task, run the following cell. "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "df12d12c-a067-4759-bae0-0410c24b6205",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "import time\n",
+    "\n",
+    "endpoint = f\"/druid/indexer/v1/task/{ingestion_taskId}/status\"\n",
+    "print(f\"\\033[1mQuery endpoint\\033[0m: {druid_host+endpoint}\")\n",
+    "http_method = \"GET\"\n",
+    "\n",
+    "payload = {}\n",
+    "headers = {}\n",
+    "\n",
+    "response = requests.request(http_method, druid_host+endpoint, 
headers=headers, data=payload)\n",
+    "ingestion_status = json.loads(response.text)['status']['status']\n",
+    "# If you only want to fetch the status once and print it, \n",
+    "# uncomment the print statement and comment out the if and while loops\n",
+    "# print(json.dumps(response.json(), indent=4))\n",
+    "\n",
+    "if ingestion_status == \"RUNNING\":\n",
+    "  print(\"The ingestion is running...\")\n",
+    "\n",
+    "while ingestion_status != \"SUCCESS\":\n",
+    "  response = requests.request(http_method, druid_host+endpoint, 
headers=headers, data=payload)\n",
+    "  ingestion_status = json.loads(response.text)['status']['status']\n",
+    "  time.sleep(15)  \n",
+    "  \n",
+    "if ingestion_status == \"SUCCESS\": \n",
+    "  print(\"The ingestion is complete:\")\n",
+    "  print(json.dumps(response.json(), indent=4))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "240b0ad5-48f2-4737-b12b-5fd5f98da300",
+   "metadata": {},
+   "source": [
+    "## Datasources\n",
+    "\n",
+    "Druid supports a variety of datasources, with the table datasource being 
the most common. In Druid documentation, the word \"datasource\" often 
implicitly refers to the table datasource.\n",
+    "The 
[Datasources](https://druid.apache.org/docs/latest/querying/datasource.html) 
topic provides a comprehensive overview of datasources supported by Druid 
SQL.\n",
+    "\n",
+    "In Druid SQL, table datasources reside in the `druid` schema. This is the 
default schema, so table datasources can be referenced as either 
`druid.dataSourceName` or `dataSourceName`.\n",
+    "\n",
+    "For example, run the next cell to return the rows of the column named 
`channel` from the `wikipedia-sql-tutorial` table. Because this tutorial is 
running in Jupyter, make sure to limit the size of your query results for 
display purposes using the SQL LIMIT clause."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "91dd255a-4d55-493e-a067-4cef5c659657",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "endpoint = \"/druid/v2/sql\"\n",
+    "print(f\"\\033[1mQuery endpoint\\033[0m: {druid_host+endpoint}\")\n",
+    "http_method = \"POST\"\n",
+    "\n",
+    "payload = json.dumps({\n",
+    "  \"query\":\"SELECT \\\"channel\\\" FROM \\\"wikipedia-sql-tutorial\\\" 
LIMIT 7\"})\n",
+    "headers = {'Content-Type': 'application/json'}\n",
+    "\n",
+    "response = requests.request(http_method, druid_host+endpoint, 
headers=headers, data=payload)\n",
+    "\n",
+    "print(\"\\033[1mQuery\\033[0m:\\n\" + payload)\n",
+    "print(f\"\\nEach JSON object in the response represents a row in the 
{dataSourceName} datasource.\") \n",
+    "print(\"\\n\\033[1mResponse\\033[0m: \\n\" + json.dumps(response.json(), 
indent=4))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "cbeb5a63",
+   "metadata": {
+    "deletable": true,
+    "tags": []
+   },
+   "source": [
+    "## Data types\n",
+    "\n",
+    "Druid maps SQL data types onto native types at query runtime.\n",
+    "The following native types are supported for Druid columns:\n",
+    "\n",
+    "* STRING: UTF-8 encoded strings and string arrays\n",
+    "* LONG: 64-bit signed int\n",
+    "* FLOAT: 32-bit float\n",
+    "* DOUBLE: 64-bit float\n",
+    "* COMPLEX: represents non-standard data types, such as nested JSON, 
hyperUnique and approxHistogram aggregators, and DataSketches aggregators\n",
+    "\n",
+    "Druid exposes table and column metadata through 
[INFORMATION_SCHEMA](https://druid.apache.org/docs/latest/querying/sql-metadata-tables.html#information-schema)
 tables. Run the following query to retrieve metadata for the 
`wikipedia-sql-tutorial` datasource. In the response body, each JSON object 
correlates to a column in the table.\n",
+    "Check the objects' `DATA_TYPE` property for SQL data types. You should 
see TIMESTAMP, BIGINT, and VARCHAR SQL data types. "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "b9227d6c-1d8c-4169-b13b-a08625c4011f",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "endpoint = \"/druid/v2/sql\"\n",
+    "print(f\"\\033[1mQuery endpoint\\033[0m: {druid_host+endpoint}\")\n",
+    "http_method = \"POST\"\n",
+    "\n",
+    "payload = json.dumps({\n",
+    "  \"query\":\"SELECT * FROM INFORMATION_SCHEMA.COLUMNS WHERE 
\\\"TABLE_SCHEMA\\\" = 'druid' AND \\\"TABLE_NAME\\\" = 
'wikipedia-sql-tutorial' LIMIT 7\"\n",
+    "})\n",
+    "headers = {'Content-Type': 'application/json'}\n",
+    "\n",
+    "response = requests.request(http_method, druid_host+endpoint, 
headers=headers, data=payload)\n",
+    "\n",
+    "print(\"\\033[1mQuery\\033[0m:\\n\" + payload)\n",
+    "print(\"\\n\\033[1mResponse\\033[0m: \\n\" + json.dumps(response.json(), 
indent=4))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c59ca797-dd91-442b-8d02-67b711b3fcc6",
+   "metadata": {},
+   "source": [
+    "Druid natively interprets VARCHAR as STRING and BIGINT and TIMESTAMP SQL 
data types as LONG. For reference on how SQL data types map onto Druid native 
types, see [Standard 
types](https://druid.apache.org/docs/latest/querying/sql-data-types.html#standard-types).\n",
+    "\n",
+    "### Timestamp values\n",
+    "\n",
+    "Druid stores timestamp values as the number of milliseconds since the 
Unix epoch.\n",
+    "Primary timestamps are stored in a column named `__time`.\n",
+    "If a dataset doesn't have a timestamp, Druid uses the default value of 
`1970-01-01 00:00:00`.\n",
+    "\n",
+    "Druid time functions perform best when used with the `__time` column.\n",
+    "By default, time functions use the UTC time zone.\n",
+    "For more information about timestamp handling, see [Date and time 
functions](https://druid.apache.org/docs/latest/querying/sql-scalar.html#date-and-time-functions).\n",
+    "\n",
+    "Run the following cell to see Druid SQL `TIME_IN_INTERVAL` scalar 
function at work. This query checks whether a timestamp is contained within a 
specified interval. The results are grouped by date."

Review Comment:
   Is this to check the number of records within the time interval? Or maybe if 
the timestamp for a record exists within a time interval? It is a little 
confusing the way that it is worded for me now



##########
examples/quickstart/jupyter-notebooks/sql-tutorial.ipynb:
##########
@@ -0,0 +1,765 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "ad4e60b6",
+   "metadata": {
+    "deletable": true,
+    "editable": true,
+    "tags": []
+   },
+   "source": [
+    "# Tutorial: Learn the basics of Druid SQL\n",
+    "\n",
+    "<!--\n",
+    "  ~ Licensed to the Apache Software Foundation (ASF) under one\n",
+    "  ~ or more contributor license agreements.  See the NOTICE file\n",
+    "  ~ distributed with this work for additional information\n",
+    "  ~ regarding copyright ownership.  The ASF licenses this file\n",
+    "  ~ to you under the Apache License, Version 2.0 (the\n",
+    "  ~ \"License\"); you may not use this file except in compliance\n",
+    "  ~ with the License.  You may obtain a copy of the License at\n",
+    "  ~\n",
+    "  ~   http://www.apache.org/licenses/LICENSE-2.0\n";,
+    "  ~\n",
+    "  ~ Unless required by applicable law or agreed to in writing,\n",
+    "  ~ software distributed under the License is distributed on an\n",
+    "  ~ \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+    "  ~ KIND, either express or implied.  See the License for the\n",
+    "  ~ specific language governing permissions and limitations\n",
+    "  ~ under the License.\n",
+    "  -->\n",
+    "  \n",
+    "Apache Druid supports two query languages: Druid SQL and native 
queries.\n",
+    "Druid SQL is a Structured Query Language (SQL) dialect that enables you 
to query datasources in Apache Druid using SQL statements.\n",
+    "SQL and Druid SQL use similar syntax, with some notable differences.\n",
+    "Not all SQL functions are supported in Druid SQL. Instead, Druid includes 
Druid-specific SQL functions for optimized query performance.\n",
+    "\n",
+    "This interactive tutorial introduces you to the unique aspects of Druid 
SQL.\n",
+    "To learn about native queries, see [Native 
queries](https://druid.apache.org/docs/latest/querying/querying.html)."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8d6bbbcb",
+   "metadata": {
+    "deletable": true,
+    "tags": []
+   },
+   "source": [
+    "## Prerequisites\n",
+    "\n",
+    "Make sure that you meet the requirements outlined in the README.md file 
of the [apache/druid 
repo](https://github.com/apache/druid/tree/master/examples/quickstart/jupyter-notebooks/).\n",
+    "Specifically, you need the following:\n",
+    "- Knowledge of SQL\n",
+    "- [Python3](https://www.python.org/downloads/)\n",
+    "- [The `requests` package for 
Python](https://requests.readthedocs.io/en/latest/user/install/)\n",
+    "- [JupyterLab](https://jupyter.org/install#jupyterlab) (recommended) or 
[Jupyter Notebook](https://jupyter.org/install#jupyter-notebook) running on a 
non-default port. Druid and Jupyter both default to port `8888`, so you need to 
start Jupyter on a different port. \n",
+    "- An available Druid instance. This tutorial uses the `micro-quickstart` 
configuration described in the [Druid 
quickstart](https://druid.apache.org/docs/latest/tutorials/index.html), so no 
authentication or authorization is required unless explicitly mentioned. If you 
haven’t already, download Druid version 24.0 or higher and start Druid services 
as described in the quickstart."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8f8e64f0-c29a-473c-8783-a2ff8648acd7",
+   "metadata": {},
+   "source": [
+    "## Prepare your environment\n",
+    "\n",
+    "This section contains the steps required to prepare your environment to 
follow along with this tutorial.\n",
+    "\n",
+    "Start by running the following cell. It imports the required Python 
packages and defines a variable for the Druid host."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "b7f08a52",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "import requests\n",
+    "import json\n",
+    "\n",
+    "# druid_host is the hostname and port for your Druid deployment. \n",
+    "# In a distributed environment, use the Router service  as the 
`druid_host`. \n",
+    "druid_host = \"http://localhost:8888\"\n";,
+    "dataSourceName = \"wikipedia-sql-tutorial\"\n",
+    "print(f\"\\033[1mDruid host\\033[0m: {druid_host}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e893ef7d-7136-442f-8bd9-31b5a5276518",
+   "metadata": {},
+   "source": [
+    "In the rest of the tutorial, the `endpoint`, `http_method`, and `payload` 
variables are updated to accomplish different tasks.\n",
+    "\n",
+    "Run the following cell to ingest data from an external source into a 
table named `wikipedia-sql-tutorial` using the [multi-stage query (MSQ) task 
engine](https://druid.apache.org/docs/latest/multi-stage-query/index.html)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "045f782c-74d8-4447-9487-529071812b51",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "endpoint = \"/druid/v2/sql/task\"\n",
+    "print(f\"\\033[1mQuery endpoint\\033[0m: {druid_host+endpoint}\")\n",
+    "http_method = \"POST\"\n",
+    "\n",
+    "\n",
+    "payload = json.dumps({\n",
+    "\"query\": \"INSERT INTO \\\"wikipedia-sql-tutorial\\\" SELECT 
TIME_PARSE(\\\"timestamp\\\") \\\n",
+    "          AS __time, * FROM TABLE \\\n",
+    "          (EXTERN('{\\\"type\\\": \\\"http\\\", \\\"uris\\\": 
[\\\"https://druid.apache.org/data/wikipedia.json.gz\\\"]}', '{\\\"type\\\": 
\\\"json\\\"}', '[{\\\"name\\\": \\\"added\\\", \\\"type\\\": \\\"long\\\"}, 
{\\\"name\\\": \\\"channel\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": 
\\\"cityName\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": 
\\\"comment\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": 
\\\"commentLength\\\", \\\"type\\\": \\\"long\\\"}, {\\\"name\\\": 
\\\"countryIsoCode\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": 
\\\"countryName\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": 
\\\"deleted\\\", \\\"type\\\": \\\"long\\\"}, {\\\"name\\\": \\\"delta\\\", 
\\\"type\\\": \\\"long\\\"}, {\\\"name\\\": \\\"deltaBucket\\\", \\\"type\\\": 
\\\"string\\\"}, {\\\"name\\\": \\\"diffUrl\\\", \\\"type\\\": \\\"string\\\"}, 
{\\\"name\\\": \\\"flags\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": 
\\\"isAnonymous\\\", \\\"type\\\": \\\"string\\\"}, {\\\
 "name\\\": \\\"isMinor\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": 
\\\"isNew\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"isRobot\\\", 
\\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"isUnpatrolled\\\", 
\\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"metroCode\\\", \\\"type\\\": 
\\\"string\\\"}, {\\\"name\\\": \\\"namespace\\\", \\\"type\\\": 
\\\"string\\\"}, {\\\"name\\\": \\\"page\\\", \\\"type\\\": \\\"string\\\"}, 
{\\\"name\\\": \\\"regionIsoCode\\\", \\\"type\\\": \\\"string\\\"}, 
{\\\"name\\\": \\\"regionName\\\", \\\"type\\\": \\\"string\\\"}, 
{\\\"name\\\": \\\"timestamp\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": 
\\\"user\\\", \\\"type\\\": \\\"string\\\"}]')) \\\n",
+    "          PARTITIONED BY DAY\",\n",
+    "  \"context\": {\n",
+    "    \"maxNumTasks\": 3\n",
+    "  }\n",
+    "})\n",
+    "\n",
+    "headers = {'Content-Type': 'application/json'}\n",
+    "\n",
+    "response = requests.request(http_method, druid_host+endpoint, 
headers=headers, data=payload)\n",
+    "ingestion_taskId_response = response\n",
+    "ingestion_taskId = 
json.loads(ingestion_taskId_response.text)['taskId']\n",
+    "print(f\"\\033[1mQuery\\033[0m:\\n\" + payload)\n",
+    "print(f\"\\nInserting data into the table named {dataSourceName}\")\n",
+    "print(\"\\nThe response includes the task ID and the status: \" + 
response.text + \".\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ceb86ce0-85f6-4c63-8fd6-883033ee96e9",
+   "metadata": {},
+   "source": [
+    "Wait for ingestion to complete before proceeding.\n",
+    "To check on the status of your ingestion task, run the following cell. "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "df12d12c-a067-4759-bae0-0410c24b6205",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "import time\n",
+    "\n",
+    "endpoint = f\"/druid/indexer/v1/task/{ingestion_taskId}/status\"\n",
+    "print(f\"\\033[1mQuery endpoint\\033[0m: {druid_host+endpoint}\")\n",
+    "http_method = \"GET\"\n",
+    "\n",
+    "payload = {}\n",
+    "headers = {}\n",
+    "\n",
+    "response = requests.request(http_method, druid_host+endpoint, 
headers=headers, data=payload)\n",
+    "ingestion_status = json.loads(response.text)['status']['status']\n",
+    "# If you only want to fetch the status once and print it, \n",
+    "# uncomment the print statement and comment out the if and while loops\n",
+    "# print(json.dumps(response.json(), indent=4))\n",
+    "\n",
+    "if ingestion_status == \"RUNNING\":\n",
+    "  print(\"The ingestion is running...\")\n",
+    "\n",
+    "while ingestion_status != \"SUCCESS\":\n",
+    "  response = requests.request(http_method, druid_host+endpoint, 
headers=headers, data=payload)\n",
+    "  ingestion_status = json.loads(response.text)['status']['status']\n",
+    "  time.sleep(15)  \n",
+    "  \n",
+    "if ingestion_status == \"SUCCESS\": \n",
+    "  print(\"The ingestion is complete:\")\n",
+    "  print(json.dumps(response.json(), indent=4))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "240b0ad5-48f2-4737-b12b-5fd5f98da300",
+   "metadata": {},
+   "source": [
+    "## Datasources\n",
+    "\n",
+    "Druid supports a variety of datasources, with the table datasource being 
the most common. In Druid documentation, the word \"datasource\" often 
implicitly refers to the table datasource.\n",
+    "The 
[Datasources](https://druid.apache.org/docs/latest/querying/datasource.html) 
topic provides a comprehensive overview of datasources supported by Druid 
SQL.\n",
+    "\n",
+    "In Druid SQL, table datasources reside in the `druid` schema. This is the 
default schema, so table datasources can be referenced as either 
`druid.dataSourceName` or `dataSourceName`.\n",
+    "\n",
+    "For example, run the next cell to return the rows of the column named 
`channel` from the `wikipedia-sql-tutorial` table. Because this tutorial is 
running in Jupyter, make sure to limit the size of your query results for 
display purposes using the SQL LIMIT clause."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "91dd255a-4d55-493e-a067-4cef5c659657",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "endpoint = \"/druid/v2/sql\"\n",
+    "print(f\"\\033[1mQuery endpoint\\033[0m: {druid_host+endpoint}\")\n",
+    "http_method = \"POST\"\n",
+    "\n",
+    "payload = json.dumps({\n",
+    "  \"query\":\"SELECT \\\"channel\\\" FROM \\\"wikipedia-sql-tutorial\\\" 
LIMIT 7\"})\n",
+    "headers = {'Content-Type': 'application/json'}\n",
+    "\n",
+    "response = requests.request(http_method, druid_host+endpoint, 
headers=headers, data=payload)\n",
+    "\n",
+    "print(\"\\033[1mQuery\\033[0m:\\n\" + payload)\n",
+    "print(f\"\\nEach JSON object in the response represents a row in the 
{dataSourceName} datasource.\") \n",
+    "print(\"\\n\\033[1mResponse\\033[0m: \\n\" + json.dumps(response.json(), 
indent=4))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "cbeb5a63",
+   "metadata": {
+    "deletable": true,
+    "tags": []
+   },
+   "source": [
+    "## Data types\n",
+    "\n",
+    "Druid maps SQL data types onto native types at query runtime.\n",
+    "The following native types are supported for Druid columns:\n",
+    "\n",
+    "* STRING: UTF-8 encoded strings and string arrays\n",
+    "* LONG: 64-bit signed int\n",
+    "* FLOAT: 32-bit float\n",
+    "* DOUBLE: 64-bit float\n",
+    "* COMPLEX: represents non-standard data types, such as nested JSON, 
hyperUnique and approxHistogram aggregators, and DataSketches aggregators\n",
+    "\n",
+    "Druid exposes table and column metadata through 
[INFORMATION_SCHEMA](https://druid.apache.org/docs/latest/querying/sql-metadata-tables.html#information-schema)
 tables. Run the following query to retrieve metadata for the 
`wikipedia-sql-tutorial` datasource. In the response body, each JSON object 
correlates to a column in the table.\n",
+    "Check the objects' `DATA_TYPE` property for SQL data types. You should 
see TIMESTAMP, BIGINT, and VARCHAR SQL data types. "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "b9227d6c-1d8c-4169-b13b-a08625c4011f",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "endpoint = \"/druid/v2/sql\"\n",
+    "print(f\"\\033[1mQuery endpoint\\033[0m: {druid_host+endpoint}\")\n",
+    "http_method = \"POST\"\n",
+    "\n",
+    "payload = json.dumps({\n",
+    "  \"query\":\"SELECT * FROM INFORMATION_SCHEMA.COLUMNS WHERE 
\\\"TABLE_SCHEMA\\\" = 'druid' AND \\\"TABLE_NAME\\\" = 
'wikipedia-sql-tutorial' LIMIT 7\"\n",
+    "})\n",
+    "headers = {'Content-Type': 'application/json'}\n",
+    "\n",
+    "response = requests.request(http_method, druid_host+endpoint, 
headers=headers, data=payload)\n",
+    "\n",
+    "print(\"\\033[1mQuery\\033[0m:\\n\" + payload)\n",
+    "print(\"\\n\\033[1mResponse\\033[0m: \\n\" + json.dumps(response.json(), 
indent=4))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c59ca797-dd91-442b-8d02-67b711b3fcc6",
+   "metadata": {},
+   "source": [
+    "Druid natively interprets VARCHAR as STRING and BIGINT and TIMESTAMP SQL 
data types as LONG. For reference on how SQL data types map onto Druid native 
types, see [Standard 
types](https://druid.apache.org/docs/latest/querying/sql-data-types.html#standard-types).\n",
+    "\n",
+    "### Timestamp values\n",
+    "\n",
+    "Druid stores timestamp values as the number of milliseconds since the 
Unix epoch.\n",
+    "Primary timestamps are stored in a column named `__time`.\n",
+    "If a dataset doesn't have a timestamp, Druid uses the default value of 
`1970-01-01 00:00:00`.\n",
+    "\n",
+    "Druid time functions perform best when used with the `__time` column.\n",
+    "By default, time functions use the UTC time zone.\n",
+    "For more information about timestamp handling, see [Date and time 
functions](https://druid.apache.org/docs/latest/querying/sql-scalar.html#date-and-time-functions).\n",
+    "\n",
+    "Run the following cell to see Druid SQL `TIME_IN_INTERVAL` scalar 
function at work. This query checks whether a timestamp is contained within a 
specified interval. The results are grouped by date."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "f7e3d62a-1325-4992-8bcd-c0f1925704bc",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "\n",
+    "endpoint = \"/druid/v2/sql\"\n",
+    "print(f\"\\033[1mQuery endpoint\\033[0m: {druid_host+endpoint}\")\n",
+    "http_method = \"POST\"\n",
+    "\n",
+    "payload = json.dumps({\n",
+    "  \"query\":\"SELECT TIME_IN_INTERVAL(__time, 
'2016-06-27T12:00:00/2016-06-27T13:00:00'), COUNT(*) FROM 
\\\"wikipedia-sql-tutorial\\\" \\\n",
+    "  GROUP BY TIME_IN_INTERVAL(__time, 
'2016-06-27T12:00:00/2016-06-27T13:00:00')\"\n",
+    "})\n",
+    "headers = {'Content-Type': 'application/json'}\n",
+    "\n",
+    "response = requests.request(http_method, druid_host+endpoint, 
headers=headers, data=payload)\n",
+    "\n",
+    "print(\"\\033[1mQuery\\033[0m:\\n\" + payload)\n",
+    "print(\"\\n\\033[1mResponse\\033[0m: \\n\" + json.dumps(response.json(), 
indent=4))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f7cfdfae-ccba-49ba-a70f-63d0bd3527b2",
+   "metadata": {},
+   "source": [
+    "### NULL values\n",
+    "\n",
+    "Druid supports SQL compatible NULL handling, allowing string columns to 
distinguish empty strings from NULL and numeric columns to contain NULL rows. 
To store and query data in SQL compatible mode, explicitly set the 
`useDefaultValueForNull` property to `false` in 
`_common/common.runtime.properties`. See [Configuration 
reference](https://druid.apache.org/docs/latest/configuration/index.html) for 
common configuration properties.\n",
+    "\n",
+    "When `useDefaultValueForNull` is set to `true` (default behavior), Druid 
stores NULL values as `0` for numeric columns and as `''` for string columns."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "29c24856",
+   "metadata": {
+    "deletable": true,
+    "tags": []
+   },
+   "source": [
+    "## SELECT statement syntax\n",
+    "\n",
+    "Druid SQL supports SELECT statements with the following structure:\n",
+    "\n",
+    "``` mysql\n",
+    "[ EXPLAIN PLAN FOR ]\n",
+    "[ WITH tableName [ ( column1, column2, ... ) ] AS ( query ) ]\n",
+    "SELECT [ ALL | DISTINCT ] { * | exprs }\n",
+    "FROM { <table> | (<subquery>) | <o1> [ INNER | LEFT ] JOIN <o2> ON 
condition }\n",
+    "[ WHERE expr ]\n",
+    "[ GROUP BY [ exprs | GROUPING SETS ( (exprs), ... ) | ROLLUP (exprs) | 
CUBE (exprs) ] ]\n",
+    "[ HAVING expr ]\n",
+    "[ ORDER BY expr [ ASC | DESC ], expr [ ASC | DESC ], ... ]\n",
+    "[ LIMIT limit ]\n",
+    "[ OFFSET offset ]\n",
+    "[ UNION ALL <another query> ]\n",
+    "```\n",
+    "\n",
+    "As a general rule, use the LIMIT clause with `SELECT *` to limit the 
number of rows returned. "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8cf212e5-fb3f-4206-acdd-46ef1da327ab",
+   "metadata": {
+    "tags": []
+   },
+   "source": [
+    "## WHERE clause\n",
+    "\n",
+    "Druid SQL uses the [SQL WHERE 
clause](https://druid.apache.org/docs/latest/querying/sql.html#where) of a 
SELECT statement to fetch data based on a particular condition.\n",
+    "\n",
+    "In most cases, specifying a time range when using the WHERE clause 
improves query performance.\n",
+    "This is because Druid partitions data by time and having a time range 
allows Druid to skip over unrelated data.\n",
+    "\n",
+    "Druid supports range filtering on columns that contain long millisecond 
values, with the boundaries specified as ISO 8601 time intervals. This is 
suitable for the `__time` column, long metric columns, and dimensions with 
values that can be parsed as long milliseconds.\n",
+    "    \n",
+    "For example, the following cell uses a comparison operator on the 
`__time` field to filter results from a certain time range. The cell limits the 
results to seven rows for display purposes."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "ffca2962-da31-4b9c-adbc-882e35386916",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "endpoint = \"/druid/v2/sql\"\n",
+    "print(f\"\\033[1mQuery endpoint\\033[0m: {druid_host+endpoint}\")\n",
+    "http_method = \"POST\"\n",
+    "\n",
+    "payload = json.dumps({\n",
+    "  \"query\":\"SELECT * FROM \\\"wikipedia-sql-tutorial\\\" WHERE __time 
>= TIMESTAMP '2015-09-12 23:33:55' LIMIT 7\"\n",
+    "})\n",
+    "headers = {'Content-Type': 'application/json'}\n",
+    "\n",
+    "response = requests.request(http_method, druid_host+endpoint, 
headers=headers, data=payload)\n",
+    "\n",
+    "print(\"\\033[1mQuery\\033[0m:\\n\" + payload)\n",
+    "print(\"\\n\\033[1mResponse\\033[0m: \\n\" + json.dumps(response.json(), 
indent=4))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "5160db26-7e8d-40f7-8588-b7eabfc08355",
+   "metadata": {},
+   "source": [
+    "### Comparison operators\n",
+    "\n",
+    "Druid SQL supports the following comparison operators. You can use these 
operators in conjunction with the WHERE clause to compare expressions.\n",
+    "\n",
+    "- equal to (=)\n",
+    "- greater than(>)\n",
+    "- less than (<)\n",
+    "- greater than or equal (>=)\n",
+    "- less than or equal (<=)\n",
+    "- not equal to( <>)\n",
+    "\n",
+    "For example, the next cell returns the first seven records that match the 
following criteria:\n",
+    "- `cityName` is not an empty string\n",
+    "- `countryIsoCode` value equals to `US`"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "9afa74b9-ef9f-4b36-a7bb-88e4498a48ef",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "endpoint = \"/druid/v2/sql\"\n",
+    "print(f\"\\033[1mQuery endpoint\\033[0m: {druid_host+endpoint}\")\n",
+    "http_method = \"POST\"\n",
+    "\n",
+    "payload = json.dumps({\n",
+    "  \"query\":\"SELECT * FROM \\\"wikipedia-sql-tutorial\\\" WHERE 
\\\"cityName\\\" <> '' AND \\\"countryIsoCode\\\" = 'US' LIMIT 7\"\n",
+    "})\n",
+    "headers = {'Content-Type': 'application/json'}\n",
+    "\n",
+    "response = requests.request(http_method, druid_host+endpoint, 
headers=headers, data=payload)\n",
+    "\n",
+    "print(\"\\033[1mQuery\\033[0m:\\n\" + payload)\n",
+    "print(\"\\n\\033[1mResponse\\033[0m: \\n\" + json.dumps(response.json(), 
indent=4))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "dd24470d-25c2-4031-a711-8477d69c9e94",
+   "metadata": {
+    "deletable": true,
+    "editable": true,
+    "tags": []
+   },
+   "source": [
+    "### Logical operators\n",
+    "\n",
+    "Druid's handling of logical operators is comparable to SQL with a few 
exceptions. For example, if an IN list contains NULL, the IN operator matches 
NULL values. This behavior is different from the SQL IN operator, which does 
not match NULL values. For a complete list of logical SQL operators supported 
by Druid SQL, see [Logical 
operators](https://druid.apache.org/docs/latest/querying/sql-operators.html#logical-operators)."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "2baf21b9-74d1-4df6-862f-afbaeef1812b",
+   "metadata": {},
+   "source": [
+    "## GROUP BY clause\n",
+    "\n",
+    "Druid SQL uses the [SQL GROUP 
BY](https://druid.apache.org/docs/latest/querying/sql.html#group-by) clause to 
separate items into groups, where each group is composed of rows with identical 
values. \n",
+    "The GROUP BY clause is often used with [aggregation 
functions](https://druid.apache.org/docs/latest/querying/sql-aggregations.html),
 such as COUNT or SUM, to produce summary values for each group.\n",
+    "\n",
+    "For example, the following cell counts all of the entries separated by 
the field `channel`. The output is limited to seven rows and has two fields: 
`channel` and `counts`. For each unique value of `channel`, Druid aggregates 
all rows having that value, counts the number of entries in the group, and 
assigns the results to a field called `counts`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "228ae0e4-355e-4b4d-8253-fb2e46715559",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "endpoint = \"/druid/v2/sql\"\n",
+    "print(f\"\\033[1mQuery endpoint\\033[0m: {druid_host+endpoint}\")\n",
+    "http_method = \"POST\"\n",
+    "\n",
+    "payload = json.dumps({\n",
+    "  \"query\":\"SELECT channel, COUNT(*) AS counts FROM 
\\\"wikipedia-sql-tutorial\\\" GROUP BY channel LIMIT 7\"\n",
+    "})\n",
+    "headers = {'Content-Type': 'application/json'}\n",
+    "\n",
+    "response = requests.request(http_method, druid_host+endpoint, 
headers=headers, data=payload)\n",
+    "\n",
+    "print(\"\\033[1mQuery\\033[0m:\\n\" + payload)\n",
+    "print(\"\\n\\033[1mResponse\\033[0m: \\n\" + json.dumps(response.json(), 
indent=4))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "eab67db4-a0f3-4177-b5be-1fef355bf33f",
+   "metadata": {},
+   "source": [
+    "You can further define the groups by specifying multiple dimensions.\n",
+    "Druid SQL supports using numbers in GROUP BY and ORDER BY clauses to 
refer to column positions in the SELECT clause.\n",
+    "Similar to SQL, Druid SQL uses one-based indexing to reference elements 
in SQL statements.\n",
+    "\n",
+    "For example, the next cell aggregates entries grouped by fields 
`cityName` and `countryName`.\n",
+    "The output has three fields: `cityName`, `countryName`, and `counts`. For 
each unique combination of `cityName` and `countryName`, Druid aggregates all 
rows and averages the entries in the group.\n",
+    "The output is limited to seven rows for display purposes."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "ca4b3bae-2b02-4c90-98a5-cb7806b6e649",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "endpoint = \"/druid/v2/sql\"\n",
+    "print(f\"\\033[1mQuery endpoint\\033[0m: {druid_host+endpoint}\")\n",
+    "http_method = \"POST\"\n",
+    "\n",
+    "payload = json.dumps({\n",
+    "  \"query\":\"SELECT cityName, countryName, COUNT(*) AS counts FROM 
\\\"wikipedia-sql-tutorial\\\" GROUP BY 1, 2 LIMIT 7\"\n",
+    "})\n",
+    "headers = {'Content-Type': 'application/json'}\n",
+    "\n",
+    "response = requests.request(http_method, druid_host+endpoint, 
headers=headers, data=payload)\n",
+    "\n",
+    "print(\"\\033[1mQuery\\033[0m:\\n\" + payload)\n",
+    "print(\"\\n\\033[1mResponse\\033[0m: \\n\" + json.dumps(response.json(), 
indent=4))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "9c2e56af-3fdf-40f1-869b-822ac8aafbc8",
+   "metadata": {
+    "tags": []
+   },
+   "source": [
+    "## Query types\n",
+    "\n",
+    "Druid uses the [Apache Calcite](https://calcite.apache.org/) data 
management framework to translate SQL statements into Druid’s [native 
JSON-based 
queries](https://druid.apache.org/docs/latest/querying/querying.html) on the 
Broker. The queries are then executed by the Druid cluster.\n",

Review Comment:
   Why do I need to know about Calcite? How does it affect my behavior as a 
developer? Why does the Broker matter to me in this moment. I'd consider a 
revision to this to section to maybe say that Druid is optimized for certain 
query types (Scan, etc). Then you can say that Druid translates the SQL queries 
to native (using Calcite). If folks need to know about the broker, we can 
mention that. @2bethere  WDYT?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [druid] techdocsmith commented on a diff in pull request #13465: docs: notebook only for SQL tutorial

Reply via email to