ektravel commented on code in PR #13465:
URL: https://github.com/apache/druid/pull/13465#discussion_r1051003730
##########
examples/quickstart/jupyter-notebooks/sql-tutorial.ipynb:
##########
@@ -0,0 +1,764 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "ad4e60b6",
+ "metadata": {
+ "deletable": true,
+ "editable": true,
+ "tags": []
+ },
+ "source": [
+ "# Tutorial: Learn the basics of Druid SQL\n",
+ "\n",
+ "<!--\n",
+ " ~ Licensed to the Apache Software Foundation (ASF) under one\n",
+ " ~ or more contributor license agreements. See the NOTICE file\n",
+ " ~ distributed with this work for additional information\n",
+ " ~ regarding copyright ownership. The ASF licenses this file\n",
+ " ~ to you under the Apache License, Version 2.0 (the\n",
+ " ~ \"License\"); you may not use this file except in compliance\n",
+ " ~ with the License. You may obtain a copy of the License at\n",
+ " ~\n",
+ " ~ http://www.apache.org/licenses/LICENSE-2.0\n",
+ " ~\n",
+ " ~ Unless required by applicable law or agreed to in writing,\n",
+ " ~ software distributed under the License is distributed on an\n",
+ " ~ \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+ " ~ KIND, either express or implied. See the License for the\n",
+ " ~ specific language governing permissions and limitations\n",
+ " ~ under the License.\n",
+ " -->\n",
+ " \n",
+ "Apache Druid supports two query languages: Druid SQL and native queries.\n",
+ "Druid SQL is a Structured Query Language (SQL) dialect that enables you to query datasources in Apache Druid using SQL statements.\n",
+ "SQL and Druid SQL use similar syntax, with some notable differences.\n",
+ "Not all SQL functions are supported in Druid SQL. Instead, Druid includes Druid-specific SQL functions for optimized query performance.\n",
+ "\n",
+ "This interactive tutorial introduces you to the unique aspects of Druid SQL.\n",
+ "To learn about native queries, see [Native queries](https://druid.apache.org/docs/latest/querying/querying.html)."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "8d6bbbcb",
+ "metadata": {
+ "deletable": true,
+ "tags": []
+ },
+ "source": [
+ "## Prerequisites\n",
+ "\n",
+ "Make sure that you meet the requirements outlined in the README.md file of the [apache/druid repo](https://github.com/apache/druid/tree/master/examples/quickstart/jupyter-notebooks/).\n",
+ "Specifically, you need the following:\n",
+ "- Knowledge of SQL\n",
+ "- [Python3](https://www.python.org/downloads/)\n",
+ "- [The `requests` package for Python](https://requests.readthedocs.io/en/latest/user/install/)\n",
+ "- [JupyterLab](https://jupyter.org/install#jupyterlab) (recommended) or [Jupyter Notebook](https://jupyter.org/install#jupyter-notebook) running on a non-default port. Druid and Jupyter both default to port `8888`, so you need to start Jupyter on a different port. \n",
+ "- An available Druid instance. This tutorial uses the `micro-quickstart` configuration described in the [Druid quickstart](https://druid.apache.org/docs/latest/tutorials/index.html), so no authentication or authorization is required unless explicitly mentioned. If you haven’t already, download Druid and start Druid services as described in the quickstart."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "8f8e64f0-c29a-473c-8783-a2ff8648acd7",
+ "metadata": {},
+ "source": [
+ "## Prepare your environment\n",
+ "\n",
+ "This section contains the steps required to prepare your environment to follow along with this tutorial.\n",
+ "\n",
+ "Start by running the following cell. It imports the required Python packages and defines a variable for the Druid host."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "b7f08a52",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "import requests\n",
+ "import json\n",
+ "\n",
+ "# druid_host is the hostname and port for your Druid deployment.\n",
+ "# In a distributed environment, use the Router service as the `druid_host`.\n",
+ "\n",
+ "druid_host = \"http://localhost:8888\"\n",
+ "dataSourceName = \"wikipedia-sql-tutorial\"\n",
+ "print(f\"\\033[1mDruid host\\033[0m: {druid_host}\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "e893ef7d-7136-442f-8bd9-31b5a5276518",
+ "metadata": {},
+ "source": [
+ "In the rest of the tutorial, the `endpoint`, `http_method`, and `payload` variables are updated to accomplish different tasks.\n",
+ "\n",
+ "Run the following cell to ingest data from an external source into a table named `wikipedia-sql-tutorial` using the [multi-stage query (MSQ) task engine](https://druid.apache.org/docs/latest/multi-stage-query/index.html)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "045f782c-74d8-4447-9487-529071812b51",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "endpoint = \"/druid/v2/sql/task\"\n",
+ "print(f\"\\033[1mQuery endpoint\\033[0m: {druid_host+endpoint}\")\n",
+ "http_method = \"POST\"\n",
+ "\n",
+ "# If you already have an existing datasource named wikipedia-sql-tutorial, use REPLACE INTO instead of INSERT INTO.\n",
+ "payload = json.dumps({\n",
+ "\"query\": \"INSERT INTO \\\"wikipedia-sql-tutorial\\\" SELECT TIME_PARSE(\\\"timestamp\\\") \\\n",
+ " AS __time, * FROM TABLE \\\n",
+ " (EXTERN('{\\\"type\\\": \\\"http\\\", \\\"uris\\\": [\\\"https://druid.apache.org/data/wikipedia.json.gz\\\"]}', '{\\\"type\\\": \\\"json\\\"}', '[{\\\"name\\\": \\\"added\\\", \\\"type\\\": \\\"long\\\"}, {\\\"name\\\": \\\"channel\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"cityName\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"comment\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"commentLength\\\", \\\"type\\\": \\\"long\\\"}, {\\\"name\\\": \\\"countryIsoCode\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"countryName\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"deleted\\\", \\\"type\\\": \\\"long\\\"}, {\\\"name\\\": \\\"delta\\\", \\\"type\\\": \\\"long\\\"}, {\\\"name\\\": \\\"deltaBucket\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"diffUrl\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"flags\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"isAnonymous\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"isMinor\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"isNew\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"isRobot\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"isUnpatrolled\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"metroCode\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"namespace\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"page\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"regionIsoCode\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"regionName\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"timestamp\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"user\\\", \\\"type\\\": \\\"string\\\"}]')) \\\n",
+ " PARTITIONED BY DAY\",\n",
+ " \"context\": {\n",
+ " \"maxNumTasks\": 3\n",
+ " }\n",
+ "})\n",
+ "\n",
+ "headers = {'Content-Type': 'application/json'}\n",
+ "\n",
+ "response = requests.request(http_method, druid_host+endpoint, headers=headers, data=payload)\n",
+ "ingestiion_taskId_response = response\n",
Review Comment:
```suggestion
"ingestion_taskId_response = response\n",
```
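
For context, a minimal sketch of why the spelling matters. It assumes a later cell reads the variable under its correctly spelled name; that cell is not part of this hunk, so treat it as an assumption. The `response` dict and its `taskId` key are hypothetical stand-ins for the real `requests` response.

```python
def lookup_after_typo():
    # Hypothetical stand-in for the HTTP response returned by the ingestion request.
    response = {"taskId": "example-task-id"}

    # Misspelled assignment, as it appears in the diff under review.
    ingestiion_taskId_response = response

    try:
        # A later reference that uses the intended spelling cannot find the variable.
        return ingestion_taskId_response["taskId"]
    except NameError as err:
        return f"NameError: {err}"


result = lookup_after_typo()
print(result)
```

If every later reference repeats the same misspelling, the notebook still runs and the rename is purely cosmetic.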
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]