[GitHub] [druid] vtlim commented on a diff in pull request #13787: Python Druid API for use in notebooks

via GitHub Mon, 27 Feb 2023 12:56:10 -0800


vtlim commented on code in PR #13787:
URL: https://github.com/apache/druid/pull/13787#discussion_r1119290458



##########
docs/tutorials/tutorial-jupyter-index.md:
##########
@@ -22,51 +22,86 @@ title: "Jupyter Notebook tutorials"
   ~ under the License.
   -->
 
-<!-- tutorial-jupyter-index.md and 
examples/quickstart/juptyer-notebooks/README.md share a lot of the same 
content. If you make a change in one place, update the other too. -->
+<!-- tutorial-jupyter-index.md and 
examples/quickstart/juptyer-notebooks/README.md
+    share a lot of the same content. If you make a change in one place, update 
the other
+    too. -->
 
-You can try out the Druid APIs using the Jupyter Notebook-based tutorials. 
These tutorials provide snippets of Python code that you can use to run calls 
against the Druid API to complete the tutorial.
+You can try out the Druid APIs using the Jupyter Notebook-based tutorials. 
These
+tutorials provide snippets of Python code that you can use to run calls against
+the Druid API to complete the tutorial.
 
-## Prerequisites 
+## Prerequisites
 
 Make sure you meet the following requirements before starting the 
Jupyter-based tutorials:
 
-- Python 3 
+- Python 3.7 or later
+
+- The `requests` package for Python. For example, you can install it with the 
following command:
 
-- The `requests` package for Python. For example, you can install it with the 
following command: 
-   
    ```bash
    pip3 install requests
    ```
 
-- JupyterLab (recommended) or Jupyter Notebook running on a non-default port. 
By default, Druid and Jupyter both try to use port `8888,` so start Jupyter on 
a different port.
+- JupyterLab (recommended) or Jupyter Notebook running on a non-default port. 
By default, Druid
+  and Jupyter both try to use port `8888`, so start Jupyter on a different 
port.
 
   - Install JupyterLab or Notebook:
-  
-     ```bash
-     # Install JupyterLab
-     pip3 install jupyterlab  
-     # Install Jupyter Notebook
-     pip3 install notebook
-     ```
-  - Start JupyterLab
-  
+
+    ```bash
+    # Install JupyterLab
+    pip3 install jupyterlab
+    # Install Jupyter Notebook
+    pip3 install notebook
+    ```
+  - Start Jupyter using either JupyterLab
+    ```bash
+    # Start JupyterLab on port 3001
+    jupyter lab --port 3001
+    ```
+
+    Or using Jupyter Notebook
     ```bash
-     # Start JupyterLab on port 3001
-     jupyter lab --port 3001
-     ```
-   - Alternatively, start Jupyter Notebook
-     ```bash
-     # Start Jupyter Notebook on port 3001
-     jupyter notebook --port 3001
-     ```
+    # Start Jupyter Notebook on port 3001
+    jupyter notebook --port 3001
+    ```
+
+- An available Druid instance. You can use the [Quickstart 
(local)](./index.md) instance. The tutorials
+  assume that you are using the quickstart, so no authentication or 
authorization
+  is expected unless explicitly mentioned.
+
+  If you contribute to Druid, and work with Druid integration tests, can use a 
test cluster.
+  Assume you have an environment variable, `DRUID_DEV`, which identifies your 
Druid source repo.
+
+  ```bash
+  cd $DRUID_DEV
+  ./it.sh build
+  ./it.sh image
+  ./it.sh up <category>
+  ```
+
+  Replace `<catagory>` with one of the available integration test categories. 
See the integration

Review Comment:
   ```suggestion
     Replace `<category>` with one of the available integration test 
categories. See the integration
   ```



##########
examples/quickstart/jupyter-notebooks/Python_API_Tutorial.ipynb:
##########
@@ -0,0 +1,751 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "ce2efaaa",
+   "metadata": {},
+   "source": [
+    "# Learn the Druid Python API\n",
+    "\n",
+    "<!--\n",
+    "  ~ Licensed to the Apache Software Foundation (ASF) under one\n",
+    "  ~ or more contributor license agreements.  See the NOTICE file\n",
+    "  ~ distributed with this work for additional information\n",
+    "  ~ regarding copyright ownership.  The ASF licenses this file\n",
+    "  ~ to you under the Apache License, Version 2.0 (the\n",
+    "  ~ \"License\"); you may not use this file except in compliance\n",
+    "  ~ with the License.  You may obtain a copy of the License at\n",
+    "  ~\n",
+    "  ~   http://www.apache.org/licenses/LICENSE-2.0\n";,
+    "  ~\n",
+    "  ~ Unless required by applicable law or agreed to in writing,\n",
+    "  ~ software distributed under the License is distributed on an\n",
+    "  ~ \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+    "  ~ KIND, either express or implied.  See the License for the\n",
+    "  ~ specific language governing permissions and limitations\n",
+    "  ~ under the License.\n",
+    "  -->\n",
+    "\n",
+    "This notebook provides a quick introduction to the Python wrapper around 
the [Druid REST API](api-tutorial.ipynb). This notebook assumes you are 
familiar with the basics of the REST API, and the [set of operations which 
Druid 
provides](https://druid.apache.org/docs/latest/operations/api-reference.html). 
This tutorial focuses on using Python to access those APIs rather than 
explaining the APIs themselves. The APIs themselves are covered in other 
notebooks that use the Python API.\n",
+    "\n",
+    "The Druid Python API is primarily intended to help with these notebook 
tutorials. It can also be used in your own ad-hoc notebooks, or in a regular 
Python program.\n",
+    "\n",
+    "The Druid Python API is a work in progress. The Druid team adds API 
wrappers as needed for the notebook tutorials. If you find you need additional 
wrappers, please feel free to add them, and post a PR to Apache Druid with your 
additions.\n",
+    "\n",
+    "The API provides two levels of functions. Most are simple wrappers around 
Druid's REST APIs. Others add additional code to make the API easier to use. 
The SQL query interface is a prime example: extra code translates a simple SQL 
query into Druid's `SQLQuery` object and interprets the results into a form 
that can be displayed in a notebook.\n",
+    "\n",
+    "This notebook contains sample output to allow it to work a bit like a 
reference. To run it yourself, start by using the `Kernel` &rarr; `Restart & 
Clear Output` menu command to clear the sample output.\n",
+    "\n",
+    "Start by importing the `druidapi` package from the same folder as this 
notebook."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "6d90ca5d",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import druidapi"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "fb68a838",
+   "metadata": {},
+   "source": [
+    "Next, connect to your cluster by providing the router endpoint. The code 
assumes the cluster is on your local machine, using the default port. Go ahead 
and change this if your setup is different.\n",
+    "\n",
+    "The API uses the router to forward messages to each of Druid's services 
so that you don't have to keep track of the host and port for each service.\n",
+    "\n",
+    "The `jupyter_client()` method waits for the cluster to be ready, and sets 
up the client to display tables and messages as HTML. To use this code without 
waiting and without HTML formatting, use the `client()` method instead."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "ae601081",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "druid = druidapi.jupyter_client('http://localhost:8888')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8b4e774b",
+   "metadata": {},
+   "source": [
+    "## Status Client\n",
+    "\n",
+    "The SDK groups Druid REST API calls into categories, with a client for 
each. Start with the status client."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "ff16fc3b",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "status_client = druid.status"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "be992774",
+   "metadata": {},
+   "source": [
+    "Use the Python `help()` function to learn what methods are avaialble."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "03f26417",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "help(status_client)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e803c9fe",
+   "metadata": {},
+   "source": [
+    "Check the version of your cluster. Some of these notebooks illustrate 
newer features available only on specific versions of Druid."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "2faa0d81",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "status_client.version"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "d78a6c35",
+   "metadata": {},
+   "source": [
+    "You can also check which extensions are loaded in your cluster. Some 
notebooks require specific extensions to be available."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "1001f412",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "status_client.properties['druid.extensions.loadList']"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "769c363b",
+   "metadata": {},
+   "source": [
+    "## Display Client\n",
+    "\n",
+    "The display client performs Druid operations, then formats the results 
for display in a notebook. Running SQL queries in a notebook is easy with the 
display client.\n",
+    "\n",
+    "When run outside a notebook, the display client formats results as text. 
The display client is the most convenient way to work with Druid in a notebook. 
Most operations also have a form that returns results as Python objects rather 
than displaying them. Use these methods if you write code to work with the 
results. Here the goal is just to interact with Druid."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "ccb6e119",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "display = druid.display"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "d051bc5e",
+   "metadata": {},
+   "source": [
+    "Start by getting a list of schemas."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "dd8387e0",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "display.schemas()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "b8261ab0",
+   "metadata": {},
+   "source": [
+    "Then, retreive the tables (or datasources) within any schema."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "64dcb46a",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "display.tables('INFORMATION_SCHEMA')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ff311595",
+   "metadata": {},
+   "source": [
+    "The above shows the list of datasources by default. You'll get an empty 
result if you have no datasources yet."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "616770ce",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "display.tables()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "7392e484",
+   "metadata": {},
+   "source": [
+    "You can easily run a query and show the results:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "2c649eef",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "sql = '''\n",
+    "SELECT TABLE_NAME\n",
+    "FROM INFORMATION_SCHEMA.TABLES\n",
+    "WHERE TABLE_SCHEMA = 'INFORMATION_SCHEMA'\n",
+    "'''\n",
+    "display.sql(sql)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c6c4e1d4",
+   "metadata": {},
+   "source": [
+    "The query above showed the same results as `tables()`. That is not 
surprising: `tables()` just runs this query for you."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "59bee6e9",
+   "metadata": {},
+   "source": [
+    "## SQL Client\n",
+    "\n",
+    "While the display client is handy for simple queries, sometimes you need 
more control, or want to work with the data returned from a query. For this you 
use the SQL client."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "25310f40",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "sql_client = druid.sql"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "7b944084",
+   "metadata": {},
+   "source": [
+    "The SQL client allows you create a SQL request object that enables 
passing context parameters and query parameters. Druid will work out the query 
parameter type based on the Python type. Use the display client to show the 
query results."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "dd559827",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "sql = '''\n",
+    "SELECT TABLE_NAME\n",
+    "FROM INFORMATION_SCHEMA.TABLES\n",
+    "WHERE TABLE_SCHEMA = ?\n",
+    "'''\n",
+    "req = sql_client.sql_request(sql)\n",
+    "req.add_parameter('INFORMATION_SCHEMA')\n",
+    "req.add_context(\"someParameter\", \"someValue\")\n",
+    "display.sql(req)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "937dc6b1",
+   "metadata": {},
+   "source": [
+    "The request has other features for advanced use cases: see the code for 
details. The query API actually returns a sql response object. Use this if you 
want to get the values directly, work with the schema, etc."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "fd7a1827",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "sql = '''\n",
+    "SELECT TABLE_NAME\n",
+    "FROM INFORMATION_SCHEMA.TABLES\n",
+    "WHERE TABLE_SCHEMA = 'INFORMATION_SCHEMA'\n",
+    "'''\n",
+    "resp = sql_client.sql_query(sql)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "2fe6a749",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "col1 = resp.schema[0]\n",
+    "print(col1.name, col1.sql_type, col1.druid_type)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "41d27bb1",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "resp.rows"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "481af1f2",
+   "metadata": {},
+   "source": [
+    "The `show()` method uses this information for format an HTML table to 
present the results."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "1dbf12a8",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "resp.show()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c7c109ae",
+   "metadata": {},
+   "source": [
+    "The display and SQL clients are intened for exploratory queries. The 
[pydruid](https://pythonhosted.org/pydruid/) library provides a robust way to 
run native queries, to run SQL queries, and to convert the results to various 
formats."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "9e3be017",
+   "metadata": {},
+   "source": [
+    "## MSQ Ingestion\n",
+    "\n",
+    "The SQL client also performs MSQ-based ingestion using `INSERT` or 
`REPLACE` statements. Use the extension check above to ensure that 
`druid-multi-stage-query` is loaded in Druid 26. (Later versions may have MSQ 
built in.)\n",
+    "\n",
+    "An MSQ query is run using a different API: `task()`. This API returns a 
response object that describes the Overlord task which runs the MSQ query. For 
tutorials, data is usually small enough you can wait for the ingestion to 
complete. Do that with the `run_task()` call which handles the waiting. To 
illustrate, here is a query that ingests a subset of columns, and includes a 
few data clean-up steps:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "10f1e451",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "sql = '''\n",
+    "REPLACE INTO \"myWiki1\" OVERWRITE ALL\n",
+    "SELECT\n",
+    "  TIME_PARSE(\"timestamp\") AS \"__time\",\n",
+    "  namespace,\n",
+    "  page,\n",
+    "  channel,\n",
+    "  \"user\",\n",
+    "  countryName,\n",
+    "  CASE WHEN isRobot = 'true' THEN 1 ELSE 0 END AS isRobot,\n",
+    "  \"added\",\n",
+    "  \"delta\",\n",
+    "  CASE WHEN isNew = 'true' THEN 1 ELSE 0 END AS isNew,\n",
+    "  CAST(\"deltaBucket\" AS DOUBLE) AS deltaBucket,\n",
+    "  \"deleted\"\n",
+    "FROM TABLE(\n",
+    "  EXTERN(\n",
+    "    
'{\"type\":\"http\",\"uris\":[\"https://druid.apache.org/data/wikipedia.json.gz\"]}',\n",
+    "    '{\"type\":\"json\"}',\n",
+    "    
'[{\"name\":\"isRobot\",\"type\":\"string\"},{\"name\":\"channel\",\"type\":\"string\"},{\"name\":\"timestamp\",\"type\":\"string\"},{\"name\":\"flags\",\"type\":\"string\"},{\"name\":\"isUnpatrolled\",\"type\":\"string\"},{\"name\":\"page\",\"type\":\"string\"},{\"name\":\"diffUrl\",\"type\":\"string\"},{\"name\":\"added\",\"type\":\"long\"},{\"name\":\"comment\",\"type\":\"string\"},{\"name\":\"commentLength\",\"type\":\"long\"},{\"name\":\"isNew\",\"type\":\"string\"},{\"name\":\"isMinor\",\"type\":\"string\"},{\"name\":\"delta\",\"type\":\"long\"},{\"name\":\"isAnonymous\",\"type\":\"string\"},{\"name\":\"user\",\"type\":\"string\"},{\"name\":\"deltaBucket\",\"type\":\"long\"},{\"name\":\"deleted\",\"type\":\"long\"},{\"name\":\"namespace\",\"type\":\"string\"},{\"name\":\"cityName\",\"type\":\"string\"},{\"name\":\"countryName\",\"type\":\"string\"},{\"name\":\"regionIsoCode\",\"type\":\"string\"},{\"name\":\"metroCode\",\"type\":\"long\"},{\"name\":\"countryIsoCode\",
 \"type\":\"string\"},{\"name\":\"regionName\",\"type\":\"string\"}]'\n",
+    "  )\n",
+    ")\n",
+    "PARTITIONED BY DAY\n",
+    "CLUSTERED BY namespace, page\n",
+    "'''"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "d752b1d4",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "sql_client.run_task(sql)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ef4512f8",
+   "metadata": {},
+   "source": [
+    "MSQ reports task completion as soon as ingestion is done. However, it 
takes a while for Druid to load the resulting segments. Wait for the table to 
become ready."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "37fcedf2",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "sql_client.wait_until_ready('myWiki1')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "11d9c95a",
+   "metadata": {},
+   "source": [
+    "`describe_table()` lists the columns in a table."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "b662697b",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "display.table('myWiki1')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "936f57fb",
+   "metadata": {},
+   "source": [
+    "You can sample a few rows of data."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "c4cfa5dc",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "display.sql('SELECT * FROM myWiki1 LIMIT 10')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c1152f41",
+   "metadata": {},
+   "source": [
+    "## Datasource Client\n",
+    "\n",
+    "The Datasource client lets you perform operations on datasource objects. 
The SQL layer allows you to get metadata and do queries. The datasource client 
works with the underlying segments. Explaining the full functionality is the 
topic of another notebook. For now, you can use the datasource client to clean 
up the datasource created above. The `True` argument asks for \"if exists\" 
semantics so you don't get an error if the datasource was alredy deleted."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "fba659ce",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "ds_client = druid.datasources\n",
+    "ds_client.drop('myWiki', True)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c96fdcc6",
+   "metadata": {},
+   "source": [
+    "## Tasks Client\n",
+    "\n",
+    "Use the tasks client to work with Overlord tasks. The `run_task()` call 
above actually uses the task client internally to poll Overlord."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "b4f5ea17",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "task_client = druid.tasks\n",
+    "task_client.tasks()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "1deaf95f",
+   "metadata": {},
+   "source": [
+    "## REST Client\n",
+    "\n",
+    "The Druid Python API starts with a REST client that itself is built on 
the `requests` package. The REST client implements the common patterns seen in 
the Druid REST API. You can create a client directly:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "b1e55635",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from druidapi.rest import DruidRestClient\n",
+    "rest_client = DruidRestClient(\"http://localhost:8888\";)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "dcb8055f",
+   "metadata": {},
+   "source": [
+    "Or, if you have already created the Druid client, you can reuse the 
existing REST client. This is how the various other clients work internally."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "370ba76a",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "rest_client = druid.rest"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "2654e72c",
+   "metadata": {},
+   "source": [
+    "Use the REST client if you need to make calls that are not yet wrapped by 
the Python API, or if you want to do something special. To illustrate the 
client, you can make some of the same calls as in the [Druid REST API 
notebook](api_tutorial.ipynb).\n",
+    "\n",
+    "The REST API maintains the Druid host: you just provide the specifc URL 
tail. There are methods to get or post JSON results. For example, to get status 
information:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "9e42dfbc",
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [],
+   "source": [
+    "rest_client.get_json('/status')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "837e08b0",
+   "metadata": {},
+   "source": [
+    "A quick comparison of the three approaches (Requests, REST client, Python 
client):\n",
+    "\n",
+    "Status:\n",
+    "\n",
+    "* Requests: `session.get(druid_host + '/status').json()`\n",
+    "* REST client: `rest_client.get_json('/status')`\n",
+    "* Status client: `status_client.status()`\n",
+    "\n",
+    "Health:\n",
+    "\n",
+    "* Requests: `session.get(druid_host + '/status/health').json()`\n",
+    "* REST client: `rest_client.get_json('/status/health')`\n",
+    "* Status client: `status_client.is_healthy()`\n",
+    "\n",
+    "Ingest data:\n",
+    "\n",
+    "* Requests: See the [REST tutorial](api_tutorial.ipynb)\n",
+    "* REST client: as the REST tutorial, but use 
`rest_client.post_json('/druid/v2/sql/task', sql_request)` and\n",
+    "  
`rest_client.get_json(f\"/druid/indexer/v1/task/{ingestion_taskId}/status\")`\n",
+    "* SQL client: `sql_client.run_task(sql)`, also a form for a full SQL 
request.\n",
+    "\n",
+    "List datasources:\n",
+    "\n",
+    "* Requests: `session.get(druid_host + 
'/druid/coordinator/v1/datasources').json()`\n",
+    "* REST client: 
`rest_client.get_json('/druid/coordinator/v1/datasources')`\n",
+    "* Datasources client: `ds_client.names()`\n",
+    "\n",
+    "Query data, where `sql_request` is a properly-formatted `SqlResquest` 
dictionary:\n",

Review Comment:
   ```suggestion
       "Query data, where `sql_request` is a properly formatted `SqlRequest` 
dictionary:\n",
   ```



##########
examples/quickstart/jupyter-notebooks/druidapi/rest.py:
##########
@@ -0,0 +1,265 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import requests
+from .util import dict_get
+from urllib.parse import quote
+from .error import ClientError
+
+def check_error(response):
+    '''
+    Raises an HttpError from the requests library if the response code is 
neither
+    OK (200) nor Accepted (202).
+
+    Druid's REST API is inconsistent with how it resports errors. Some APIs 
return

Review Comment:
   ```suggestion
       Druid's REST API is inconsistent with how it reports errors. Some APIs 
return
   ```



##########
examples/quickstart/jupyter-notebooks/druidapi/datasource.py:
##########
@@ -0,0 +1,81 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import requests, time
+from .consts import COORD_BASE
+from .rest import check_error
+from .util import dict_get
+
+REQ_DATASOURCES = COORD_BASE + '/datasources'
+REQ_DATASOURCE = REQ_DATASOURCES + '/{}'
+
+# Segment load status
+REQ_DATASOURCES = COORD_BASE + '/datasources'
+REQ_DS_LOAD_STATUS = REQ_DATASOURCES + '/{}/loadstatus'
+
+class DatasourceClient:
+    '''
+    Client for datasource APIs. Prefer to use the SQL to query the

Review Comment:
   ```suggestion
       Client for datasource APIs. Prefer to use SQL to query the
   ```



##########
examples/quickstart/jupyter-notebooks/druidapi/rest.py:
##########
@@ -0,0 +1,265 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import requests
+from .util import dict_get
+from urllib.parse import quote
+from .error import ClientError
+
+def check_error(response):
+    '''
+    Raises an HttpError from the requests library if the response code is 
neither
+    OK (200) nor Accepted (202).
+
+    Druid's REST API is inconsistent with how it resports errors. Some APIs 
return
+    an error as a JSON object. Others return a text message. Still others 
return
+    nothing at all. With the JSON format, sometimes the error returns an
+    'errorMessage' field, other times only a generic 'error' field.
+
+    This method attempts to parse these variations. If the error response JSON
+    matches one of the known error formats, then raises a `ClientError` with 
the error
+    message. Otherise, raises a Requests library `HTTPError` for a generic 
error.
+    If the response includes a JSON payload, then the it is returned in the 
json field 
+    of the `HTTPError` object so that the client can perhaps decode it.
+    '''
+    code = response.status_code
+    if code == requests.codes.ok or code == requests.codes.accepted:
+        return
+    json = None
+    try:
+        json = response.json()
+    except Exception:
+        # If we can't get the JSON, raise a Requets error
+        response.raise_for_status()
+    
+    # Druid JSON payload. Try to make sense of the error
+    msg = dict_get(json, 'errorMessage')
+    if not msg:
+        msg = dict_get(json, 'error')
+    if msg:
+        # We have an explanation from Druid. Raise a Client exception
+        raise ClientError(msg)
+    
+    # Don't know what the Druid JSON is. Raise a Requetss exception, but
+    # add on the JSON in the hopes that the caller can make use of it.
+    try:
+        response.raise_for_status()
+    except Exception as e:
+        e.json = json
+        raise e
+
+def build_url(endpoint, req, args=None) -> str:
+    '''
+    Returns the full URL for a REST call given the relative request API and
+    optional parameters to fill placeholders within the request URL.
+
+    Parameters
+    ----------
+    endpoint: str
+        The base URL for the service.
+
+    req: str
+        Relative URL, with optional {} placeholders
+
+    args: list
+        Optional list of values to match {} placeholders in the URL.
+    '''
+    url = endpoint + req
+    if args:
+        quoted = [quote(arg) for arg in args]
+        url = url.format(*quoted)
+    return url
+
+class DruidRestClient:
+    '''
+    Wrapper around the basic Druid REST API operations using the
+    requests Python package. Handles the grunt work of building up
+    URLs, working with JSON, etc.
+
+    The REST client accepts an endpoint that represents a Druid service, 
typically
+    the Router. All requests are made to this service, which means using the 
service
+    URL as the base. That is, if the service is http://localhost:8888, then
+    a request for status is just '/status': the methods here build up the URL 
by
+    concatenating the service endpoint with the request URL.
+    '''
+
+    def __init__(self, endpoint):
+        self.endpoint = endpoint
+        self.trace = False
+        self.session = requests.Session()
+
+    def enable_trace(self, flag=True):
+        self.trace = flag
+
+    def build_url(self, req, args=None) -> str:
+        '''
+        Returns the full URL for a REST call given the relative request API and
+        optional parameters to fill placeholders within the request URL.
+
+        Parameters
+        ----------
+        req: str
+            Relative URL, with optional {} placeholders
+
+        args: list
+            Optional list of values to match {} placeholders in the URL.
+        '''
+        return build_url(self.endpoint, req, args)
+
+    def get(self, req, args=None, params=None, require_ok=True) -> 
requests.Request:
+        '''
+        Generic GET request to this service.
+
+        Parameters
+        ----------
+        req: str
+            The request URL without host, port or query string.
+            Example: `/status`
+
+        args: [str], default = None
+            Optional parameters to fill in to the URL.
+            Example: `/customer/{}`
+
+        params: dict, default = None
+            Optional map of query variables to send in
+            the URL. Query parameters are the name/values pairs
+            that appear after the `?` marker.
+
+        require_ok: bool, default = True
+            Whether to require an OK (200) response. If `True`, and
+            the request returns a different response code, then raises
+            a `RestError` exception.
+
+        Returns
+        -------
+        The `requests` `Request` object.
+        '''
+        url = self.build_url(req, args)
+        if self.trace:
+            print('GET:', url)
+        r = self.session.get(url, params=params)
+        if require_ok:
+            check_error(r)
+        return r
+
+    def get_json(self, url_tail, args=None, params=None):
+        '''
+        Generic GET request which expects a JSON response.
+        '''
+        r = self.get(url_tail, args, params)
+        return r.json()
+
+    def post(self, req, body, args=None, headers=None, require_ok=True) -> 
requests.Response:
+        '''
+        Issues a POST request for the given URL on this
+        node, with the given payload and optional URL query
+        parameters.
+        '''
+        url = self.build_url(req, args)
+        if self.trace:
+            print('POST:', url)
+            print('body:', body)
+        r = self.session.post(url, data=body, headers=headers)
+        if require_ok:
+            check_error(r)
+        return r
+
+    def post_json(self, req, body, args=None, headers=None, params=None) -> 
requests.Response:
+        '''
+        Issues a POST request for the given URL on this node, with a JSON 
request, returning
+        the JSON response.

Review Comment:
   ```suggestion
           Issues a POST request for the given URL on this node, with a JSON 
request payload.
           Returns the JSON response.
   ```



##########
examples/quickstart/jupyter-notebooks/Python_API_Tutorial.ipynb:
##########
@@ -0,0 +1,751 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "ce2efaaa",
+   "metadata": {},
+   "source": [
+    "# Learn the Druid Python API\n",
+    "\n",
+    "<!--\n",
+    "  ~ Licensed to the Apache Software Foundation (ASF) under one\n",
+    "  ~ or more contributor license agreements.  See the NOTICE file\n",
+    "  ~ distributed with this work for additional information\n",
+    "  ~ regarding copyright ownership.  The ASF licenses this file\n",
+    "  ~ to you under the Apache License, Version 2.0 (the\n",
+    "  ~ \"License\"); you may not use this file except in compliance\n",
+    "  ~ with the License.  You may obtain a copy of the License at\n",
+    "  ~\n",
+    "  ~   http://www.apache.org/licenses/LICENSE-2.0\n";,
+    "  ~\n",
+    "  ~ Unless required by applicable law or agreed to in writing,\n",
+    "  ~ software distributed under the License is distributed on an\n",
+    "  ~ \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+    "  ~ KIND, either express or implied.  See the License for the\n",
+    "  ~ specific language governing permissions and limitations\n",
+    "  ~ under the License.\n",
+    "  -->\n",
+    "\n",
+    "This notebook provides a quick introduction to the Python wrapper around 
the [Druid REST API](api-tutorial.ipynb). This notebook assumes you are 
familiar with the basics of the REST API, and the [set of operations which 
Druid 
provides](https://druid.apache.org/docs/latest/operations/api-reference.html). 
This tutorial focuses on using Python to access those APIs rather than 
explaining the APIs themselves. The APIs themselves are covered in other 
notebooks that use the Python API.\n",
+    "\n",
+    "The Druid Python API is primarily intended to help with these notebook 
tutorials. It can also be used in your own ad-hoc notebooks, or in a regular 
Python program.\n",
+    "\n",
+    "The Druid Python API is a work in progress. The Druid team adds API 
wrappers as needed for the notebook tutorials. If you find you need additional 
wrappers, please feel free to add them, and post a PR to Apache Druid with your 
additions.\n",
+    "\n",
+    "The API provides two levels of functions. Most are simple wrappers around 
Druid's REST APIs. Others add additional code to make the API easier to use. 
The SQL query interface is a prime example: extra code translates a simple SQL 
query into Druid's `SQLQuery` object and interprets the results into a form 
that can be displayed in a notebook.\n",
+    "\n",
+    "This notebook contains sample output to allow it to work a bit like a 
reference. To run it yourself, start by using the `Kernel` &rarr; `Restart & 
Clear Output` menu command to clear the sample output.\n",

Review Comment:
   ```suggestion
       "This notebook contains sample output to allow it to function as a 
reference. To run it yourself, start by using the `Kernel` &rarr; `Restart & 
Clear Output` menu command to clear the sample output.\n",
   ```



##########
examples/quickstart/jupyter-notebooks/Python_API_Tutorial.ipynb:
##########
@@ -0,0 +1,751 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "ce2efaaa",
+   "metadata": {},
+   "source": [
+    "# Learn the Druid Python API\n",
+    "\n",
+    "<!--\n",
+    "  ~ Licensed to the Apache Software Foundation (ASF) under one\n",
+    "  ~ or more contributor license agreements.  See the NOTICE file\n",
+    "  ~ distributed with this work for additional information\n",
+    "  ~ regarding copyright ownership.  The ASF licenses this file\n",
+    "  ~ to you under the Apache License, Version 2.0 (the\n",
+    "  ~ \"License\"); you may not use this file except in compliance\n",
+    "  ~ with the License.  You may obtain a copy of the License at\n",
+    "  ~\n",
+    "  ~   http://www.apache.org/licenses/LICENSE-2.0\n";,
+    "  ~\n",
+    "  ~ Unless required by applicable law or agreed to in writing,\n",
+    "  ~ software distributed under the License is distributed on an\n",
+    "  ~ \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+    "  ~ KIND, either express or implied.  See the License for the\n",
+    "  ~ specific language governing permissions and limitations\n",
+    "  ~ under the License.\n",
+    "  -->\n",
+    "\n",
+    "This notebook provides a quick introduction to the Python wrapper around 
the [Druid REST API](api-tutorial.ipynb). This notebook assumes you are 
familiar with the basics of the REST API, and the [set of operations which 
Druid 
provides](https://druid.apache.org/docs/latest/operations/api-reference.html). 
This tutorial focuses on using Python to access those APIs rather than 
explaining the APIs themselves. The APIs themselves are covered in other 
notebooks that use the Python API.\n",
+    "\n",
+    "The Druid Python API is primarily intended to help with these notebook 
tutorials. It can also be used in your own ad-hoc notebooks, or in a regular 
Python program.\n",
+    "\n",
+    "The Druid Python API is a work in progress. The Druid team adds API 
wrappers as needed for the notebook tutorials. If you find you need additional 
wrappers, please feel free to add them, and post a PR to Apache Druid with your 
additions.\n",
+    "\n",
+    "The API provides two levels of functions. Most are simple wrappers around 
Druid's REST APIs. Others add additional code to make the API easier to use. 
The SQL query interface is a prime example: extra code translates a simple SQL 
query into Druid's `SQLQuery` object and interprets the results into a form 
that can be displayed in a notebook.\n",
+    "\n",
+    "This notebook contains sample output to allow it to work a bit like a 
reference. To run it yourself, start by using the `Kernel` &rarr; `Restart & 
Clear Output` menu command to clear the sample output.\n",
+    "\n",
+    "Start by importing the `druidapi` package from the same folder as this 
notebook."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "6d90ca5d",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import druidapi"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "fb68a838",
+   "metadata": {},
+   "source": [
+    "Next, connect to your cluster by providing the router endpoint. The code 
assumes the cluster is on your local machine, using the default port. Go ahead 
and change this if your setup is different.\n",
+    "\n",
+    "The API uses the router to forward messages to each of Druid's services 
so that you don't have to keep track of the host and port for each service.\n",
+    "\n",
+    "The `jupyter_client()` method waits for the cluster to be ready, and sets 
up the client to display tables and messages as HTML. To use this code without 
waiting and without HTML formatting, use the `client()` method instead."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "ae601081",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "druid = druidapi.jupyter_client('http://localhost:8888')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8b4e774b",
+   "metadata": {},
+   "source": [
+    "## Status Client\n",
+    "\n",
+    "The SDK groups Druid REST API calls into categories, with a client for 
each. Start with the status client."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "ff16fc3b",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "status_client = druid.status"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "be992774",
+   "metadata": {},
+   "source": [
+    "Use the Python `help()` function to learn what methods are avaialble."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "03f26417",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "help(status_client)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e803c9fe",
+   "metadata": {},
+   "source": [
+    "Check the version of your cluster. Some of these notebooks illustrate 
newer features available only on specific versions of Druid."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "2faa0d81",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "status_client.version"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "d78a6c35",
+   "metadata": {},
+   "source": [
+    "You can also check which extensions are loaded in your cluster. Some 
notebooks require specific extensions to be available."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "1001f412",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "status_client.properties['druid.extensions.loadList']"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "769c363b",
+   "metadata": {},
+   "source": [
+    "## Display Client\n",
+    "\n",
+    "The display client performs Druid operations, then formats the results 
for display in a notebook. Running SQL queries in a notebook is easy with the 
display client.\n",
+    "\n",
+    "When run outside a notebook, the display client formats results as text. 
The display client is the most convenient way to work with Druid in a notebook. 
Most operations also have a form that returns results as Python objects rather 
than displaying them. Use these methods if you write code to work with the 
results. Here the goal is just to interact with Druid."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "ccb6e119",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "display = druid.display"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "d051bc5e",
+   "metadata": {},
+   "source": [
+    "Start by getting a list of schemas."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "dd8387e0",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "display.schemas()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "b8261ab0",
+   "metadata": {},
+   "source": [
+    "Then, retreive the tables (or datasources) within any schema."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "64dcb46a",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "display.tables('INFORMATION_SCHEMA')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ff311595",
+   "metadata": {},
+   "source": [
+    "The above shows the list of datasources by default. You'll get an empty 
result if you have no datasources yet."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "616770ce",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "display.tables()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "7392e484",
+   "metadata": {},
+   "source": [
+    "You can easily run a query and show the results:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "2c649eef",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "sql = '''\n",
+    "SELECT TABLE_NAME\n",
+    "FROM INFORMATION_SCHEMA.TABLES\n",
+    "WHERE TABLE_SCHEMA = 'INFORMATION_SCHEMA'\n",
+    "'''\n",
+    "display.sql(sql)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c6c4e1d4",
+   "metadata": {},
+   "source": [
+    "The query above showed the same results as `tables()`. That is not 
surprising: `tables()` just runs this query for you."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "59bee6e9",
+   "metadata": {},
+   "source": [
+    "## SQL Client\n",
+    "\n",
+    "While the display client is handy for simple queries, sometimes you need 
more control, or want to work with the data returned from a query. For this you 
use the SQL client."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "25310f40",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "sql_client = druid.sql"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "7b944084",
+   "metadata": {},
+   "source": [
+    "The SQL client allows you create a SQL request object that enables 
passing context parameters and query parameters. Druid will work out the query 
parameter type based on the Python type. Use the display client to show the 
query results."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "dd559827",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "sql = '''\n",
+    "SELECT TABLE_NAME\n",
+    "FROM INFORMATION_SCHEMA.TABLES\n",
+    "WHERE TABLE_SCHEMA = ?\n",
+    "'''\n",
+    "req = sql_client.sql_request(sql)\n",
+    "req.add_parameter('INFORMATION_SCHEMA')\n",
+    "req.add_context(\"someParameter\", \"someValue\")\n",
+    "display.sql(req)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "937dc6b1",
+   "metadata": {},
+   "source": [
+    "The request has other features for advanced use cases: see the code for 
details. The query API actually returns a sql response object. Use this if you 
want to get the values directly, work with the schema, etc."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "fd7a1827",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "sql = '''\n",
+    "SELECT TABLE_NAME\n",
+    "FROM INFORMATION_SCHEMA.TABLES\n",
+    "WHERE TABLE_SCHEMA = 'INFORMATION_SCHEMA'\n",
+    "'''\n",
+    "resp = sql_client.sql_query(sql)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "2fe6a749",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "col1 = resp.schema[0]\n",
+    "print(col1.name, col1.sql_type, col1.druid_type)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "41d27bb1",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "resp.rows"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "481af1f2",
+   "metadata": {},
+   "source": [
+    "The `show()` method uses this information for format an HTML table to 
present the results."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "1dbf12a8",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "resp.show()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c7c109ae",
+   "metadata": {},
+   "source": [
+    "The display and SQL clients are intened for exploratory queries. The 
[pydruid](https://pythonhosted.org/pydruid/) library provides a robust way to 
run native queries, to run SQL queries, and to convert the results to various 
formats."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "9e3be017",
+   "metadata": {},
+   "source": [
+    "## MSQ Ingestion\n",
+    "\n",
+    "The SQL client also performs MSQ-based ingestion using `INSERT` or 
`REPLACE` statements. Use the extension check above to ensure that 
`druid-multi-stage-query` is loaded in Druid 26. (Later versions may have MSQ 
built in.)\n",
+    "\n",
+    "An MSQ query is run using a different API: `task()`. This API returns a 
response object that describes the Overlord task which runs the MSQ query. For 
tutorials, data is usually small enough you can wait for the ingestion to 
complete. Do that with the `run_task()` call which handles the waiting. To 
illustrate, here is a query that ingests a subset of columns, and includes a 
few data clean-up steps:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "10f1e451",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "sql = '''\n",
+    "REPLACE INTO \"myWiki1\" OVERWRITE ALL\n",
+    "SELECT\n",
+    "  TIME_PARSE(\"timestamp\") AS \"__time\",\n",
+    "  namespace,\n",
+    "  page,\n",
+    "  channel,\n",
+    "  \"user\",\n",
+    "  countryName,\n",
+    "  CASE WHEN isRobot = 'true' THEN 1 ELSE 0 END AS isRobot,\n",
+    "  \"added\",\n",
+    "  \"delta\",\n",
+    "  CASE WHEN isNew = 'true' THEN 1 ELSE 0 END AS isNew,\n",
+    "  CAST(\"deltaBucket\" AS DOUBLE) AS deltaBucket,\n",
+    "  \"deleted\"\n",
+    "FROM TABLE(\n",
+    "  EXTERN(\n",
+    "    
'{\"type\":\"http\",\"uris\":[\"https://druid.apache.org/data/wikipedia.json.gz\"]}',\n",
+    "    '{\"type\":\"json\"}',\n",
+    "    
'[{\"name\":\"isRobot\",\"type\":\"string\"},{\"name\":\"channel\",\"type\":\"string\"},{\"name\":\"timestamp\",\"type\":\"string\"},{\"name\":\"flags\",\"type\":\"string\"},{\"name\":\"isUnpatrolled\",\"type\":\"string\"},{\"name\":\"page\",\"type\":\"string\"},{\"name\":\"diffUrl\",\"type\":\"string\"},{\"name\":\"added\",\"type\":\"long\"},{\"name\":\"comment\",\"type\":\"string\"},{\"name\":\"commentLength\",\"type\":\"long\"},{\"name\":\"isNew\",\"type\":\"string\"},{\"name\":\"isMinor\",\"type\":\"string\"},{\"name\":\"delta\",\"type\":\"long\"},{\"name\":\"isAnonymous\",\"type\":\"string\"},{\"name\":\"user\",\"type\":\"string\"},{\"name\":\"deltaBucket\",\"type\":\"long\"},{\"name\":\"deleted\",\"type\":\"long\"},{\"name\":\"namespace\",\"type\":\"string\"},{\"name\":\"cityName\",\"type\":\"string\"},{\"name\":\"countryName\",\"type\":\"string\"},{\"name\":\"regionIsoCode\",\"type\":\"string\"},{\"name\":\"metroCode\",\"type\":\"long\"},{\"name\":\"countryIsoCode\",
 \"type\":\"string\"},{\"name\":\"regionName\",\"type\":\"string\"}]'\n",
+    "  )\n",
+    ")\n",
+    "PARTITIONED BY DAY\n",
+    "CLUSTERED BY namespace, page\n",
+    "'''"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "d752b1d4",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "sql_client.run_task(sql)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ef4512f8",
+   "metadata": {},
+   "source": [
+    "MSQ reports task completion as soon as ingestion is done. However, it 
takes a while for Druid to load the resulting segments. Wait for the table to 
become ready."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "37fcedf2",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "sql_client.wait_until_ready('myWiki1')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "11d9c95a",
+   "metadata": {},
+   "source": [
+    "`describe_table()` lists the columns in a table."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "b662697b",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "display.table('myWiki1')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "936f57fb",
+   "metadata": {},
+   "source": [
+    "You can sample a few rows of data."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "c4cfa5dc",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "display.sql('SELECT * FROM myWiki1 LIMIT 10')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c1152f41",
+   "metadata": {},
+   "source": [
+    "## Datasource Client\n",
+    "\n",
+    "The Datasource client lets you perform operations on datasource objects. 
The SQL layer allows you to get metadata and do queries. The datasource client 
works with the underlying segments. Explaining the full functionality is the 
topic of another notebook. For now, you can use the datasource client to clean 
up the datasource created above. The `True` argument asks for \"if exists\" 
semantics so you don't get an error if the datasource was alredy deleted."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "fba659ce",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "ds_client = druid.datasources\n",
+    "ds_client.drop('myWiki', True)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c96fdcc6",
+   "metadata": {},
+   "source": [
+    "## Tasks Client\n",
+    "\n",
+    "Use the tasks client to work with Overlord tasks. The `run_task()` call 
above actually uses the task client internally to poll Overlord."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "b4f5ea17",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "task_client = druid.tasks\n",
+    "task_client.tasks()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "1deaf95f",
+   "metadata": {},
+   "source": [
+    "## REST Client\n",
+    "\n",
+    "The Druid Python API starts with a REST client that itself is built on 
the `requests` package. The REST client implements the common patterns seen in 
the Druid REST API. You can create a client directly:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "b1e55635",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from druidapi.rest import DruidRestClient\n",
+    "rest_client = DruidRestClient(\"http://localhost:8888\";)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "dcb8055f",
+   "metadata": {},
+   "source": [
+    "Or, if you have already created the Druid client, you can reuse the 
existing REST client. This is how the various other clients work internally."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "370ba76a",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "rest_client = druid.rest"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "2654e72c",
+   "metadata": {},
+   "source": [
+    "Use the REST client if you need to make calls that are not yet wrapped by 
the Python API, or if you want to do something special. To illustrate the 
client, you can make some of the same calls as in the [Druid REST API 
notebook](api_tutorial.ipynb).\n",
+    "\n",
+    "The REST API maintains the Druid host: you just provide the specifc URL 
tail. There are methods to get or post JSON results. For example, to get status 
information:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "9e42dfbc",
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [],
+   "source": [
+    "rest_client.get_json('/status')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "837e08b0",
+   "metadata": {},
+   "source": [
+    "A quick comparison of the three approaches (Requests, REST client, Python 
client):\n",
+    "\n",
+    "Status:\n",
+    "\n",
+    "* Requests: `session.get(druid_host + '/status').json()`\n",
+    "* REST client: `rest_client.get_json('/status')`\n",
+    "* Status client: `status_client.status()`\n",
+    "\n",
+    "Health:\n",
+    "\n",
+    "* Requests: `session.get(druid_host + '/status/health').json()`\n",
+    "* REST client: `rest_client.get_json('/status/health')`\n",
+    "* Status client: `status_client.is_healthy()`\n",
+    "\n",
+    "Ingest data:\n",
+    "\n",
+    "* Requests: See the [REST tutorial](api_tutorial.ipynb)\n",
+    "* REST client: as the REST tutorial, but use 
`rest_client.post_json('/druid/v2/sql/task', sql_request)` and\n",
+    "  
`rest_client.get_json(f\"/druid/indexer/v1/task/{ingestion_taskId}/status\")`\n",
+    "* SQL client: `sql_client.run_task(sql)`, also a form for a full SQL 
request.\n",
+    "\n",
+    "List datasources:\n",
+    "\n",
+    "* Requests: `session.get(druid_host + 
'/druid/coordinator/v1/datasources').json()`\n",
+    "* REST client: 
`rest_client.get_json('/druid/coordinator/v1/datasources')`\n",
+    "* Datasources client: `ds_client.names()`\n",
+    "\n",
+    "Query data, where `sql_request` is a properly-formatted `SqlResquest` 
dictionary:\n",
+    "\n",
+    "* Requests: `session.post(druid_host + '/druid/v2/sql', 
json=sql_request).json()`\n",
+    "* REST client: `rest_client.post_json('/druid/v2/sql', sql_request)`\n",
+    "* SQL Client: `sql_client.show(sql)`, where `sql` is the query text\n",
+    "\n",
+    "In general, you have to provide the all the details for the Requests 
library. The REST client handles the low-level repetitious bits. The Python 
clients provide methods that encapsulate the specifics of the URLS and return 
formats."

Review Comment:
   ```suggestion
       "In general, you have to provide the all the details for the Requests 
library. The REST client handles the low-level repetitious bits. The Python 
clients provide methods that encapsulate the specifics of the URLs and return 
formats."
   ```



##########
examples/quickstart/jupyter-notebooks/druidapi/rest.py:
##########
@@ -0,0 +1,265 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import requests
+from .util import dict_get
+from urllib.parse import quote
+from .error import ClientError
+
+def check_error(response):
+    '''
+    Raises an HttpError from the requests library if the response code is 
neither
+    OK (200) nor Accepted (202).
+
+    Druid's REST API is inconsistent with how it resports errors. Some APIs 
return
+    an error as a JSON object. Others return a text message. Still others 
return
+    nothing at all. With the JSON format, sometimes the error returns an
+    'errorMessage' field, other times only a generic 'error' field.
+
+    This method attempts to parse these variations. If the error response JSON
+    matches one of the known error formats, then raises a `ClientError` with 
the error
+    message. Otherise, raises a Requests library `HTTPError` for a generic 
error.
+    If the response includes a JSON payload, then the it is returned in the 
json field 
+    of the `HTTPError` object so that the client can perhaps decode it.
+    '''
+    code = response.status_code
+    if code == requests.codes.ok or code == requests.codes.accepted:
+        return
+    json = None
+    try:
+        json = response.json()
+    except Exception:
+        # If we can't get the JSON, raise a Requets error

Review Comment:
   ```suggestion
           # If we can't get the JSON, raise a Requests error
   ```



##########
examples/quickstart/jupyter-notebooks/druidapi/sql.py:
##########
@@ -134,21 +134,15 @@ def parse_rows(fmt, context, results):
     return rows[header_size:]
 
 def label_non_null_cols(results):
-    if results is None or len(results) == 0:
+    if not results:
         return []
     is_null = {}
     for key in results[0].keys():
         is_null[key] = True
     for row in results:
         for key, value in row.items():
-            if type(value) == str:
-                if value != '':
-                    is_null[key] = False
-            elif type(value) == float:
-                if value != 0.0:
-                    is_null[key] = False
-            elif value is not None:
-                is_null[key] = False
+            # The following is hack to check for null values, empty strings 
and numeric 0s.
+            is_null[key] = not not value

Review Comment:
   neat 😮 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [druid] vtlim commented on a diff in pull request #13787: Python Druid API for use in notebooks

Reply via email to