[GitHub] [druid] techdocsmith commented on a diff in pull request #13787: Python Druid API for use in notebooks

via GitHub Thu, 02 Mar 2023 13:36:04 -0800


techdocsmith commented on code in PR #13787:
URL: https://github.com/apache/druid/pull/13787#discussion_r1123739766



##########
examples/quickstart/jupyter-notebooks/api-tutorial.ipynb:
##########
@@ -458,11 +665,16 @@
     "- [Druid SQL 
API](https://druid.apache.org/docs/latest/querying/sql-api.html)\n",
     "- [API 
reference](https://druid.apache.org/docs/latest/operations/api-reference.html)\n",
     "\n",
-    "You can also try out the 
[druid-client](https://github.com/paul-rogers/druid-client), a Python library 
for Druid created by Paul Rogers, a Druid contributor.\n",
-    "\n",
-    "\n",
-    "\n"
+    "You can also try out the 
[druid-client](https://github.com/paul-rogers/druid-client), a Python library 
for Druid created by Paul Rogers, a Druid contributor. A simplified version of 
that library is included with these tutorials. See [the Python API 
Tutorial](Python_API_Tutorial.ipynb) for an overview. That tutorial shows how 
to do the same tasks as this one, but in a simpler form: focusing on the Druid 
actions and not the mechanics of the REST API."
    ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "386a05e5",
+   "metadata": {},
+   "outputs": [],
+   "source": []

Review Comment:
   empty code block
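A leftover like this can be caught automatically. The sketch below is a minimal example that assumes the notebooks use the nbformat 4 JSON layout shown in this diff: a `cells` list where each `source` is a string or a list of strings. `drop_empty_code_cells` and `clean_notebook_file` are hypothetical helpers, not part of `druidapi` or Jupyter. The `json.dump` settings are meant to roughly match how Jupyter writes notebooks, so whitespace diffs are still possible.

```python
import json


def drop_empty_code_cells(notebook: dict) -> int:
    """Remove code cells whose source is empty or whitespace only.

    Works on a notebook already loaded as a dict (nbformat 4 layout, where
    "source" is a string or a list of strings). Returns the number removed.
    """
    kept = []
    removed = 0
    for cell in notebook.get("cells", []):
        source = cell.get("source", "")
        text = "".join(source) if isinstance(source, list) else source
        if cell.get("cell_type") == "code" and not text.strip():
            removed += 1
            continue
        kept.append(cell)
    notebook["cells"] = kept
    return removed


def clean_notebook_file(path: str) -> int:
    """Load an .ipynb file, drop empty code cells, and rewrite it in place."""
    with open(path, encoding="utf-8") as f:
        nb = json.load(f)
    removed = drop_empty_code_cells(nb)
    if removed:
        # Assumed to approximate Jupyter's own on-disk formatting.
        with open(path, "w", encoding="utf-8") as f:
            json.dump(nb, f, indent=1, sort_keys=True, ensure_ascii=False)
            f.write("\n")
    return removed
```

Running `clean_notebook_file` on each notebook before pushing (or failing a check when it returns a nonzero count) would flag cells like the one above.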



##########
examples/quickstart/jupyter-notebooks/-START HERE-.ipynb:
##########
@@ -0,0 +1,164 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "e415d732",
+   "metadata": {},
+   "source": [
+    "# Jupyter Notebook tutorials for Druid\n",
+    "\n",
+    "<!-- This README and the tutorial-jupyter-index.md file in docs/tutorials 
share a lot of the same content.\n",
+    "If you make a change in one place, update the other too. -->\n",
+    "\n",
+    "<!--\n",
+    "  ~ Licensed to the Apache Software Foundation (ASF) under one\n",
+    "  ~ or more contributor license agreements.  See the NOTICE file\n",
+    "  ~ distributed with this work for additional information\n",
+    "  ~ regarding copyright ownership.  The ASF licenses this file\n",
+    "  ~ to you under the Apache License, Version 2.0 (the\n",
+    "  ~ \"License\"); you may not use this file except in compliance\n",
+    "  ~ with the License.  You may obtain a copy of the License at\n",
+    "  ~\n",
+    "  ~   http://www.apache.org/licenses/LICENSE-2.0\n",
+    "  ~\n",
+    "  ~ Unless required by applicable law or agreed to in writing,\n",
+    "  ~ software distributed under the License is distributed on an\n",
+    "  ~ \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+    "  ~ KIND, either express or implied.  See the License for the\n",
+    "  ~ specific language governing permissions and limitations\n",
+    "  ~ under the License.\n",
+    "  -->\n",
+    "\n",
+    "You can try out the Druid APIs using the Jupyter Notebook-based 
tutorials. These\n",
+    "tutorials provide snippets of Python code that you can use to run calls 
against\n",
+    "the Druid API to complete the tutorial."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "60015702",
+   "metadata": {},
+   "source": [
+    "## Prerequisites\n",
+    "\n",
+    "To get this far, you've installed Python 3 and Jupyter Notebook. Make 
sure you meet the following requirements before starting the Jupyter-based 
tutorials:\n",
+    "\n",
+    "- The `requests` package for Python. For example, you can install it with 
the following command:\n",
+    "\n",
+    "   ```bash\n",
+    "   pip3 install requests\n",
+    "   ```\n",
+    "\n",
+    "- JupyterLab (recommended) or Jupyter Notebook running on a non-default 
port. By default, Druid\n",
+    "  and Jupyter both try to use port `8888`, so start Jupyter on a 
different port.\n",
+    "\n",
+    "- An available Druid instance. You can use the local quickstart 
configuration\n",
+    "  described in 
[Quickstart](https://druid.apache.org/docs/latest/tutorials/index.html).\n",
+    "  The tutorials assume that you are using the quickstart, so no 
authentication or authorization\n",
+    "  is expected unless explicitly mentioned.\n",
+    "\n",
+    "## Simple Druid API\n",
+    "\n",
+    "One of the notebooks shows how to use the Druid REST API. The others 
focus on other\n",
+    "topics and use a simple set of Python wrappers around the underlying REST 
API. The\n",
+    "wrappers reside in the `druidapi` package within this directory. While 
the package\n",
+    "can be used in any Python program, the key purpose, at present, is to 
support these\n",
+    "notebooks. See the [Introduction to the Druid Python 
API](Python_API_Tutorial.ipynb)\n",
+    "for an overview of the Python API."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "d9e18342",
+   "metadata": {},
+   "source": [
+    "## Tutorials\n",
+    "\n",
+    "The notebooks are located in the [apache/druid repo](\n",
+    
"https://github.com/apache/druid/tree/master/examples/quickstart/jupyter-notebooks/).\n",
+    "You can either clone the repo or download the notebooks you want 
individually.\n",
+    "\n",
+    "The links that follow are the raw GitHub URLs, so you can use them to 
download the\n",
+    "notebook directly, such as with `wget`, or manually through your web 
browser. Note\n",
+    "that if you save the file from your web browser, make sure to remove the 
`.txt` extension.\n",
+    "\n",
+    "- [Introduction to the Druid REST API](api-tutorial.ipynb) walks you 
through some of the\n",
+    "  basics related to the Druid REST API and several endpoints.\n",
+    "- [Introduction to the Druid Python API](Python_API_Tutorial.ipynb) walks 
you through some of the\n",
+    "  basics related to the Druid API using the Python wrapper API.\n",
+    "- [Learn the basics of Druid SQL](sql-tutorial.ipynb) introduces you to 
the unique aspects of Druid SQL with the primary focus on the SELECT statement. 
"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "1a4b986a",
+   "metadata": {},
+   "source": [
+    "## Contributing\n",
+    "\n",
+    "If you build a Jupyter tutorial, you need to do a few things to add it to 
the docs\n",
+    "in addition to saving the notebook in this directory. The process 
requires two PRs to the repo.\n",
+    "\n",
+    "For the first PR, do the following:\n",
+    "\n",
+    "1. Depending on the goal of the notebook, you may want to clear the 
outputs from your notebook\n",
+    "   before you make the PR. You can use the following command:\n",
+    "\n",
+    "   ```bash\n",
+    "   jupyter nbconvert --ClearOutputPreprocessor.enabled=True --inplace 
./path/to/notebook/notebookName.ipynb\n",
+    "   ```\n",
+    "   \n",
+    "   This can also be done in Jupyter Notebook itself: `Kernel` &rarr; 
`Restart & Clear Output`\n",
+    "\n",
+    "2. Create the PR as you normally would. Make sure to note that this PR is 
the one that\n",
+    "   contains only the Jupyter notebook and that there will be a subsequent 
PR that updates\n",
+    "   related pages.\n",
+    "\n",
+    "3. After this first PR is merged, grab the \"raw\" URL for the file from 
GitHub. For example,\n",
+    "   navigate to the file in the GitHub web UI and select **Raw**. Use the 
URL for this in the\n",
+    "   second PR as the download link.\n",
+    "\n",
+    "For the second PR, do the following:\n",
+    "\n",
+    "1. Update the list of [Tutorials](#tutorials) on this page and in the\n",
+    "   [Jupyter tutorial index 
page](../../../docs/tutorials/tutorial-jupyter-index.md#tutorials)\n",
+    "   in the `docs/tutorials` directory.\n",
+    "\n",
+    "2. Update `tutorial-jupyter-index.md` and provide the URL to the raw 
version of the file\n",
+    "   that becomes available after the first PR is merged.\n",
+    "\n",
+    "Note that you can skip the second PR if you copy the prefix link from one of the\n",
+    "existing notebook links when you make your first PR."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "5e6f2a0e",
+   "metadata": {},
+   "outputs": [],
+   "source": []

Review Comment:
   empty code block in notebook
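Related: the Contributing section of this notebook clears outputs with `jupyter nbconvert --ClearOutputPreprocessor.enabled=True --inplace`. If nbconvert isn't installed, the following is a rough stdlib-only approximation. `clear_outputs` and `has_outputs` are hypothetical names. Unlike the real preprocessor, which may also strip some cell metadata, this sketch only resets outputs and execution counts. `has_outputs` could serve as a pre-PR check.

```python
def clear_outputs(notebook: dict) -> None:
    """Clear outputs and execution counts of every code cell, in place.

    A rough stand-in for nbconvert's ClearOutputPreprocessor; cell metadata
    is left untouched here.
    """
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None


def has_outputs(notebook: dict) -> bool:
    """Return True if any code cell still has outputs or an execution count."""
    return any(
        bool(cell.get("outputs")) or cell.get("execution_count") is not None
        for cell in notebook.get("cells", [])
        if cell.get("cell_type") == "code"
    )
```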



##########
examples/quickstart/jupyter-notebooks/Python_API_Tutorial.ipynb:
##########
@@ -0,0 +1,751 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "ce2efaaa",
+   "metadata": {},
+   "source": [
+    "# Learn the Druid Python API\n",
+    "\n",
+    "<!--\n",
+    "  ~ Licensed to the Apache Software Foundation (ASF) under one\n",
+    "  ~ or more contributor license agreements.  See the NOTICE file\n",
+    "  ~ distributed with this work for additional information\n",
+    "  ~ regarding copyright ownership.  The ASF licenses this file\n",
+    "  ~ to you under the Apache License, Version 2.0 (the\n",
+    "  ~ \"License\"); you may not use this file except in compliance\n",
+    "  ~ with the License.  You may obtain a copy of the License at\n",
+    "  ~\n",
+    "  ~   http://www.apache.org/licenses/LICENSE-2.0\n",
+    "  ~\n",
+    "  ~ Unless required by applicable law or agreed to in writing,\n",
+    "  ~ software distributed under the License is distributed on an\n",
+    "  ~ \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+    "  ~ KIND, either express or implied.  See the License for the\n",
+    "  ~ specific language governing permissions and limitations\n",
+    "  ~ under the License.\n",
+    "  -->\n",
+    "\n",
+    "This notebook provides a quick introduction to the Python wrapper around 
the [Druid REST API](api-tutorial.ipynb). This notebook assumes you are 
familiar with the basics of the REST API, and the [set of operations which 
Druid 
provides](https://druid.apache.org/docs/latest/operations/api-reference.html). 
This tutorial focuses on using Python to access those APIs rather than 
explaining the APIs themselves. The APIs themselves are covered in other 
notebooks that use the Python API.\n",
+    "\n",
+    "The Druid Python API is primarily intended to help with these notebook 
tutorials. It can also be used in your own ad-hoc notebooks, or in a regular 
Python program.\n",
+    "\n",
+    "The Druid Python API is a work in progress. The Druid team adds API 
wrappers as needed for the notebook tutorials. If you find you need additional 
wrappers, please feel free to add them, and post a PR to Apache Druid with your 
additions.\n",
+    "\n",
+    "The API provides two levels of functions. Most are simple wrappers around 
Druid's REST APIs. Others add additional code to make the API easier to use. 
The SQL query interface is a prime example: extra code translates a simple SQL 
query into Druid's `SQLQuery` object and interprets the results into a form 
that can be displayed in a notebook.\n",
+    "\n",
+    "This notebook contains sample output to allow it to work a bit like a 
reference. To run it yourself, start by using the `Kernel` &rarr; `Restart & 
Clear Output` menu command to clear the sample output.\n",
+    "\n",
+    "Start by importing the `druidapi` package from the same folder as this 
notebook."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "6d90ca5d",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import druidapi"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "fb68a838",
+   "metadata": {},
+   "source": [
+    "Next, connect to your cluster by providing the router endpoint. The code 
assumes the cluster is on your local machine, using the default port. Go ahead 
and change this if your setup is different.\n",
+    "\n",
+    "The API uses the router to forward messages to each of Druid's services 
so that you don't have to keep track of the host and port for each service.\n",
+    "\n",
+    "The `jupyter_client()` method waits for the cluster to be ready, and sets 
up the client to display tables and messages as HTML. To use this code without 
waiting and without HTML formatting, use the `client()` method instead."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "ae601081",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "druid = druidapi.jupyter_client('http://localhost:8888')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8b4e774b",
+   "metadata": {},
+   "source": [
+    "## Status Client\n",
+    "\n",
+    "The SDK groups Druid REST API calls into categories, with a client for 
each. Start with the status client."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "ff16fc3b",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "status_client = druid.status"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "be992774",
+   "metadata": {},
+   "source": [
+    "Use the Python `help()` function to learn what methods are available."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "03f26417",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "help(status_client)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e803c9fe",
+   "metadata": {},
+   "source": [
+    "Check the version of your cluster. Some of these notebooks illustrate 
newer features available only on specific versions of Druid."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "2faa0d81",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "status_client.version"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "d78a6c35",
+   "metadata": {},
+   "source": [
+    "You can also check which extensions are loaded in your cluster. Some 
notebooks require specific extensions to be available."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "1001f412",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "status_client.properties['druid.extensions.loadList']"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "012b2e61",
+   "metadata": {},
+   "source": [
+    "## Display Client\n",
+    "\n",
+    "The display client performs Druid operations, then formats the results 
for display in a notebook. Running SQL queries in a notebook is easy with the 
display client.\n",
+    "\n",
+    "When run outside a notebook, the display client formats results as text. 
The display client is the most convenient way to work with Druid in a notebook. 
Most operations also have a form that returns results as Python objects rather 
than displaying them. Use these methods if you write code to work with the 
results. Here the goal is just to interact with Druid."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "f867f1f0",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "display = druid.display"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "d051bc5e",
+   "metadata": {},
+   "source": [
+    "Start by getting a list of schemas."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "dd8387e0",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "display.schemas()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "b8261ab0",
+   "metadata": {},
+   "source": [
+    "Then, retrieve the tables (or datasources) within any schema."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "64dcb46a",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "display.tables('INFORMATION_SCHEMA')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ff311595",
+   "metadata": {},
+   "source": [
+    "With no argument, `tables()` lists the datasources in the `druid` schema. You'll get an empty result if you have no datasources yet."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "616770ce",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "display.tables()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "7392e484",
+   "metadata": {},
+   "source": [
+    "You can easily run a query and show the results:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "2c649eef",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "sql = '''\n",
+    "SELECT TABLE_NAME\n",
+    "FROM INFORMATION_SCHEMA.TABLES\n",
+    "WHERE TABLE_SCHEMA = 'INFORMATION_SCHEMA'\n",
+    "'''\n",
+    "display.sql(sql)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c6c4e1d4",
+   "metadata": {},
+   "source": [
+    "The query above shows the same results as `tables('INFORMATION_SCHEMA')`. That is not surprising: `tables()` runs this query for you."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f414d145",
+   "metadata": {},
+   "source": [
+    "## SQL Client\n",
+    "\n",
+    "While the display client is handy for simple queries, sometimes you need 
more control, or want to work with the data returned from a query. For this you 
use the SQL client."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "9951e976",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "sql_client = druid.sql"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "7b944084",
+   "metadata": {},
+   "source": [
+    "The SQL client lets you create a SQL request object that can carry context parameters and query parameters. The Python API works out each query parameter's type from its Python type. Use the display client to show the query results."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "dd559827",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "sql = '''\n",
+    "SELECT TABLE_NAME\n",
+    "FROM INFORMATION_SCHEMA.TABLES\n",
+    "WHERE TABLE_SCHEMA = ?\n",
+    "'''\n",
+    "req = sql_client.sql_request(sql)\n",
+    "req.add_parameter('INFORMATION_SCHEMA')\n",
+    "req.add_context(\"someParameter\", \"someValue\")\n",
+    "display.sql(req)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "937dc6b1",
+   "metadata": {},
+   "source": [
+    "The request has other features for advanced use cases: see the code for details. The `sql_query()` method returns a SQL response object. Use it if you want to get the values directly, work with the schema, and so on."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "fd7a1827",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "sql = '''\n",
+    "SELECT TABLE_NAME\n",
+    "FROM INFORMATION_SCHEMA.TABLES\n",
+    "WHERE TABLE_SCHEMA = 'INFORMATION_SCHEMA'\n",
+    "'''\n",
+    "resp = sql_client.sql_query(sql)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "2fe6a749",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "col1 = resp.schema[0]\n",
+    "print(col1.name, col1.sql_type, col1.druid_type)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "41d27bb1",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "resp.rows"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "481af1f2",
+   "metadata": {},
+   "source": [
+    "The `show()` method uses this information to format an HTML table that presents the results."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "8dba807b",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "resp.show()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "99f8db7b",
+   "metadata": {},
+   "source": [
+    "The display and SQL clients are intended for exploratory queries. The [pydruid](https://pythonhosted.org/pydruid/) library provides a robust way to run native queries, to run SQL queries, and to convert the results to various formats."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "9e3be017",
+   "metadata": {},
+   "source": [
+    "## MSQ Ingestion\n",
+    "\n",
+    "The SQL client also performs MSQ-based ingestion using `INSERT` or 
`REPLACE` statements. Use the extension check above to ensure that 
`druid-multi-stage-query` is loaded in Druid 26. (Later versions may have MSQ 
built in.)\n",
+    "\n",
+    "An MSQ query runs through a different API: `task()`. This API returns a response object that describes the Overlord task that runs the MSQ query. For tutorials, the data is usually small enough that you can wait for the ingestion to complete. Do that with the `run_task()` call, which handles the waiting. To illustrate, here is a query that ingests a subset of columns and includes a few data clean-up steps:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "10f1e451",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "sql = '''\n",
+    "REPLACE INTO \"myWiki1\" OVERWRITE ALL\n",
+    "SELECT\n",
+    "  TIME_PARSE(\"timestamp\") AS \"__time\",\n",
+    "  namespace,\n",
+    "  page,\n",
+    "  channel,\n",
+    "  \"user\",\n",
+    "  countryName,\n",
+    "  CASE WHEN isRobot = 'true' THEN 1 ELSE 0 END AS isRobot,\n",
+    "  \"added\",\n",
+    "  \"delta\",\n",
+    "  CASE WHEN isNew = 'true' THEN 1 ELSE 0 END AS isNew,\n",
+    "  CAST(\"deltaBucket\" AS DOUBLE) AS deltaBucket,\n",
+    "  \"deleted\"\n",
+    "FROM TABLE(\n",
+    "  EXTERN(\n",
+    "    
'{\"type\":\"http\",\"uris\":[\"https://druid.apache.org/data/wikipedia.json.gz\"]}',\n",
+    "    '{\"type\":\"json\"}',\n",
+    "    
'[{\"name\":\"isRobot\",\"type\":\"string\"},{\"name\":\"channel\",\"type\":\"string\"},{\"name\":\"timestamp\",\"type\":\"string\"},{\"name\":\"flags\",\"type\":\"string\"},{\"name\":\"isUnpatrolled\",\"type\":\"string\"},{\"name\":\"page\",\"type\":\"string\"},{\"name\":\"diffUrl\",\"type\":\"string\"},{\"name\":\"added\",\"type\":\"long\"},{\"name\":\"comment\",\"type\":\"string\"},{\"name\":\"commentLength\",\"type\":\"long\"},{\"name\":\"isNew\",\"type\":\"string\"},{\"name\":\"isMinor\",\"type\":\"string\"},{\"name\":\"delta\",\"type\":\"long\"},{\"name\":\"isAnonymous\",\"type\":\"string\"},{\"name\":\"user\",\"type\":\"string\"},{\"name\":\"deltaBucket\",\"type\":\"long\"},{\"name\":\"deleted\",\"type\":\"long\"},{\"name\":\"namespace\",\"type\":\"string\"},{\"name\":\"cityName\",\"type\":\"string\"},{\"name\":\"countryName\",\"type\":\"string\"},{\"name\":\"regionIsoCode\",\"type\":\"string\"},{\"name\":\"metroCode\",\"type\":\"long\"},{\"name\":\"countryIsoCode\",
 \"type\":\"string\"},{\"name\":\"regionName\",\"type\":\"string\"}]'\n",
+    "  )\n",
+    ")\n",
+    "PARTITIONED BY DAY\n",
+    "CLUSTERED BY namespace, page\n",
+    "'''"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "d752b1d4",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "sql_client.run_task(sql)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ef4512f8",
+   "metadata": {},
+   "source": [
+    "MSQ reports task completion as soon as ingestion is done. However, it 
takes a while for Druid to load the resulting segments. Wait for the table to 
become ready."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "37fcedf2",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "sql_client.wait_until_ready('myWiki1')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "11d9c95a",
+   "metadata": {},
+   "source": [
+    "`display.table()` lists the columns in a table."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "b662697b",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "display.table('myWiki1')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "936f57fb",
+   "metadata": {},
+   "source": [
+    "You can sample a few rows of data."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "c4cfa5dc",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "display.sql('SELECT * FROM myWiki1 LIMIT 10')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c1152f41",
+   "metadata": {},
+   "source": [
+    "## Datasource Client\n",
+    "\n",
+    "The datasource client lets you perform operations on datasource objects. The SQL layer lets you get metadata and run queries; the datasource client works with the underlying segments. Explaining the full functionality is the topic of another notebook. For now, you can use the datasource client to clean up the datasource created above. The `True` argument asks for \"if exists\" semantics so you don't get an error if the datasource was already deleted."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "fba659ce",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "ds_client = druid.datasources\n",
+    "ds_client.drop('myWiki1', True)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c96fdcc6",
+   "metadata": {},
+   "source": [
+    "## Tasks Client\n",
+    "\n",
+    "Use the tasks client to work with Overlord tasks. The `run_task()` call 
above actually uses the task client internally to poll Overlord."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "b4f5ea17",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "task_client = druid.tasks\n",
+    "task_client.tasks()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "1deaf95f",
+   "metadata": {},
+   "source": [
+    "## REST Client\n",
+    "\n",
+    "The Druid Python API starts with a REST client that itself is built on 
the `requests` package. The REST client implements the common patterns seen in 
the Druid REST API. You can create a client directly:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "b1e55635",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from druidapi.rest import DruidRestClient\n",
+    "rest_client = DruidRestClient(\"http://localhost:8888\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "dcb8055f",
+   "metadata": {},
+   "source": [
+    "Or, if you have already created the Druid client, you can reuse the 
existing REST client. This is how the various other clients work internally."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "370ba76a",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "rest_client = druid.rest"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "2654e72c",
+   "metadata": {},
+   "source": [
+    "Use the REST client if you need to make calls that are not yet wrapped by the Python API, or if you want to do something special. To illustrate the client, you can make some of the same calls as in the [Druid REST API notebook](api-tutorial.ipynb).\n",
+    "\n",
+    "The REST client keeps track of the Druid host: you provide only the specific URL path. There are methods to get or post JSON results. For example, to get status information:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "9e42dfbc",
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [],
+   "source": [
+    "rest_client.get_json('/status')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "837e08b0",
+   "metadata": {},
+   "source": [
+    "A quick comparison of the three approaches (Requests, REST client, Python 
client):\n",
+    "\n",
+    "Status:\n",
+    "\n",
+    "* Requests: `session.get(druid_host + '/status').json()`\n",
+    "* REST client: `rest_client.get_json('/status')`\n",
+    "* Status client: `status_client.status()`\n",
+    "\n",
+    "Health:\n",
+    "\n",
+    "* Requests: `session.get(druid_host + '/status/health').json()`\n",
+    "* REST client: `rest_client.get_json('/status/health')`\n",
+    "* Status client: `status_client.is_healthy()`\n",
+    "\n",
+    "Ingest data:\n",
+    "\n",
+    "* Requests: See the [REST tutorial](api-tutorial.ipynb)\n",
+    "* REST client: same as the REST tutorial, but use `rest_client.post_json('/druid/v2/sql/task', sql_request)` and\n",
+    "  `rest_client.get_json(f\"/druid/indexer/v1/task/{ingestion_taskId}/status\")`\n",
+    "* SQL client: `sql_client.run_task(sql)`; there is also a form that takes a full SQL request.\n",
+    "\n",
+    "List datasources:\n",
+    "\n",
+    "* Requests: `session.get(druid_host + 
'/druid/coordinator/v1/datasources').json()`\n",
+    "* REST client: 
`rest_client.get_json('/druid/coordinator/v1/datasources')`\n",
+    "* Datasources client: `ds_client.names()`\n",
+    "\n",
+    "Query data, where `sql_request` is a properly-formatted `SqlRequest` dictionary:\n",
+    "\n",
+    "* Requests: `session.post(druid_host + '/druid/v2/sql', 
json=sql_request).json()`\n",
+    "* REST client: `rest_client.post_json('/druid/v2/sql', sql_request)`\n",
+    "* SQL Client: `sql_client.show(sql)`, where `sql` is the query text\n",
+    "\n",
+    "In general, you have to provide all the details for the Requests library. The REST client handles the low-level repetitious bits. The Python clients provide methods that encapsulate the specifics of the URLs and return formats."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "edc4ee39",
+   "metadata": {},
+   "source": [
+    "## Constants\n",
+    "\n",
+    "Druid has a large number of special constants: type names, options, etc. 
The `consts` module provides definitions for many of these:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "a90187c6",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from druidapi import consts"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "fc535898",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "help(consts)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "b661b29f",
+   "metadata": {},
+   "source": [
+    "Using the constants avoids typos:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "3393af62",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "sql_client.show_tables(consts.SYS_SCHEMA)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "5e789ca7",
+   "metadata": {},
+   "source": [
+    "## Tracing\n",
+    "\n",
+    "It is often handy to see what the Druid API is doing: what messages it 
sends to Druid. You may need to debug some function that isn't working as 
expected. Or, perhaps you want to see what is sent to Druid so you can 
replicate it in your own code. Either way, just turn on tracing:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "ac68b60e",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "druid.trace(True)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "7b9dc7e3",
+   "metadata": {},
+   "source": [
+    "Then, each call to Druid prints what it sends:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "72c955c0",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "sql_client.show_tables()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ddaf0dc2",
+   "metadata": {},
+   "source": [
+    "## Conclusion\n",
+    "\n",
+    "This notebook gave you a whirlwind tour of the Python Druid API: just enough to check your cluster, ingest some data with MSQ, and query that data. Druid has many more APIs. As noted earlier, the Python API is a work in progress: the team adds new wrappers as needed for tutorials. Your [contributions](https://github.com/apache/druid/pulls) and [feedback](https://github.com/apache/druid/issues) are welcome."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "0c9a9e4c",
+   "metadata": {},
+   "outputs": [],
+   "source": []

Review Comment:
   empty code block in notebook
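For reference, the comparison table in this notebook posts a `sql_request` dictionary to `/druid/v2/sql`. Per the Druid SQL API documentation, that body carries a `query` string, an optional `parameters` list of `{"type": ..., "value": ...}` objects, and an optional `context` map. The sketch below builds one from plain Python values and infers each parameter's SQL type from its Python type, much as the tutorial describes for `add_parameter()`. It is not the `druidapi` implementation. `build_sql_request` and `_param_type` are hypothetical names, and the type mapping is an assumption to check against your Druid version.

```python
from typing import Any, Optional


def _param_type(value: Any) -> str:
    """Map a Python value to an assumed Druid SQL parameter type name."""
    # bool must be checked before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return "BOOLEAN"
    if isinstance(value, int):
        return "BIGINT"
    if isinstance(value, float):
        return "DOUBLE"
    if isinstance(value, str):
        return "VARCHAR"
    raise TypeError(f"unsupported parameter type: {type(value).__name__}")


def build_sql_request(query: str, parameters: Optional[list] = None,
                      context: Optional[dict] = None) -> dict:
    """Build a JSON-ready request body for POST /druid/v2/sql."""
    request = {"query": query}
    if parameters:
        # One {"type", "value"} object per "?" placeholder, in order.
        request["parameters"] = [
            {"type": _param_type(v), "value": v} for v in parameters
        ]
    if context:
        request["context"] = dict(context)
    return request
```

For example, `build_sql_request(sql, ['INFORMATION_SCHEMA'], {'someParameter': 'someValue'})` mirrors the parameterized request the notebook builds with `sql_client.sql_request()`.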



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
