Re: [PR] Jupyter nested columns tutorial (druid)

via GitHub Fri, 11 Aug 2023 04:34:36 -0700


writer-jill commented on code in PR #14788:
URL: https://github.com/apache/druid/pull/14788#discussion_r1291193972



##########
examples/quickstart/jupyter-notebooks/notebooks/02-ingestion/02-working-with-nested-columns.ipynb:
##########
@@ -0,0 +1,443 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Working with nested columns\n",
+    "\n",
+    "<!--\n",
+    "  ~ Licensed to the Apache Software Foundation (ASF) under one\n",
+    "  ~ or more contributor license agreements.  See the NOTICE file\n",
+    "  ~ distributed with this work for additional information\n",
+    "  ~ regarding copyright ownership.  The ASF licenses this file\n",
+    "  ~ to you under the Apache License, Version 2.0 (the\n",
+    "  ~ \"License\"); you may not use this file except in compliance\n",
+    "  ~ with the License.  You may obtain a copy of the License at\n",
+    "  ~\n",
+    "  ~   http://www.apache.org/licenses/LICENSE-2.0\n";,
+    "  ~\n",
+    "  ~ Unless required by applicable law or agreed to in writing,\n",
+    "  ~ software distributed under the License is distributed on an\n",
+    "  ~ \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+    "  ~ KIND, either express or implied.  See the License for the\n",
+    "  ~ specific language governing permissions and limitations\n",
+    "  ~ under the License.\n",
+    "  -->\n",
+    "\n",
+    "This tutorial demonstrates how to work with [nested 
columns](https://druid.apache.org/docs/latest/querying/nested-columns.html) in 
Apache Druid.\n",
+    "\n",
+    "Druid stores nested data structures in `COMPLEX<json>` columns. In this 
tutorial you perform the following tasks:\n",
+    "\n",
+    "- Ingest nested JSON data using SQL-based ingestion.\n",
+    "- Transform nested data during ingestion using SQL JSON functions.\n",
+    "- Perform queries to display, filter, and aggregate nested data.\n",
+    "- Use helper operators to examine nested data and plan your queries.\n",
+    "\n",
+    "Druid supports directly ingesting nested data with the following formats: 
JSON, Parquet, Avro, ORC, Protobuf."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Table of contents\n",
+    "\n",
+    "- [Prerequisites](#Prerequisites)\n",
+    "- [Initialization](#Initialization)\n",
+    "- [Ingest nested data](#Ingest-nested-data)\n",
+    "- [Transform nested data](#Transform-nested-data)\n",
+    "- [Query nested data](#Query-nested-data)\n",
+    "- [Group, filter, and aggregate nested 
data](#Group-filter-and-aggregate-nested-data)\n",
+    "- [Use helper operators](#Use-helper-operators)\n",
+    "- [Learn more](#Learn-more)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Prerequisites\n",
+    "\n",
+    "This tutorial works with Druid 25.0.0 or later.\n",
+    "\n",
+    "### Run with Docker\n",
+    "\n",
+    "Launch this tutorial and all prerequisites using the `druid-jupyter` 
profile of the Docker Compose file for Jupyter-based Druid tutorials. For more 
information, see [Docker for Jupyter Notebook 
tutorials](https://druid.apache.org/docs/latest/tutorials/tutorial-jupyter-docker.html).\n",
+    "\n",
+    "### Run without Docker\n",
+    "\n",
+    "If you do not use the Docker Compose environment, you need the 
following:\n",
+    "\n",
+    "* A running Apache Druid instance, with a `DRUID_HOST` local environment 
variable containing the server name of your Druid router.\n",
+    "* 
[druidapi](https://github.com/apache/druid/blob/master/examples/quickstart/jupyter-notebooks/druidapi/README.md),
 a Python client for Apache Druid. Follow the instructions in the Install 
section of the README file."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Initialization"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Run the next cell to set up the Druid Python client's connection to 
Apache Druid.\n",
+    "\n",
+    "If successful, the Druid version number will be shown in the output."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import druidapi\n",
+    "import os\n",
+    "\n",
+    "if 'DRUID_HOST' not in os.environ.keys():\n",
+    "    druid_host=f\"http://localhost:8888\"\n";,
+    "else:\n",
+    "    druid_host=f\"http://{os.environ['DRUID_HOST']}:8888\"\n",
+    "    \n",
+    "print(f\"Opening a connection to {druid_host}.\")\n",
+    "druid = druidapi.jupyter_client(druid_host)\n",
+    "\n",
+    "display = druid.display\n",
+    "sql_client = druid.sql\n",
+    "status_client = druid.status\n",
+    "\n",
+    "status_client.version"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Run the following cell to define the two datasources the tutorial uses."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "datasource1 = \"kttm\"\n",
+    "datasource2 = \"kttm_transform\""
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Ingest nested data"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Run the following cell to ingest sample clickstream data from the [Koalas 
to the Max](https://www.koalastothemax.com/) game."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "sql = '''\n",
+    "INSERT INTO \"kttm\"\n",

Review Comment:
   This is a good idea but there's a known problem with dashes in the name. 
I've gone with `example_koalas_nesteddata`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] Jupyter nested columns tutorial (druid)

Reply via email to