Re: [PR] Jupyter nested columns tutorial (druid)

via GitHub Fri, 11 Aug 2023 03:40:33 -0700


writer-jill commented on code in PR #14526:
URL: https://github.com/apache/druid/pull/14526#discussion_r1291185088



##########
examples/quickstart/jupyter-notebooks/notebooks/02-ingestion/02-working-with-nested-columns.ipynb:
##########
@@ -0,0 +1,434 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Working with nested columns\n",
+    "\n",
+    "<!--\n",
+    "  ~ Licensed to the Apache Software Foundation (ASF) under one\n",
+    "  ~ or more contributor license agreements.  See the NOTICE file\n",
+    "  ~ distributed with this work for additional information\n",
+    "  ~ regarding copyright ownership.  The ASF licenses this file\n",
+    "  ~ to you under the Apache License, Version 2.0 (the\n",
+    "  ~ \"License\"); you may not use this file except in compliance\n",
+    "  ~ with the License.  You may obtain a copy of the License at\n",
+    "  ~\n",
+    "  ~   http://www.apache.org/licenses/LICENSE-2.0\n";,
+    "  ~\n",
+    "  ~ Unless required by applicable law or agreed to in writing,\n",
+    "  ~ software distributed under the License is distributed on an\n",
+    "  ~ \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+    "  ~ KIND, either express or implied.  See the License for the\n",
+    "  ~ specific language governing permissions and limitations\n",
+    "  ~ under the License.\n",
+    "  -->\n",
+    "\n",
+    "This tutorial demonstrates how to work with [nested 
columns](https://druid.apache.org/docs/latest/querying/nested-columns.html) in 
Apache Druid.\n",
+    "\n",
+    "Druid stores nested data structures in `COMPLEX<json>` columns. In this 
tutorial you perform the following tasks:\n",
+    "\n",
+    "- Ingest nested JSON data using SQL-based ingestion.\n",
+    "- Transform nested data during ingestion using SQL JSON functions.\n",
+    "- Perform queries to display, filter, and aggregate nested data.\n",
+    "- Use helper operators to examine nested data and plan your queries.\n",
+    "\n",
+    "Druid supports directly ingesting nested data with the following formats: 
JSON, Parquet, Avro, ORC, Protobuf."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Table of contents\n",
+    "\n",
+    "- [Prerequisites](#Prerequisites)\n",
+    "- [Initialization](#Initialization)\n",
+    "- [Ingest nested data](#Ingest-nested-data)\n",
+    "- [Transform nested data](#Transform-nested-data)\n",
+    "- [Query nested data](#Query-nested-data)\n",
+    "- [Group, filter, and aggregate nested 
data](#Group-filter-and-aggregate-nested-data)\n",
+    "- [Use helper operators](#Use-helper-operators)\n",
+    "- [Learn more](#Learn-more)\n",
+    "\n",
+    "For the best experience, use JupyterLab so that you can always access the 
table of contents."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Prerequisites\n",
+    "\n",
+    "This tutorial works with Druid 25.0.0 or later.\n",
+    "\n",
+    "Launch this tutorial using the `druid-jupyter` profile of the Docker 
Compose file for Jupyter-based Druid tutorials. For more information, see 
[Docker for Jupyter Notebook 
tutorials](https://druid.apache.org/docs/latest/tutorials/tutorial-jupyter-docker.html).\n",
+    "\n",
+    "### Run without Docker Compose\n",
+    "\n",
+    "To run this notebook without Docker Compose you need 
[druidapi](https://github.com/apache/druid/blob/master/examples/quickstart/jupyter-notebooks/druidapi/README.md),
 a Python client for Apache Druid."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Initialization"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Run the following cell to initialize the environment for the tutorial. 
The quickstart deployment configures Druid to listen on port `8888` by default, 
so you'll make API calls against `http://localhost:8888`.";
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import druidapi\n",
+    "import os\n",
+    "\n",
+    "if 'DRUID_HOST' not in os.environ.keys():\n",
+    "    druid_host=f\"http://localhost:8888\"\n";,
+    "else:\n",
+    "    druid_host=f\"http://{os.environ['DRUID_HOST']}:8888\"\n",
+    "    \n",
+    "print(f\"Opening a connection to {druid_host}.\")\n",
+    "druid = druidapi.jupyter_client(druid_host)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Run the following cell to define the two datasources the tutorial uses, 
create a SQL client to run SQL, and create a `druidapi` display client to 
format results."

Review Comment:
   I'll update this in current PR: https://github.com/apache/druid/pull/14788



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] Jupyter nested columns tutorial (druid)

Reply via email to