Re: [PR] Jupyter nested columns tutorial (druid)

via GitHub Thu, 10 Aug 2023 07:59:02 -0700


techdocsmith commented on code in PR #14526:
URL: https://github.com/apache/druid/pull/14526#discussion_r1290245210



##########
examples/quickstart/jupyter-notebooks/notebooks/02-ingestion/02-working-with-nested-columns.ipynb:
##########
@@ -0,0 +1,434 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Working with nested columns\n",
+    "\n",
+    "<!--\n",
+    "  ~ Licensed to the Apache Software Foundation (ASF) under one\n",
+    "  ~ or more contributor license agreements.  See the NOTICE file\n",
+    "  ~ distributed with this work for additional information\n",
+    "  ~ regarding copyright ownership.  The ASF licenses this file\n",
+    "  ~ to you under the Apache License, Version 2.0 (the\n",
+    "  ~ \"License\"); you may not use this file except in compliance\n",
+    "  ~ with the License.  You may obtain a copy of the License at\n",
+    "  ~\n",
+    "  ~   http://www.apache.org/licenses/LICENSE-2.0\n",
+    "  ~\n",
+    "  ~ Unless required by applicable law or agreed to in writing,\n",
+    "  ~ software distributed under the License is distributed on an\n",
+    "  ~ \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+    "  ~ KIND, either express or implied.  See the License for the\n",
+    "  ~ specific language governing permissions and limitations\n",
+    "  ~ under the License.\n",
+    "  -->\n",
+    "\n",
+    "This tutorial demonstrates how to work with [nested columns](https://druid.apache.org/docs/latest/querying/nested-columns.html) in Apache Druid.\n",
+    "\n",
+    "Druid stores nested data structures in `COMPLEX<json>` columns. In this tutorial you perform the following tasks:\n",

Review Comment:
   I like this intro. I wonder if there's a way to suggest this type of intro in the template.
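To make the intro's point about nested data concrete, here's a minimal Python sketch of how a simple JSONPath such as `$.agent.os` walks a nested record. It illustrates path semantics only and is not Druid's implementation: the record and its field names are hypothetical, only `.key` and `[index]` steps are supported, and `None` stands in for the NULL a query would return for a missing path. It also doesn't model the split between `JSON_VALUE` (literal values) and `JSON_QUERY` (nested JSON values).

```python
import re
from typing import Any

# Matches one path step: ".name" (object key) or "[n]" (array index).
_STEP = re.compile(r"\.([A-Za-z_][A-Za-z0-9_]*)|\[(\d+)\]")


def json_value(record: Any, path: str) -> Any:
    """Follow a simple JSONPath like '$.agent.os' or '$.tags[0]'.

    Returns None (standing in for SQL NULL) when a step doesn't exist.
    """
    if not path.startswith("$"):
        raise ValueError(f"path must start with '$': {path!r}")
    current = record
    pos = 1  # skip the leading '$'
    while pos < len(path):
        step = _STEP.match(path, pos)
        if step is None:
            raise ValueError(f"unsupported path syntax at {path[pos:]!r}")
        key, index = step.group(1), step.group(2)
        if key is not None:
            if not isinstance(current, dict) or key not in current:
                return None
            current = current[key]
        else:
            i = int(index)
            if not isinstance(current, list) or i >= len(current):
                return None
            current = current[i]
        pos = step.end()
    return current


# Hypothetical nested record, similar in shape to a COMPLEX<json> value.
event = {"agent": {"os": "Windows", "type": "Browser"}, "tags": ["a", "b"]}
print(json_value(event, "$.agent.os"))       # Windows
print(json_value(event, "$.tags[1]"))        # b
print(json_value(event, "$.agent.missing"))  # None
```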



##########
examples/quickstart/jupyter-notebooks/notebooks/02-ingestion/02-working-with-nested-columns.ipynb:
##########
@@ -0,0 +1,434 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Working with nested columns\n",
+    "\n",
+    "<!--\n",
+    "  ~ Licensed to the Apache Software Foundation (ASF) under one\n",
+    "  ~ or more contributor license agreements.  See the NOTICE file\n",
+    "  ~ distributed with this work for additional information\n",
+    "  ~ regarding copyright ownership.  The ASF licenses this file\n",
+    "  ~ to you under the Apache License, Version 2.0 (the\n",
+    "  ~ \"License\"); you may not use this file except in compliance\n",
+    "  ~ with the License.  You may obtain a copy of the License at\n",
+    "  ~\n",
+    "  ~   http://www.apache.org/licenses/LICENSE-2.0\n",
+    "  ~\n",
+    "  ~ Unless required by applicable law or agreed to in writing,\n",
+    "  ~ software distributed under the License is distributed on an\n",
+    "  ~ \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+    "  ~ KIND, either express or implied.  See the License for the\n",
+    "  ~ specific language governing permissions and limitations\n",
+    "  ~ under the License.\n",
+    "  -->\n",
+    "\n",
+    "This tutorial demonstrates how to work with [nested columns](https://druid.apache.org/docs/latest/querying/nested-columns.html) in Apache Druid.\n",
+    "\n",
+    "Druid stores nested data structures in `COMPLEX<json>` columns. In this tutorial you perform the following tasks:\n",
+    "\n",
+    "- Ingest nested JSON data using SQL-based ingestion.\n",
+    "- Transform nested data during ingestion using SQL JSON functions.\n",
+    "- Perform queries to display, filter, and aggregate nested data.\n",
+    "- Use helper operators to examine nested data and plan your queries.\n",
+    "\n",
+    "Druid supports directly ingesting nested data with the following formats: JSON, Parquet, Avro, ORC, Protobuf."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Table of contents\n",
+    "\n",
+    "- [Prerequisites](#Prerequisites)\n",
+    "- [Initialization](#Initialization)\n",
+    "- [Ingest nested data](#Ingest-nested-data)\n",
+    "- [Transform nested data](#Transform-nested-data)\n",
+    "- [Query nested data](#Query-nested-data)\n",
+    "- [Group, filter, and aggregate nested data](#Group-filter-and-aggregate-nested-data)\n",
+    "- [Use helper operators](#Use-helper-operators)\n",
+    "- [Learn more](#Learn-more)\n",
+    "\n",
+    "For the best experience, use JupyterLab so that you can always access the table of contents."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Prerequisites\n",
+    "\n",
+    "This tutorial works with Druid 25.0.0 or later.\n",
+    "\n",
+    "Launch this tutorial using the `druid-jupyter` profile of the Docker Compose file for Jupyter-based Druid tutorials. For more information, see [Docker for Jupyter Notebook tutorials](https://druid.apache.org/docs/latest/tutorials/tutorial-jupyter-docker.html).\n",
+    "\n",
+    "### Run without Docker Compose\n",
+    "\n",
+    "To run this notebook without Docker Compose you need [druidapi](https://github.com/apache/druid/blob/master/examples/quickstart/jupyter-notebooks/druidapi/README.md), a Python client for Apache Druid."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Initialization"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Run the following cell to initialize the environment for the tutorial. The quickstart deployment configures Druid to listen on port `8888` by default, so you'll make API calls against `http://localhost:8888`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import druidapi\n",
+    "import os\n",
+    "\n",
+    "if 'DRUID_HOST' not in os.environ.keys():\n",
+    "    druid_host=f\"http://localhost:8888\"\n",
+    "else:\n",
+    "    druid_host=f\"http://{os.environ['DRUID_HOST']}:8888\"\n",
+    "    \n",
+    "print(f\"Opening a connection to {druid_host}.\")\n",
+    "druid = druidapi.jupyter_client(druid_host)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Run the following cell to define the two datasources the tutorial uses, create a SQL client to run SQL, and create a `druidapi` display client to format results."

Review Comment:
   Nit: Do we ever use the variables for the datasources? For example, in the INSERT statement, we're just naming the datasource explicitly:
   ```
   INSERT INTO \"kttm\"\n",
   ```
   I'm almost in favor of removing the variables for now and always putting in the datasource name where we need it. You have to submit the SQL a different way in druidapi in order to pass the datasource as a parameter.
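If the variables stay, one option is to interpolate them into the SQL text instead of passing them as query parameters, since SQL dynamic parameters generally stand in for literal values, not identifiers such as datasource names. A minimal sketch, assuming the name comes from a trusted notebook variable (`kttm_datasource` and `quote_identifier` are hypothetical names): it wraps the identifier in double quotes and escapes any embedded double quote by doubling it, following Druid SQL's identifier-quoting rule.

```python
def quote_identifier(name: str) -> str:
    """Quote a Druid SQL identifier such as a datasource name.

    Double quotes delimit identifiers; an embedded double quote is
    escaped by doubling it.
    """
    return '"' + name.replace('"', '""') + '"'


# Hypothetical variable mirroring the notebook's datasource name.
kttm_datasource = "kttm"

sql = f"SELECT COUNT(*) AS row_count FROM {quote_identifier(kttm_datasource)}"
print(sql)  # SELECT COUNT(*) AS row_count FROM "kttm"
```

The resulting string can go to whatever SQL-running call the notebook already uses; the sketch itself doesn't depend on any `druidapi` method.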



##########
examples/quickstart/jupyter-notebooks/notebooks/02-ingestion/02-working-with-nested-columns.ipynb:
##########
@@ -0,0 +1,434 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Working with nested columns\n",

Review Comment:
   What about "Ingesting and querying data in nested columns" or something similar?



##########
examples/quickstart/jupyter-notebooks/notebooks/02-ingestion/02-working-with-nested-columns.ipynb:
##########
@@ -0,0 +1,434 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Working with nested columns\n",
+    "\n",
+    "<!--\n",
+    "  ~ Licensed to the Apache Software Foundation (ASF) under one\n",
+    "  ~ or more contributor license agreements.  See the NOTICE file\n",
+    "  ~ distributed with this work for additional information\n",
+    "  ~ regarding copyright ownership.  The ASF licenses this file\n",
+    "  ~ to you under the Apache License, Version 2.0 (the\n",
+    "  ~ \"License\"); you may not use this file except in compliance\n",
+    "  ~ with the License.  You may obtain a copy of the License at\n",
+    "  ~\n",
+    "  ~   http://www.apache.org/licenses/LICENSE-2.0\n",
+    "  ~\n",
+    "  ~ Unless required by applicable law or agreed to in writing,\n",
+    "  ~ software distributed under the License is distributed on an\n",
+    "  ~ \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+    "  ~ KIND, either express or implied.  See the License for the\n",
+    "  ~ specific language governing permissions and limitations\n",
+    "  ~ under the License.\n",
+    "  -->\n",
+    "\n",
+    "This tutorial demonstrates how to work with [nested columns](https://druid.apache.org/docs/latest/querying/nested-columns.html) in Apache Druid.\n",
+    "\n",
+    "Druid stores nested data structures in `COMPLEX<json>` columns. In this tutorial you perform the following tasks:\n",
+    "\n",
+    "- Ingest nested JSON data using SQL-based ingestion.\n",
+    "- Transform nested data during ingestion using SQL JSON functions.\n",
+    "- Perform queries to display, filter, and aggregate nested data.\n",
+    "- Use helper operators to examine nested data and plan your queries.\n",
+    "\n",
+    "Druid supports directly ingesting nested data with the following formats: JSON, Parquet, Avro, ORC, Protobuf."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Table of contents\n",

Review Comment:
   I like the TOC. Can we add this to the template?
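If the TOC goes into the template, a small script could generate it from the notebook's level-2 headings so the links don't drift from the headings. A minimal sketch, assuming the anchor convention that this notebook's own links follow (drop punctuation, join words with hyphens, keep case); that convention is inferred from entries like `#Group-filter-and-aggregate-nested-data`, not taken from JupyterLab's documentation. `heading_to_anchor` and `build_toc` are hypothetical helper names.

```python
import re
from typing import Iterable, List


def heading_to_anchor(title: str) -> str:
    """Assumed convention: drop punctuation, then join words with hyphens."""
    cleaned = re.sub(r"[^\w\s-]", "", title).strip()
    return re.sub(r"\s+", "-", cleaned)


def build_toc(notebook: dict, skip: Iterable[str] = ()) -> List[str]:
    """Return Markdown TOC bullets for every '## ' heading in markdown cells."""
    skipped = set(skip)
    entries = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "markdown":
            continue
        # nbformat stores "source" as either a string or a list of strings.
        text = "".join(cell.get("source", []))
        for line in text.splitlines():
            if line.startswith("## "):
                title = line[3:].strip()
                if title not in skipped:
                    entries.append(f"- [{title}](#{heading_to_anchor(title)})")
    return entries


# Tiny in-memory notebook; a real one would come from json.load on the .ipynb.
sample = {
    "cells": [
        {"cell_type": "markdown", "source": ["## Table of contents\n"]},
        {"cell_type": "markdown", "source": ["## Prerequisites\n", "\n", "Text\n"]},
        {"cell_type": "markdown", "source": "## Group, filter, and aggregate nested data"},
    ]
}
print("\n".join(build_toc(sample, skip={"Table of contents"})))
```

For this notebook you'd pass `skip={"Working with nested columns", "Table of contents"}`, since its title is also a level-2 heading and its TOC omits both.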



##########
examples/quickstart/jupyter-notebooks/notebooks/02-ingestion/02-working-with-nested-columns.ipynb:
##########
@@ -0,0 +1,434 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Working with nested columns\n",
+    "\n",
+    "<!--\n",
+    "  ~ Licensed to the Apache Software Foundation (ASF) under one\n",
+    "  ~ or more contributor license agreements.  See the NOTICE file\n",
+    "  ~ distributed with this work for additional information\n",
+    "  ~ regarding copyright ownership.  The ASF licenses this file\n",
+    "  ~ to you under the Apache License, Version 2.0 (the\n",
+    "  ~ \"License\"); you may not use this file except in compliance\n",
+    "  ~ with the License.  You may obtain a copy of the License at\n",
+    "  ~\n",
+    "  ~   http://www.apache.org/licenses/LICENSE-2.0\n",
+    "  ~\n",
+    "  ~ Unless required by applicable law or agreed to in writing,\n",
+    "  ~ software distributed under the License is distributed on an\n",
+    "  ~ \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+    "  ~ KIND, either express or implied.  See the License for the\n",
+    "  ~ specific language governing permissions and limitations\n",
+    "  ~ under the License.\n",
+    "  -->\n",
+    "\n",
+    "This tutorial demonstrates how to work with [nested columns](https://druid.apache.org/docs/latest/querying/nested-columns.html) in Apache Druid.\n",
+    "\n",
+    "Druid stores nested data structures in `COMPLEX<json>` columns. In this tutorial you perform the following tasks:\n",
+    "\n",
+    "- Ingest nested JSON data using SQL-based ingestion.\n",
+    "- Transform nested data during ingestion using SQL JSON functions.\n",
+    "- Perform queries to display, filter, and aggregate nested data.\n",
+    "- Use helper operators to examine nested data and plan your queries.\n",
+    "\n",
+    "Druid supports directly ingesting nested data with the following formats: JSON, Parquet, Avro, ORC, Protobuf."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Table of contents\n",
+    "\n",
+    "- [Prerequisites](#Prerequisites)\n",
+    "- [Initialization](#Initialization)\n",
+    "- [Ingest nested data](#Ingest-nested-data)\n",
+    "- [Transform nested data](#Transform-nested-data)\n",
+    "- [Query nested data](#Query-nested-data)\n",
+    "- [Group, filter, and aggregate nested data](#Group-filter-and-aggregate-nested-data)\n",
+    "- [Use helper operators](#Use-helper-operators)\n",
+    "- [Learn more](#Learn-more)\n",
+    "\n",
+    "For the best experience, use JupyterLab so that you can always access the table of contents."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Prerequisites\n",
+    "\n",
+    "This tutorial works with Druid 25.0.0 or later.\n",
+    "\n",
+    "Launch this tutorial using the `druid-jupyter` profile of the Docker Compose file for Jupyter-based Druid tutorials. For more information, see [Docker for Jupyter Notebook tutorials](https://druid.apache.org/docs/latest/tutorials/tutorial-jupyter-docker.html).\n",
+    "\n",
+    "### Run without Docker Compose\n",
+    "\n",
+    "To run this notebook without Docker Compose you need [druidapi](https://github.com/apache/druid/blob/master/examples/quickstart/jupyter-notebooks/druidapi/README.md), a Python client for Apache Druid."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Initialization"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Run the following cell to initialize the environment for the tutorial. The quickstart deployment configures Druid to listen on port `8888` by default, so you'll make API calls against `http://localhost:8888`."

Review Comment:
   This isn't always the case: the cell uses `DRUID_HOST` when it's set, and the template doesn't explain that.
   
   ```suggestion
       "Run the following cell to initialize the environment for the tutorial. Docker Compose automatically sets `DRUID_HOST` for you based on your profile. If `DRUID_HOST` isn't set, the script connects to Druid at `http://localhost:8888`."
   ```
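For the template, the behavior the suggested text describes can be sketched as a small function that's easy to test without a running cluster. It mirrors the existing cell: use `DRUID_HOST` as the hostname when the variable is present, otherwise fall back to `localhost`, always on port `8888`. `resolve_druid_host` is a hypothetical name, not part of `druidapi`.

```python
import os
from typing import Mapping

QUICKSTART_PORT = 8888  # Port the tutorial cell hard-codes for Druid.


def resolve_druid_host(environ: Mapping[str, str] = os.environ) -> str:
    """Build the Druid URL the same way the tutorial's initialization cell does."""
    if "DRUID_HOST" not in environ:
        # No DRUID_HOST, for example when running outside Docker Compose.
        return f"http://localhost:{QUICKSTART_PORT}"
    # Docker Compose sets DRUID_HOST based on the selected profile.
    return f"http://{environ['DRUID_HOST']}:{QUICKSTART_PORT}"


druid_host = resolve_druid_host()
print(f"Opening a connection to {druid_host}.")
```

In the notebook, the result would then go to `druidapi.jupyter_client(druid_host)` as the existing cell does. Like the original cell, this only checks whether the variable exists, so an empty `DRUID_HOST` yields `http://:8888`; treating an empty value as unset would be a small design change worth deciding in the template.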
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


