petermarshallio commented on code in PR #14501:
URL: https://github.com/apache/druid/pull/14501#discussion_r1254207606
##########
examples/quickstart/jupyter-notebooks/notebooks/02-ingestion/XX-example-flightdata-events.ipynb:
##########
@@ -0,0 +1,807 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "e79d7d48-b403-4b9e-8cc6-0f0accecac1f",
+ "metadata": {},
+ "source": [
+ "# Data modeling and ingestion principles - creating Events from Druid's
sample flight data\n",
+ "\n",
+ "Druid's data loader lets you quickly ingest sample carrier data into a `TABLE`, giving you an easy way to learn about the available SQL functions. It's also a great place to start understanding how data modeling for event analytics in a real-time database differs from the modeling you'd apply in other databases, and the data set is small enough that you can safely see, and try out, different data layout designs.\n",
+ "\n",
+ "In this notebook, you'll walk through creating a table of events from the sample data set, applying data modeling principles as you go. At the end, you'll have a `TABLE` called \"flight-events\" that you can use as you continue learning Apache Druid.\n",
+ "\n",
+ "## Prerequisites\n",
+ "\n",
+ "To use this notebook, you'll need access to a small Druid deployment.\n",
+ "\n",
+ "It's a good idea to ingest the data \"as is\" first to confirm the cluster is operational before you begin.\n",
+ "\n",
+ "## Getting started\n",
+ "\n",
+ "Run the following cell to set up the Druid API client. Remember to change `druid_host` to the appropriate endpoint for submitting your SQL.\n",
+ "\n",
+ "**NOTE** This notebook calls the `sql_client.wait_until_ready` method, which pauses the Python kernel until ingestion has completed; subsequent cells will not run until the data is available."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "ffc13d62-d1fc-45bc-855a-8c7687d4c720",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import pandas as pd\n",
+ "import druidapi\n",
+ "\n",
+ "# druid_host is the hostname and port for your Druid deployment. \n",
+ "# In the Docker Compose tutorial environment, this is the Router\n",
+ "# service running at \"http://router:8888\".\n",
+ "\n",
+ "# If you are not using the Docker Compose environment, edit the
`druid_host`.\n",
+ "\n",
+ "druid_host = \"http://router:8888\"\n",
+ "\n",
+ "druid = druidapi.jupyter_client(druid_host)\n",
+ "display = druid.display\n",
+ "sql_client = druid.sql"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "596eed1f-c47f-48cc-a537-5703c7eefc38",
+ "metadata": {},
+ "source": [
+ "## Apply modeling principles\n",
+ "\n",
+ "### Principle 1 - create the right `TABLE` for the right query\n",
+ "\n",
+ "#### Finding the Events\n",
+ "\n",
+ "Let's take a look at the data we have. Using the Druid Console, you can preview the data you want to load.\n",
+ "\n",
+ "1. Open the console\n",
Review Comment:
Shifted this into an admonition and instead referred straight to the dataset definition. As is, the ingestion takes too long to run (an attention killer) for me to pull the data into the Druid service just to show a sample of the rows.
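As an aside for anyone reading along: the blocking behavior the notebook's NOTE describes boils down to a poll-until-ready loop. A minimal sketch in plain Python (the function name, intervals, and readiness callable here are illustrative, not druidapi's actual implementation, which polls Druid for datasource availability):

```python
import time

def wait_until_ready(is_ready, poll_secs=0.01, timeout_secs=5.0):
    """Block the caller until is_ready() returns True.

    Rough sketch of what a call like sql_client.wait_until_ready does:
    the real druidapi method polls Druid for datasource availability,
    while this toy version polls an arbitrary callable.
    """
    deadline = time.monotonic() + timeout_secs
    while time.monotonic() < deadline:
        if is_ready():
            return True
        time.sleep(poll_secs)
    raise TimeoutError("datasource did not become ready in time")

# Usage: a fake readiness check that succeeds on the third poll.
polls = {"count": 0}

def fake_ready():
    polls["count"] += 1
    return polls["count"] >= 3

print(wait_until_ready(fake_ready))  # True after three polls
```

This is why later cells in the notebook can safely assume the "flight-events" table exists: the kernel simply does not advance past the wait call until ingestion has finished (or the wait times out).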
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]