http://git-wip-us.apache.org/repos/asf/incubator-sdap-nexus/blob/ff98fa34/esip-workshop/student-material/workshop2/1- Introduction.ipynb ---------------------------------------------------------------------- diff --git a/esip-workshop/student-material/workshop2/1- Introduction.ipynb b/esip-workshop/student-material/workshop2/1- Introduction.ipynb new file mode 100644 index 0000000..483f34b --- /dev/null +++ b/esip-workshop/student-material/workshop2/1- Introduction.ipynb @@ -0,0 +1,62 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Introduction\n", + "\n", + "In this workshop you will learn how to deploy a NEXUS system using Docker.\n", + "\n", + "## EC2\n", + "\n", + "Each student or group will be assigned an EC2 instance to use for this workshop. The EC2 instance we will be using is: \n", + "> r4.8xlarge \n", + "> Memory: 244.0 GB \n", + "> vCPUs: 32 \n", + "> EBS Storage: 500 GB gp2 \n", + "\n", + "## SSH\n", + "\n", + "You will be using SSH to connect to the Amazon EC2 instance assigned to you. You will need an SSH client on your laptop. All shell commands for this workshop will take place over the SSH connection.\n", + "\n", + "__NOTE__: Shell commands you are expected to run will be prefixed with a dollar sign `$`\n", + "\n", + "## Docker\n", + "\n", + "Docker is already installed on the EC2 instance. You will be asked to interact with the Docker command line client during this workshop. \n", + "\n", + "`docker-compose` is used to coordinate the startup and stopping of the different components of the NEXUS system during this workshop.\n", + "\n", + "## System Architecture\n", + "\n", + "We are attempting to simulate a cluster deployment on a single machine. 
By the end of this workshop there will be 24 containers running on your EC2 instance.\n", + "\n", + "\n", + "\n", + "In a production deployment, these containers would most likely be running on different machines and may be sized differently.\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.1" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +}
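The introduction notebook above states that 24 containers will be running by the end of the workshop. As a cross-check, that figure can be reproduced from the `docker ps` listings shown in the later notebooks (a hypothetical tally; the grouping names are illustrative, not part of the original material):

```python
# Hypothetical tally of the containers described across the workshop notebooks.
# Counts are taken from the `docker ps` listings shown in notebooks 2-4.
clusters = {
    "infrastructure": {"cassandra": 3, "zookeeper": 3, "solr": 3},
    "analysis": {"mesos-master": 1, "mesos-agent": 3, "nexus-webapp": 1},
    "ingest": {"xd-container": 3, "xd-admin": 1, "kafka": 3, "redis": 1, "mysql": 1},
    "notebook": {"jupyter": 1},
}

# Sum containers per cluster, then overall.
per_cluster = {name: sum(counts.values()) for name, counts in clusters.items()}
total = sum(per_cluster.values())
print(per_cluster)  # {'infrastructure': 9, 'analysis': 5, 'ingest': 9, 'notebook': 1}
print(total)        # 24
```

The 9 infrastructure containers match the count verified in notebook 2; the remaining 15 are added by the analysis and ingestion notebooks.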
http://git-wip-us.apache.org/repos/asf/incubator-sdap-nexus/blob/ff98fa34/esip-workshop/student-material/workshop2/2 - Infrastructure.ipynb ---------------------------------------------------------------------- diff --git a/esip-workshop/student-material/workshop2/2 - Infrastructure.ipynb b/esip-workshop/student-material/workshop2/2 - Infrastructure.ipynb new file mode 100644 index 0000000..d74fa9e --- /dev/null +++ b/esip-workshop/student-material/workshop2/2 - Infrastructure.ipynb @@ -0,0 +1,160 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Starting the Infrastructure Cluster\n", + "\n", + "NEXUS relies on [Apache Solr](http://lucene.apache.org/solr/) to store metadata about tiles and [Apache Cassandra](http://cassandra.apache.org/) to store the floating point array data associated with those tiles. Both Solr and Cassandra are distributed storage systems and can be run in a cluster. \n", + "\n", + "Solr requires [Apache Zookeeper](https://zookeeper.apache.org/) to run in cluster mode (called SolrCloud). This notebook walks through the process of bringing up a 3 node Cassandra cluster, 3 node Zookeeper cluster, and a 3 node SolrCloud.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 1: Start One Cassandra Container\n", + "\n", + "When initializing a Cassandra cluster, one or more nodes must be designated as a 'seed' node to help bootstrap the internal communication between nodes: [Internode communications (gossip)](http://docs.datastax.com/en/cassandra/2.1/cassandra/architecture/architectureGossipAbout_c.html).\n", + "\n", + "Therefore, the first step is to start one Cassandra container so that it can act as the seed node for the rest of our cluster.\n", + "\n", + "### TODO\n", + "1. Navigate to the directory containing the `docker-compose.yml` file for the infrastructure cluster\n", + "```bash\n", + "$ cd ~/nexus/esip-workshop/docker/infrastructure\n", + "```\n", + "\n", + "2. 
Use `docker-compose` to bring up the `cassandra1` container.\n", + "```bash\n", + "$ docker-compose up -d cassandra1\n", + "```\n", + "\n", + "3. Wait for the Cassandra node to become ready before continuing. Run the following command to follow the logs for `cassandra1`.\n", + "```bash\n", + "$ docker logs -f cassandra1\n", + "```\n", + "\n", + "4. Wait for the Cassandra node to start listening for clients. It should only take a minute or so. Look for this line in the logs:\n", + "> Starting listening for CQL clients on /0.0.0.0:9042\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 2: Start the Remaining Infrastructure Containers\n", + "\n", + "Once the first Cassandra node is running, the rest of the infrastructure cluster can be brought online. The remaining 8 containers in the infrastructure can be started using the `docker-compose` command again.\n", + "\n", + "### TODO\n", + "\n", + "1. Use `docker-compose` to bring up the remaining containers. __Note__: Make sure you are still in the same directory as Step 1 `~/nexus/esip-workshop/docker/infrastructure`.\n", + "```bash\n", + "$ docker-compose up -d\n", + "```" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 3: Verify the Infrastructure has Started\n", + "\n", + "Now there should be 9 containers running that make up our 3 node Cassandra cluster, 3 node Zookeeper cluster, and 3 node SolrCloud. We can use a variety of commands to verify that our cluster is active and healthy.\n", + "\n", + "### TODO\n", + "\n", + "1. 
List all running docker containers.\n", + "```bash\n", + "$ docker ps\n", + "```\n", + "The output should look similar to this:\n", + "<pre style=\"white-space: pre;\">\n", + "CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES \n", + "90d370eb3a4e nexusjpl/jupyter \"tini -- start-not...\" 30 hours ago Up 30 hours 0.0.0.0:8000->8888/tcp jupyter \n", + "cd0f47fe303d nexusjpl/nexus-solr \"docker-entrypoint...\" 30 hours ago Up 30 hours 8983/tcp solr2 \n", + "8c0f5c8eeb45 nexusjpl/nexus-solr \"docker-entrypoint...\" 30 hours ago Up 30 hours 8983/tcp solr3 \n", + "27e34d14c16e nexusjpl/nexus-solr \"docker-entrypoint...\" 30 hours ago Up 30 hours 8983/tcp solr1 \n", + "247f807cb5ec cassandra:2.2.8 \"/docker-entrypoin...\" 30 hours ago Up 30 hours 7000-7001/tcp, 7199/tcp, 9042/tcp, 9160/tcp cassandra3 \n", + "09cc86a27321 zookeeper \"/docker-entrypoin...\" 30 hours ago Up 30 hours 2181/tcp, 2888/tcp, 3888/tcp zk1 \n", + "33e9d9b1b745 zookeeper \"/docker-entrypoin...\" 30 hours ago Up 30 hours 2181/tcp, 2888/tcp, 3888/tcp zk3 \n", + "dd29e4d09124 cassandra:2.2.8 \"/docker-entrypoin...\" 30 hours ago Up 30 hours 7000-7001/tcp, 7199/tcp, 9042/tcp, 9160/tcp cassandra2 \n", + "11e57e0c972f zookeeper \"/docker-entrypoin...\" 30 hours ago Up 30 hours 2181/tcp, 2888/tcp, 3888/tcp zk2 \n", + "2292803d942d cassandra:2.2.8 \"/docker-entrypoin...\" 30 hours ago Up 30 hours 7000-7001/tcp, 7199/tcp, 9042/tcp, 9160/tcp cassandra1 \n", + "</pre>\n", + "\n", + "2. 
Get the Cassandra cluster status by running `nodetool status` inside the `cassandra1` container.\n", + "```bash\n", + "$ docker exec cassandra1 nodetool status\n", + "```\n", + "You should see 3 cluster nodes:\n", + "<pre style=\"white-space: pre;\">\n", + "Datacenter: datacenter1\n", + "=======================\n", + "Status=Up/Down\n", + "|/ State=Normal/Leaving/Joining/Moving\n", + "-- Address Load Tokens Owns (effective) Host ID Rack\n", + "UN 172.18.0.2 4.8 GB 256 35.3% d9a0d273-b11c-41dd-9da1-cb77882f275f rack1\n", + "UN 172.18.0.5 4.42 GB 256 33.2% d68d9ea7-04a0-4eaf-b9c6-333b606bd2b1 rack1\n", + "UN 172.18.0.7 4.16 GB 256 31.5% 6f8683f9-abf8-4466-87bc-a5faa048956d rack1\n", + "</pre>\n", + "\n", + "3. Get the status of the SolrCloud by running the cell below." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "# TODO Run this cell to get the status of the Solr Cluster. You should see a collection called\n", + "# 'nexustiles' with 3 shards spread across all 3 nodes.\n", + "\n", + "import requests\n", + "import json\n", + "\n", + "response = requests.get('http://solr1:8983/solr/admin/collections?action=clusterstatus&wt=json')\n", + "print(json.dumps(response.json(), indent=2))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Congratulations!\n", + "\n", + "You have successfully started up the NEXUS infrastructure. 
Your EC2 instance now has 9 containers running:\n", + "\n", + "" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.1" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} http://git-wip-us.apache.org/repos/asf/incubator-sdap-nexus/blob/ff98fa34/esip-workshop/student-material/workshop2/3 - Analysis.ipynb ---------------------------------------------------------------------- diff --git a/esip-workshop/student-material/workshop2/3 - Analysis.ipynb b/esip-workshop/student-material/workshop2/3 - Analysis.ipynb new file mode 100644 index 0000000..468c9f7 --- /dev/null +++ b/esip-workshop/student-material/workshop2/3 - Analysis.ipynb @@ -0,0 +1,218 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "collapsed": true + }, + "source": [ + "# Starting the Analysis Cluster\n", + "\n", + "NEXUS utilizes [Apache Spark](https://spark.apache.org/) running on [Apache Mesos](http://mesos.apache.org/) for its analytical functions. Now that the infrastructure has been started, we can start up the analysis cluster.\n", + "\n", + "The analysis cluster consists of an Apache Mesos cluster and the NEXUS webapp [Tornado server](http://www.tornadoweb.org/en/stable/). The Mesos cluster we will be bringing up has one master node and three agent nodes. Apache Spark is already installed and configured on the three agent nodes, which will act as Spark executors for the NEXUS analytic functions.\n", + "\n", + "## Step 1: Start the Containers\n", + "\n", + "We can use `docker-compose` again to start our containers.\n", + "\n", + "### TODO\n", + "\n", + "1. 
Navigate to the directory containing the `docker-compose.yml` file for the analysis cluster\n", + "```bash\n", + "$ cd ~/nexus/esip-workshop/docker/analysis\n", + "```\n", + "\n", + "2. Use `docker-compose` to bring up the containers in the analysis cluster\n", + "```bash\n", + "$ docker-compose up -d\n", + "```" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 2: Verify the Cluster is Working\n", + "\n", + "Now that the cluster has started, we can use various commands to ensure that it is operational and monitor its status.\n", + "\n", + "### TODO\n", + "\n", + "1. List all running docker containers.\n", + "```bash\n", + "$ docker ps\n", + "```\n", + "The output should look similar to this:\n", + "<pre style=\"white-space: pre;\">\n", + "CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES\n", + "e5589456a78a nexusjpl/nexus-webapp \"/tmp/docker-entry...\" 5 seconds ago Up 5 seconds 0.0.0.0:4040->4040/tcp, 0.0.0.0:8083->8083/tcp nexus-webapp\n", + "18e682b9af0e nexusjpl/spark-mesos-agent \"/tmp/docker-entry...\" 7 seconds ago Up 5 seconds mesos-agent1\n", + "8951841d1da6 nexusjpl/spark-mesos-agent \"/tmp/docker-entry...\" 7 seconds ago Up 6 seconds mesos-agent3\n", + "c0240926a4a2 nexusjpl/spark-mesos-agent \"/tmp/docker-entry...\" 7 seconds ago Up 6 seconds mesos-agent2\n", + "c97ad268833f nexusjpl/spark-mesos-master \"/bin/bash -c './b...\" 7 seconds ago Up 7 seconds 0.0.0.0:5050->5050/tcp mesos-master\n", + "90d370eb3a4e nexusjpl/jupyter \"tini -- start-not...\" 2 days ago Up 2 days 0.0.0.0:8000->8888/tcp jupyter\n", + "cd0f47fe303d nexusjpl/nexus-solr \"docker-entrypoint...\" 2 days ago Up 2 days 8983/tcp solr2\n", + "8c0f5c8eeb45 nexusjpl/nexus-solr \"docker-entrypoint...\" 2 days ago Up 2 days 8983/tcp solr3\n", + "27e34d14c16e nexusjpl/nexus-solr \"docker-entrypoint...\" 2 days ago Up 2 days 8983/tcp solr1\n", + "247f807cb5ec cassandra:2.2.8 \"/docker-entrypoin...\" 2 days ago Up 2 days 7000-7001/tcp, 7199/tcp, 9042/tcp, 
9160/tcp cassandra3\n", + "09cc86a27321 zookeeper \"/docker-entrypoin...\" 2 days ago Up 2 days 2181/tcp, 2888/tcp, 3888/tcp zk1\n", + "33e9d9b1b745 zookeeper \"/docker-entrypoin...\" 2 days ago Up 2 days 2181/tcp, 2888/tcp, 3888/tcp zk3\n", + "dd29e4d09124 cassandra:2.2.8 \"/docker-entrypoin...\" 2 days ago Up 2 days 7000-7001/tcp, 7199/tcp, 9042/tcp, 9160/tcp cassandra2\n", + "11e57e0c972f zookeeper \"/docker-entrypoin...\" 2 days ago Up 2 days 2181/tcp, 2888/tcp, 3888/tcp zk2\n", + "2292803d942d cassandra:2.2.8 \"/docker-entrypoin...\" 2 days ago Up 2 days 7000-7001/tcp, 7199/tcp, 9042/tcp, 9160/tcp cassandra1\n", + "</pre>\n", + "\n", + "2. List the available Mesos slaves by running the cell below." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "# TODO Run this cell to see the status of the Mesos slaves. You should see 3 slaves connected.\n", + "\n", + "import requests\n", + "import json\n", + "\n", + "response = requests.get('http://mesos-master:5050/state.json')\n", + "print(json.dumps(response.json()['slaves'], indent=2))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 3: List available Datasets\n", + "\n", + "Now that the cluster is up, we can investigate the datasets available. Use the `nexuscli` module to list available datasets.\n", + "\n", + "### TODO \n", + "1. Get a list of datasets by using the `nexuscli` module to issue a request to the `nexus-webapp` container that was just started."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import nexuscli\n", + "\n", + "nexuscli.set_target(\"http://nexus-webapp:8083\")\n", + "nexuscli.dataset_list()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 4: Run a Time Series\n", + "\n", + "Verify the analysis functions are working by running a simple Time Series.\n", + "\n", + "### TODO\n", + "\n", + "1. Run the cell below to produce a time series plot using the analysis cluster you just started." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# TODO Run this cell to produce a Time Series plot using AVHRR data.\n", + "%matplotlib inline\n", + "import matplotlib.pyplot as plt\n", + "import time\n", + "import nexuscli\n", + "from datetime import datetime\n", + "\n", + "from shapely.geometry import box\n", + "\n", + "bbox = box(-150, 40, -120, 55)\n", + "datasets = [\"AVHRR_OI_L4_GHRSST_NCEI\"]\n", + "start_time = datetime(2013, 1, 1)\n", + "end_time = datetime(2013, 12, 31)\n", + "\n", + "start = time.perf_counter()\n", + "ts, = nexuscli.time_series(datasets, bbox, start_time, end_time, spark=True)\n", + "print(\"Time Series took {} seconds to generate\".format(time.perf_counter() - start))\n", + "\n", + "plt.figure(figsize=(10,5), dpi=100)\n", + "plt.plot(ts.time, ts.mean, 'b-', marker='|', markersize=2.0, mfc='b')\n", + "plt.grid(b=True, which='major', color='k', linestyle='-')\n", + "plt.xlabel(\"Time\")\n", + "plt.ylabel (\"Sea Surface Temperature (C)\")\n", + "plt.show()\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 5: Check the Results of the Spark Job\n", + "\n", + "The time series function in the previous cell will run on the Spark cluster. It is possible to use the Spark RESTful interface to determine the status of the Spark job.\n", + "\n", + "### TODO\n", + "\n", + "1. 
Run the cell below to see the status of the Spark Job." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# TODO Run this cell. You should see at least one successful Time Series Spark job.\n", + "import requests\n", + "\n", + "response = requests.get('http://nexus-webapp:4040/api/v1/applications')\n", + "appId = response.json()[0]['id']\n", + "response = requests.get(\"http://nexus-webapp:4040/api/v1/applications/%s/jobs\" % appId)\n", + "for job in response.json():\n", + " print(job['name'])\n", + " print('\\t' + job['status'])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Congratulations!\n", + "\n", + "You have successfully started a NEXUS analysis cluster and verified that it is functional. Your EC2 instance is now running both the infrastructure and the analysis cluster:\n", + "\n", + "" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.1" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} http://git-wip-us.apache.org/repos/asf/incubator-sdap-nexus/blob/ff98fa34/esip-workshop/student-material/workshop2/4 - Ingestion.ipynb ---------------------------------------------------------------------- diff --git a/esip-workshop/student-material/workshop2/4 - Ingestion.ipynb b/esip-workshop/student-material/workshop2/4 - Ingestion.ipynb new file mode 100644 index 0000000..1545fda --- /dev/null +++ b/esip-workshop/student-material/workshop2/4 - Ingestion.ipynb @@ -0,0 +1,260 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "collapsed": true + }, + "source": [ + "# Ingesting More Data\n", + "\n", + "NEXUS uses [Spring 
XD](http://projects.spring.io/spring-xd/) to ingest new data into the system. Spring XD is a distributed runtime that allows for parallel ingestion of data into data stores of all types. It requires a few tools for administrative purposes, including Redis and a relational database management system (RDBMS).\n", + "\n", + "The Spring XD architecture also consists of a management application called XD Admin, which manages XD Containers. Spring XD utilizes Apache Zookeeper to keep track of the state of the cluster and also uses [Apache Kafka](https://kafka.apache.org/) to communicate between its components.\n", + "\n", + "\n", + "## Step 1: Start an Ingestion Cluster\n", + "\n", + "We can bring up an ingestion cluster by using `docker-compose`.\n", + "\n", + "### TODOs\n", + "\n", + "1. Navigate to the directory containing the `docker-compose.yml` file for the ingestion cluster\n", + "```bash\n", + "$ cd ~/nexus/esip-workshop/docker/ingest\n", + "```\n", + "\n", + "2. Use `docker-compose` to bring up the containers in the ingestion cluster\n", + "```bash\n", + "$ docker-compose up -d\n", + "```" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 2: Verify the Ingestion Cluster is Working\n", + "\n", + "Now that the cluster has started, we can use various commands to ensure that it is operational and monitor its status.\n", + "\n", + "### TODO\n", + "\n", + "1. 
List all running docker containers.\n", + "```bash\n", + "$ docker ps\n", + "```\n", + "The output should look similar to this:\n", + "<pre style=\"white-space: pre;\">\n", + "CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES\n", + "581a05925ea6 nexusjpl/ingest-container \"/usr/local/nexus-...\" 5 seconds ago Up 3 seconds 9393/tcp xd-container2\n", + "1af7ba346d31 nexusjpl/ingest-container \"/usr/local/nexus-...\" 5 seconds ago Up 3 seconds 9393/tcp xd-container3\n", + "0668e2a48c9a nexusjpl/ingest-container \"/usr/local/nexus-...\" 5 seconds ago Up 3 seconds 9393/tcp xd-container1\n", + "d717e6629b4a nexusjpl/ingest-admin \"/usr/local/nexus-...\" 5 seconds ago Up 4 seconds 9393/tcp xd-admin\n", + "a4dae8ca6757 nexusjpl/kafka \"kafka-server-star...\" 7 seconds ago Up 6 seconds kafka3\n", + "c29664cfae4a nexusjpl/kafka \"kafka-server-star...\" 7 seconds ago Up 6 seconds kafka2\n", + "623bdaa50207 nexusjpl/kafka \"kafka-server-star...\" 7 seconds ago Up 6 seconds kafka1\n", + "2266c2a54113 redis:3 \"docker-entrypoint...\" 7 seconds ago Up 5 seconds 6379/tcp redis\n", + "da3267942d5f mysql:8 \"docker-entrypoint...\" 7 seconds ago Up 6 seconds 3306/tcp mysqldb\n", + "e5589456a78a nexusjpl/nexus-webapp \"/tmp/docker-entry...\" 31 hours ago Up 31 hours 0.0.0.0:4040->4040/tcp, 0.0.0.0:8083->8083/tcp nexus-webapp\n", + "18e682b9af0e nexusjpl/spark-mesos-agent \"/tmp/docker-entry...\" 31 hours ago Up 31 hours mesos-agent1\n", + "8951841d1da6 nexusjpl/spark-mesos-agent \"/tmp/docker-entry...\" 31 hours ago Up 31 hours mesos-agent3\n", + "c0240926a4a2 nexusjpl/spark-mesos-agent \"/tmp/docker-entry...\" 31 hours ago Up 31 hours mesos-agent2\n", + "c97ad268833f nexusjpl/spark-mesos-master \"/bin/bash -c './b...\" 31 hours ago Up 31 hours 0.0.0.0:5050->5050/tcp mesos-master\n", + "90d370eb3a4e nexusjpl/jupyter \"tini -- start-not...\" 3 days ago Up 3 days 0.0.0.0:8000->8888/tcp jupyter\n", + "cd0f47fe303d nexusjpl/nexus-solr \"docker-entrypoint...\" 3 days ago Up 3 days 
8983/tcp solr2\n", + "8c0f5c8eeb45 nexusjpl/nexus-solr \"docker-entrypoint...\" 3 days ago Up 3 days 8983/tcp solr3\n", + "27e34d14c16e nexusjpl/nexus-solr \"docker-entrypoint...\" 3 days ago Up 3 days 8983/tcp solr1\n", + "247f807cb5ec cassandra:2.2.8 \"/docker-entrypoin...\" 3 days ago Up 3 days 7000-7001/tcp, 7199/tcp, 9042/tcp, 9160/tcp cassandra3\n", + "09cc86a27321 zookeeper \"/docker-entrypoin...\" 3 days ago Up 3 days 2181/tcp, 2888/tcp, 3888/tcp zk1\n", + "33e9d9b1b745 zookeeper \"/docker-entrypoin...\" 3 days ago Up 3 days 2181/tcp, 2888/tcp, 3888/tcp zk3\n", + "dd29e4d09124 cassandra:2.2.8 \"/docker-entrypoin...\" 3 days ago Up 3 days 7000-7001/tcp, 7199/tcp, 9042/tcp, 9160/tcp cassandra2\n", + "11e57e0c972f zookeeper \"/docker-entrypoin...\" 3 days ago Up 3 days 2181/tcp, 2888/tcp, 3888/tcp zk2\n", + "2292803d942d cassandra:2.2.8 \"/docker-entrypoin...\" 3 days ago Up 3 days 7000-7001/tcp, 7199/tcp, 9042/tcp, 9160/tcp cassandra1\n", + "</pre>\n", + "\n", + "2. View the log of the XD Admin container to verify it has started.\n", + "```bash\n", + "$ docker logs -f xd-admin\n", + "```" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 3: Ingest Some Data\n", + "\n", + "Now that the ingestion cluster has been started, we can ingest some new data into the system. Currently, there is AVHRR data ingested up through 2016. In this step you will ingest the remaining AVHRR data through July 2017. The source granules for AVHRR have already been copied to the EBS volume attached to your EC2 instance and mounted in the ingestion containers as `/usr/local/data/nexus/avhrr/2017`.\n", + "\n", + "In order to begin ingesting data, we need to deploy a new ingestion stream. The ingestion stream needs a few key parameters: the name of the dataset, where to look for the data files, the variable name to extract from the granules, and approximately how many tiles should be created per granule. 
These parameters can all be provided to the `nx-deploy-stream` shell script that is present in the `xd-admin` container.\n", + "\n", + "\n", + "### TODOs\n", + "\n", + "1. Deploy the stream to ingest the 2017 AVHRR data\n", + "```bash\n", + "$ docker exec -it xd-admin /usr/local/nx-deploy-stream.sh --datasetName AVHRR_OI_L4_GHRSST_NCEI --dataDirectory /usr/local/data/nexus/avhrr/2017 --variableName analysed_sst --tilesDesired 1296\n", + "```" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 4: Monitor the Ingestion\n", + "\n", + "Once the stream is deployed, the data will begin to flow into the system. Progress can be monitored by tailing the log files and monitoring the number of tiles and granules that have been ingested into the system.\n", + "\n", + "### TODOs\n", + "\n", + "1. Get a listing of granules and tiles per granule for AVHRR 2017\n", + "2. Get a count of the number of granules ingested for AVHRR 2017\n", + "3. Verify the dataset list shows that granules have been ingested through July 2017" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "# TODO Run this cell multiple times to watch as the granules are ingested into the system.\n", + "import requests\n", + "\n", + "dataset = 'AVHRR_OI_L4_GHRSST_NCEI'\n", + "year = 2017\n", + "\n", + "response = requests.get(\"http://solr1:8983/solr/nexustiles/query?q=granule_s:%d*&rows=0&fq=dataset_s:%s&facet.field=granule_s&facet=true&facet.mincount=1&facet.limit=-1&facet.sort=index\" % (year, dataset))\n", + "data = response.json()\n", + "for k in data['facet_counts'][\"facet_fields\"]['granule_s']:\n", + " print(k)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# TODO Run this cell to get a count of the number of AVHRR granules ingested for the year 2017.\n", + "# Ingestion is finished when the total reaches 187.\n", + 
"import requests\n", + "\n", + "dataset = 'AVHRR_OI_L4_GHRSST_NCEI'\n", + "year = 2017\n", + "\n", + "response = requests.get(\"http://solr1:8983/solr/nexustiles/query?q=granule_s:%d*&json.facet={granule_s:'unique(granule_s)'}&rows=0&fq=dataset_s:%s\" % (year, dataset))\n", + "data = response.json()\n", + "number_of_granules = data['facets']['granule_s'] if 'granule_s' in data['facets'] else 0\n", + "print(\"Number of granules for %s : %d\" % (dataset, number_of_granules))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "# TODO Run this cell to get a list of datasets available along with their start and end dates.\n", + "import nexuscli\n", + "# Target the nexus webapp server\n", + "nexuscli.set_target(\"http://nexus-webapp:8083\")\n", + "nexuscli.dataset_list()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 5: Run a Time Series With the new Data\n", + "\n", + "Once you have reached 187 total granules ingested for 2017 and see that AVHRR has data through July 2017, the ingestion has completed. You can now use the analytical functions on the new data.\n", + "\n", + "### TODOs\n", + "\n", + "1. Generate a Time Series using the new data." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "# TODO Run this cell to produce a Time Series plot using AVHRR data from 2017.\n", + "%matplotlib inline\n", + "import matplotlib.pyplot as plt\n", + "import time\n", + "import nexuscli\n", + "from datetime import datetime\n", + "\n", + "from shapely.geometry import box\n", + "\n", + "bbox = box(-150, 40, -120, 55)\n", + "datasets = [\"AVHRR_OI_L4_GHRSST_NCEI\"]\n", + "start_time = datetime(2017, 1, 1)\n", + "end_time = datetime(2017, 7, 6)\n", + "\n", + "start = time.perf_counter()\n", + "ts, = nexuscli.time_series(datasets, bbox, start_time, end_time, spark=True)\n", + "print(\"Time Series took {} seconds to generate\".format(time.perf_counter() - start))\n", + "\n", + "plt.figure(figsize=(10,5), dpi=100)\n", + "plt.plot(ts.time, ts.mean, 'b-', marker='|', markersize=2.0, mfc='b')\n", + "plt.grid(b=True, which='major', color='k', linestyle='-')\n", + "plt.xlabel(\"Time\")\n", + "plt.ylabel (\"Sea Surface Temperature (C)\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Congratulations!\n", + "\n", + "You have completed this workshop. You now have a completely functional NEXUS cluster with all containers started:\n", + "\n", + "\n", + "\n", + "If you would like, you can go back to the workshop 1 notebooks and verify they are still working. More information about NEXUS is available on our [GitHub](https://github.com/dataplumber/nexus).\n", + "\n", + "If you are interested in learning more about Docker, Nga Quach will be giving a presentation all about Docker on Thursday, July 27 during the [Free and Open Source Software (FOSS) and Technologies for the Cloud](http://sched.co/As75) session. 
\n", + "\n", + "If you are interested in learning more about Apache Spark, Joe Jacob will be giving a presentation all about Spark on Thursday, July 27 during the [Free and Open Source Software (FOSS) and Technologies for the Cloud](http://sched.co/As75) session. \n", + "\n", + "\n", + "Thank you for participating!" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.1" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} http://git-wip-us.apache.org/repos/asf/incubator-sdap-nexus/blob/ff98fa34/esip-workshop/student-material/workshop2/img/ec2-containers-analysis.png ---------------------------------------------------------------------- diff --git a/esip-workshop/student-material/workshop2/img/ec2-containers-analysis.png b/esip-workshop/student-material/workshop2/img/ec2-containers-analysis.png new file mode 100644 index 0000000..3e2bfa9 Binary files /dev/null and b/esip-workshop/student-material/workshop2/img/ec2-containers-analysis.png differ http://git-wip-us.apache.org/repos/asf/incubator-sdap-nexus/blob/ff98fa34/esip-workshop/student-material/workshop2/img/ec2-containers-infrastructure.png ---------------------------------------------------------------------- diff --git a/esip-workshop/student-material/workshop2/img/ec2-containers-infrastructure.png b/esip-workshop/student-material/workshop2/img/ec2-containers-infrastructure.png new file mode 100644 index 0000000..743a2f8 Binary files /dev/null and b/esip-workshop/student-material/workshop2/img/ec2-containers-infrastructure.png differ http://git-wip-us.apache.org/repos/asf/incubator-sdap-nexus/blob/ff98fa34/esip-workshop/student-material/workshop2/img/ec2-containers.png 
---------------------------------------------------------------------- diff --git a/esip-workshop/student-material/workshop2/img/ec2-containers.png b/esip-workshop/student-material/workshop2/img/ec2-containers.png new file mode 100644 index 0000000..5942038 Binary files /dev/null and b/esip-workshop/student-material/workshop2/img/ec2-containers.png differ http://git-wip-us.apache.org/repos/asf/incubator-sdap-nexus/blob/ff98fa34/nexus-ingest/.gitignore ---------------------------------------------------------------------- diff --git a/nexus-ingest/.gitignore b/nexus-ingest/.gitignore new file mode 100644 index 0000000..0b58aaa --- /dev/null +++ b/nexus-ingest/.gitignore @@ -0,0 +1,4 @@ +.DS_Store + +.idea +*.iml http://git-wip-us.apache.org/repos/asf/incubator-sdap-nexus/blob/ff98fa34/nexus-ingest/README.md ---------------------------------------------------------------------- diff --git a/nexus-ingest/README.md b/nexus-ingest/README.md new file mode 100644 index 0000000..87cf31c --- /dev/null +++ b/nexus-ingest/README.md @@ -0,0 +1,3 @@ +# nexus-ingest + +This folder contains all of the custom code needed to ingest data into the nexus system. 
\ No newline at end of file http://git-wip-us.apache.org/repos/asf/incubator-sdap-nexus/blob/ff98fa34/nexus-ingest/dataset-tiler/.gitignore ---------------------------------------------------------------------- diff --git a/nexus-ingest/dataset-tiler/.gitignore b/nexus-ingest/dataset-tiler/.gitignore new file mode 100644 index 0000000..965dacc --- /dev/null +++ b/nexus-ingest/dataset-tiler/.gitignore @@ -0,0 +1,28 @@ +.gradle/ +.idea/ +gradlew.bat + +.DS_Store +*.log + + +build/* +!build/reports + +build/reports/* +!build/reports/license +!build/reports/project + +#Idea files +*.iml +*.ipr +*.iws + +*.class + +# Package Files # +*.war +*.ear + +# virtual machine crash logs, see http://www.java.com/en/download/help/error_hotspot.xml +hs_err_pid* \ No newline at end of file http://git-wip-us.apache.org/repos/asf/incubator-sdap-nexus/blob/ff98fa34/nexus-ingest/dataset-tiler/README.md ---------------------------------------------------------------------- diff --git a/nexus-ingest/dataset-tiler/README.md b/nexus-ingest/dataset-tiler/README.md new file mode 100644 index 0000000..043daca --- /dev/null +++ b/nexus-ingest/dataset-tiler/README.md @@ -0,0 +1,11 @@ +# dataset-tiler + +[Spring-XD Module](http://docs.spring.io/spring-xd/docs/current/reference/html/#modules) that creates SectionSpecs that can be used to read the data from the dataset in tiles. 
+ +The project can be built by running + +`./gradlew clean build` + +The module can then be uploaded to Spring XD by running the following command in an XD Shell + +`module upload --type processor --name dataset-tiler --file dataset-tiler/build/libs/dataset-tiler-VERSION.jar` \ No newline at end of file http://git-wip-us.apache.org/repos/asf/incubator-sdap-nexus/blob/ff98fa34/nexus-ingest/dataset-tiler/build.gradle ---------------------------------------------------------------------- diff --git a/nexus-ingest/dataset-tiler/build.gradle b/nexus-ingest/dataset-tiler/build.gradle new file mode 100644 index 0000000..b818457 --- /dev/null +++ b/nexus-ingest/dataset-tiler/build.gradle @@ -0,0 +1,89 @@ +buildscript { + repositories { + maven { + url "http://repo.spring.io/plugins-snapshot" + } + maven { + url 'http://repo.spring.io/plugins-release' + } + maven { + url "http://repo.spring.io/release" + } + maven { + url "http://repo.spring.io/milestone" + } + maven { + url "http://repo.spring.io/snapshot" + } + jcenter() + mavenCentral() + } + dependencies { + classpath("org.springframework.xd:spring-xd-module-plugin:1.3.1.RELEASE") + } +} + +ext { + springXdVersion = '1.3.1.RELEASE' + springIntegrationDslVersion = '1.1.2.RELEASE' + netcdfJavaVersion = '4.6.3' +} + +apply plugin: 'java' +apply plugin: 'groovy' +apply plugin: 'idea' +apply plugin: 'maven' +apply plugin: 'spring-xd-module' +apply plugin: 'project-report' + +group = 'org.nasa.jpl.nexus.ingest' +version = '1.0.0.BUILD-SNAPSHOT' +mainClassName = '' + +sourceCompatibility = 1.8 +targetCompatibility = 1.8 + +repositories { + maven { + url "http://repo.spring.io/release" + } + mavenCentral() + jcenter() + maven { + url "http://repo.spring.io/snapshot" + } + maven { + url "http://repo.spring.io/milestone" + } + maven { + url "https://artifacts.unidata.ucar.edu/content/repositories/unidata-releases/" + } +} + +sourceSets { + main { + groovy { + // override the default locations, rather than adding additional ones + 
srcDirs = ['src/main/groovy', 'src/main/java'] + } + java { + srcDirs = [] // don't compile Java code twice + } + } +} + +dependencies { + compile("org.springframework.integration:spring-integration-java-dsl:${springIntegrationDslVersion}") + compile "edu.ucar:cdm:${netcdfJavaVersion}" + compile("org.springframework.boot:spring-boot-starter-integration") + compile('org.codehaus.groovy:groovy') + + provided "javax.validation:validation-api" + + testCompile("org.springframework.boot:spring-boot-starter-test") +} + + +task wrapper(type: Wrapper) { + gradleVersion = '2.12' +}
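The ingestion notebook above passes `--tilesDesired 1296` to `nx-deploy-stream.sh`, and the dataset-tiler module computes the SectionSpecs that split each granule. A rough sketch of the arithmetic behind that number, assuming a 0.25-degree global granule (720 latitude x 1440 longitude cells) like AVHRR OI — the hypothetical `tile_shape` helper below is illustrative only; the actual dataset-tiler may choose tile shapes differently:

```python
import math

# Hypothetical sketch: split an n_lat x n_lon grid into roughly tiles_desired
# tiles by using a square tile grid (sqrt(tiles_desired) tiles per axis).
def tile_shape(n_lat, n_lon, tiles_desired):
    per_axis = int(math.sqrt(tiles_desired))  # 1296 -> 36 x 36 tile grid
    # Cells per tile along each axis, rounded up so the grid is fully covered.
    return math.ceil(n_lat / per_axis), math.ceil(n_lon / per_axis)

lat_cells, lon_cells = tile_shape(720, 1440, 1296)
print(lat_cells, lon_cells)  # 20 40 -> each tile covers 5 x 10 degrees at 0.25-degree resolution
```

Under these assumptions, 1296 tiles per granule works out to tiles of 20 x 40 cells, which keeps each tile's floating point array small enough to store and query individually.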
