http://git-wip-us.apache.org/repos/asf/incubator-sdap-nexus/blob/ff98fa34/esip-workshop/student-material/workshop2/1- Introduction.ipynb ---------------------------------------------------------------------- diff --git a/esip-workshop/student-material/workshop2/1- Introduction.ipynb b/esip-workshop/student-material/workshop2/1- Introduction.ipynb new file mode 100644 index 0000000..483f34b --- /dev/null +++ b/esip-workshop/student-material/workshop2/1- Introduction.ipynb @@ -0,0 +1,62 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Introduction\n", + "\n", + "In this workshop you will learn how to deploy a NEXUS system using Docker.\n", + "\n", + "## EC2\n", + "\n", + "Each student or group will be assigned an EC2 instance to use for this workshop. The EC2 instance we will be using is: \n", + "> r4.8xlarge \n", + "> Memory: 244.0 GB \n", + "> vCPUs: 32 \n", + "> EBS Storage: 500 GB gp2 \n", + "\n", + "## SSH\n", + "\n", + "You will be using SSH to connect to the Amazon EC2 instance assigned to you. You will need an SSH client on your laptop. All shell commands for this workshop will take place over the SSH connection.\n", + "\n", + "__NOTE__: Shell commands you are expected to run will be prefixed with a dollar sign `$`\n", + "\n", + "## Docker\n", + "\n", + "Docker is already installed on the EC2 instance. You will be asked to interact with the Docker command line client during this workshop. \n", + "\n", + "`docker-compose` is used to coordinate the startup and stopping of the different components of the NEXUS system during this workshop.\n", + "\n", + "## System Architecture\n", + "\n", + "We are attempting to simulate a cluster deployment on a single machine. 
By the end of this workshop there will be 24 containers running on your EC2 instance.\n", + "\n", + "\n", + "\n", + "In a production deployment, these containers would most likely be running on different machines and may be sized differently.\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.1" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +}
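The introduction notebook above states that 24 containers will be running by the end of the workshop. As a cross-check, that figure can be reproduced from the `docker ps` listings shown in the later notebooks (a hypothetical tally; the grouping names are illustrative, not part of the original material):

```python
# Hypothetical tally of the containers described across the workshop notebooks.
# Counts are taken from the `docker ps` listings shown in notebooks 2-4.
clusters = {
    "infrastructure": {"cassandra": 3, "zookeeper": 3, "solr": 3},
    "analysis": {"mesos-master": 1, "mesos-agent": 3, "nexus-webapp": 1},
    "ingest": {"xd-container": 3, "xd-admin": 1, "kafka": 3, "redis": 1, "mysql": 1},
    "notebook": {"jupyter": 1},
}

# Sum containers per cluster, then overall.
per_cluster = {name: sum(counts.values()) for name, counts in clusters.items()}
total = sum(per_cluster.values())
print(per_cluster)  # {'infrastructure': 9, 'analysis': 5, 'ingest': 9, 'notebook': 1}
print(total)        # 24
```

The 9 infrastructure containers match the count verified in notebook 2; the remaining 15 are added by the analysis and ingestion notebooks.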
http://git-wip-us.apache.org/repos/asf/incubator-sdap-nexus/blob/ff98fa34/esip-workshop/student-material/workshop2/2 - Infrastructure.ipynb ---------------------------------------------------------------------- diff --git a/esip-workshop/student-material/workshop2/2 - Infrastructure.ipynb b/esip-workshop/student-material/workshop2/2 - Infrastructure.ipynb new file mode 100644 index 0000000..d74fa9e --- /dev/null +++ b/esip-workshop/student-material/workshop2/2 - Infrastructure.ipynb @@ -0,0 +1,160 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Starting the Infrastructure Cluster\n", + "\n", + "NEXUS relies on [Apache Solr](http://lucene.apache.org/solr/) to store metadata about tiles and [Apache Cassandra](http://cassandra.apache.org/) to store the floating point array data associated with those tiles. Both Solr and Cassandra are distributed storage systems and can be run in a cluster. \n", + "\n", + "Solr requires [Apache Zookeeper](https://zookeeper.apache.org/) to run in cluster mode (called SolrCloud). This notebook walks through the process of bringing up a 3 node Cassandra cluster, 3 node Zookeeper cluster, and a 3 node SolrCloud.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 1: Start One Cassandra Container\n", + "\n", + "When initializing a Cassandra cluster, one or more nodes must be designated as a 'seed' node to help bootstrap the internal communication between nodes: [Internode communications (gossip)](http://docs.datastax.com/en/cassandra/2.1/cassandra/architecture/architectureGossipAbout_c.html).\n", + "\n", + "Therefore, the first step is to start one Cassandra container so that it can act as the seed node for the rest of our cluster.\n", + "\n", + "### TODO\n", + "1. Navigate to the directory containing the `docker-compose.yml` file for the infrastructure cluster\n", + "```bash\n", + "$ cd ~/nexus/esip-workshop/docker/infrastructure\n", + "```\n", + "\n", + "2. 
Use `docker-compose` to bring up the `cassandra1` container.\n", + "```bash\n", + "$ docker-compose up -d cassandra1\n", + "```\n", + "\n", + "3. Wait for the Cassandra node to become ready before continuing. Run the following command to follow the logs for `cassandra1`.\n", + "```bash\n", + "$ docker logs -f cassandra1\n", + "```\n", + "\n", + "4. Wait for the Cassandra node to start listening for clients. It should only take a minute or so. Look for this line in the logs:\n", + "> Starting listening for CQL clients on /0.0.0.0:9042\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 2: Start the Remaining Infrastructure Containers\n", + "\n", + "Once the first Cassandra node is running, the rest of the infrastructure cluster can be brought online. The remaining 8 containers in the infrastructure can be started using the `docker-compose` command again.\n", + "\n", + "### TODO\n", + "\n", + "1. Use `docker-compose` to bring up the remaining containers. __Note__: Make sure you are still in the same directory as Step 1 `~/nexus/esip-workshop/docker/infrastructure`.\n", + "```bash\n", + "$ docker-compose up -d\n", + "```" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 3: Verify the Infrastructure has Started\n", + "\n", + "Now there should be 9 containers running that make up our 3 node Cassandra cluster, 3 node Zookeeper cluster, and 3 node SolrCloud. We can use a variety of commands to verify that our cluster is active and healthy.\n", + "\n", + "### TODO\n", + "\n", + "1. 
List all running docker containers.\n", + "```bash\n", + "$ docker ps\n", + "```\n", + "The output should look similar to this:\n", + "<pre style=\"white-space: pre;\">\n", + "CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES \n", + "90d370eb3a4e nexusjpl/jupyter \"tini -- start-not...\" 30 hours ago Up 30 hours 0.0.0.0:8000->8888/tcp jupyter \n", + "cd0f47fe303d nexusjpl/nexus-solr \"docker-entrypoint...\" 30 hours ago Up 30 hours 8983/tcp solr2 \n", + "8c0f5c8eeb45 nexusjpl/nexus-solr \"docker-entrypoint...\" 30 hours ago Up 30 hours 8983/tcp solr3 \n", + "27e34d14c16e nexusjpl/nexus-solr \"docker-entrypoint...\" 30 hours ago Up 30 hours 8983/tcp solr1 \n", + "247f807cb5ec cassandra:2.2.8 \"/docker-entrypoin...\" 30 hours ago Up 30 hours 7000-7001/tcp, 7199/tcp, 9042/tcp, 9160/tcp cassandra3 \n", + "09cc86a27321 zookeeper \"/docker-entrypoin...\" 30 hours ago Up 30 hours 2181/tcp, 2888/tcp, 3888/tcp zk1 \n", + "33e9d9b1b745 zookeeper \"/docker-entrypoin...\" 30 hours ago Up 30 hours 2181/tcp, 2888/tcp, 3888/tcp zk3 \n", + "dd29e4d09124 cassandra:2.2.8 \"/docker-entrypoin...\" 30 hours ago Up 30 hours 7000-7001/tcp, 7199/tcp, 9042/tcp, 9160/tcp cassandra2 \n", + "11e57e0c972f zookeeper \"/docker-entrypoin...\" 30 hours ago Up 30 hours 2181/tcp, 2888/tcp, 3888/tcp zk2 \n", + "2292803d942d cassandra:2.2.8 \"/docker-entrypoin...\" 30 hours ago Up 30 hours 7000-7001/tcp, 7199/tcp, 9042/tcp, 9160/tcp cassandra1 \n", + "</pre>\n", + "\n", + "2. 
Get the Cassandra cluster status by running `nodetool status` inside the `cassandra1` container.\n", + "```bash\n", + "$ docker exec cassandra1 nodetool status\n", + "```\n", + "You should see 3 cluster nodes:\n", + "<pre style=\"white-space: pre;\">\n", + "Datacenter: datacenter1\n", + "=======================\n", + "Status=Up/Down\n", + "|/ State=Normal/Leaving/Joining/Moving\n", + "-- Address Load Tokens Owns (effective) Host ID Rack\n", + "UN 172.18.0.2 4.8 GB 256 35.3% d9a0d273-b11c-41dd-9da1-cb77882f275f rack1\n", + "UN 172.18.0.5 4.42 GB 256 33.2% d68d9ea7-04a0-4eaf-b9c6-333b606bd2b1 rack1\n", + "UN 172.18.0.7 4.16 GB 256 31.5% 6f8683f9-abf8-4466-87bc-a5faa048956d rack1\n", + "</pre>\n", + "\n", + "3. Get the status of the SolrCloud by running the cell below." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "# TODO Run this cell to get the status of the Solr Cluster. You should see a collection called\n", + "# 'nexustiles' with 3 shards spread across all 3 nodes.\n", + "\n", + "import requests\n", + "import json\n", + "\n", + "response = requests.get('http://solr1:8983/solr/admin/collections?action=clusterstatus&wt=json')\n", + "print(json.dumps(response.json(), indent=2))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Congratulations!\n", + "\n", + "You have successfully started up the NEXUS infrastructure. 
Your EC2 instance now has 9 containers running:\n", + "\n", + "" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.1" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} http://git-wip-us.apache.org/repos/asf/incubator-sdap-nexus/blob/ff98fa34/esip-workshop/student-material/workshop2/3 - Analysis.ipynb ---------------------------------------------------------------------- diff --git a/esip-workshop/student-material/workshop2/3 - Analysis.ipynb b/esip-workshop/student-material/workshop2/3 - Analysis.ipynb new file mode 100644 index 0000000..468c9f7 --- /dev/null +++ b/esip-workshop/student-material/workshop2/3 - Analysis.ipynb @@ -0,0 +1,218 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "collapsed": true + }, + "source": [ + "# Starting the Analysis Cluster\n", + "\n", + "NEXUS utilizes [Apache Spark](https://spark.apache.org/) running on [Apache Mesos](http://mesos.apache.org/) for its analytical functions. Now that the infrastructure has been started, we can start up the analysis cluster.\n", + "\n", + "The analysis cluster consists of an Apache Mesos cluster and the NEXUS webapp [Tornado server](http://www.tornadoweb.org/en/stable/). The Mesos cluster we will be bringing up has one master node and three agent nodes. Apache Spark is already installed and configured on the three agent nodes, which will act as Spark executors for the NEXUS analytic functions.\n", + "\n", + "## Step 1: Start the Containers\n", + "\n", + "We can use `docker-compose` again to start our containers.\n", + "\n", + "### TODO\n", + "\n", + "1. 
Navigate to the directory containing the `docker-compose.yml` file for the analysis cluster\n", + "```bash\n", + "$ cd ~/nexus/esip-workshop/docker/analysis\n", + "```\n", + "\n", + "2. Use `docker-compose` to bring up the containers in the analysis cluster\n", + "```bash\n", + "$ docker-compose up -d\n", + "```" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 2: Verify the Cluster is Working\n", + "\n", + "Now that the cluster has started, we can use various commands to ensure that it is operational and monitor its status.\n", + "\n", + "### TODO\n", + "\n", + "1. List all running docker containers.\n", + "```bash\n", + "$ docker ps\n", + "```\n", + "The output should look similar to this:\n", + "<pre style=\"white-space: pre;\">\n", + "CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES\n", + "e5589456a78a nexusjpl/nexus-webapp \"/tmp/docker-entry...\" 5 seconds ago Up 5 seconds 0.0.0.0:4040->4040/tcp, 0.0.0.0:8083->8083/tcp nexus-webapp\n", + "18e682b9af0e nexusjpl/spark-mesos-agent \"/tmp/docker-entry...\" 7 seconds ago Up 5 seconds mesos-agent1\n", + "8951841d1da6 nexusjpl/spark-mesos-agent \"/tmp/docker-entry...\" 7 seconds ago Up 6 seconds mesos-agent3\n", + "c0240926a4a2 nexusjpl/spark-mesos-agent \"/tmp/docker-entry...\" 7 seconds ago Up 6 seconds mesos-agent2\n", + "c97ad268833f nexusjpl/spark-mesos-master \"/bin/bash -c './b...\" 7 seconds ago Up 7 seconds 0.0.0.0:5050->5050/tcp mesos-master\n", + "90d370eb3a4e nexusjpl/jupyter \"tini -- start-not...\" 2 days ago Up 2 days 0.0.0.0:8000->8888/tcp jupyter\n", + "cd0f47fe303d nexusjpl/nexus-solr \"docker-entrypoint...\" 2 days ago Up 2 days 8983/tcp solr2\n", + "8c0f5c8eeb45 nexusjpl/nexus-solr \"docker-entrypoint...\" 2 days ago Up 2 days 8983/tcp solr3\n", + "27e34d14c16e nexusjpl/nexus-solr \"docker-entrypoint...\" 2 days ago Up 2 days 8983/tcp solr1\n", + "247f807cb5ec cassandra:2.2.8 \"/docker-entrypoin...\" 2 days ago Up 2 days 7000-7001/tcp, 7199/tcp, 9042/tcp, 
9160/tcp cassandra3\n", + "09cc86a27321 zookeeper \"/docker-entrypoin...\" 2 days ago Up 2 days 2181/tcp, 2888/tcp, 3888/tcp zk1\n", + "33e9d9b1b745 zookeeper \"/docker-entrypoin...\" 2 days ago Up 2 days 2181/tcp, 2888/tcp, 3888/tcp zk3\n", + "dd29e4d09124 cassandra:2.2.8 \"/docker-entrypoin...\" 2 days ago Up 2 days 7000-7001/tcp, 7199/tcp, 9042/tcp, 9160/tcp cassandra2\n", + "11e57e0c972f zookeeper \"/docker-entrypoin...\" 2 days ago Up 2 days 2181/tcp, 2888/tcp, 3888/tcp zk2\n", + "2292803d942d cassandra:2.2.8 \"/docker-entrypoin...\" 2 days ago Up 2 days 7000-7001/tcp, 7199/tcp, 9042/tcp, 9160/tcp cassandra1\n", + "</pre>\n", + "\n", + "2. List the available Mesos slaves by running the cell below." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "# TODO Run this cell to see the status of the Mesos slaves. You should see 3 slaves connected.\n", + "\n", + "import requests\n", + "import json\n", + "\n", + "response = requests.get('http://mesos-master:5050/state.json')\n", + "print(json.dumps(response.json()['slaves'], indent=2))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 3: List available Datasets\n", + "\n", + "Now that the cluster is up, we can investigate the datasets available. Use the `nexuscli` module to list available datasets.\n", + "\n", + "### TODO \n", + "1. Get a list of datasets by using the `nexuscli` module to issue a request to the `nexus-webapp` container that was just started."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import nexuscli\n", + "\n", + "nexuscli.set_target(\"http://nexus-webapp:8083\")\n", + "nexuscli.dataset_list()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 4: Run a Time Series\n", + "\n", + "Verify the analysis functions are working by running a simple Time Series.\n", + "\n", + "### TODO\n", + "\n", + "1. Run the cell below to produce a time series plot using the analysis cluster you just started." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# TODO Run this cell to produce a Time Series plot using AVHRR data.\n", + "%matplotlib inline\n", + "import matplotlib.pyplot as plt\n", + "import time\n", + "import nexuscli\n", + "from datetime import datetime\n", + "\n", + "from shapely.geometry import box\n", + "\n", + "bbox = box(-150, 40, -120, 55)\n", + "datasets = [\"AVHRR_OI_L4_GHRSST_NCEI\"]\n", + "start_time = datetime(2013, 1, 1)\n", + "end_time = datetime(2013, 12, 31)\n", + "\n", + "start = time.perf_counter()\n", + "ts, = nexuscli.time_series(datasets, bbox, start_time, end_time, spark=True)\n", + "print(\"Time Series took {} seconds to generate\".format(time.perf_counter() - start))\n", + "\n", + "plt.figure(figsize=(10,5), dpi=100)\n", + "plt.plot(ts.time, ts.mean, 'b-', marker='|', markersize=2.0, mfc='b')\n", + "plt.grid(b=True, which='major', color='k', linestyle='-')\n", + "plt.xlabel(\"Time\")\n", + "plt.ylabel (\"Sea Surface Temperature (C)\")\n", + "plt.show()\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 5: Check the Results of the Spark Job\n", + "\n", + "The time series function in the previous cell will run on the Spark cluster. It is possible to use the Spark RESTful interface to determine the status of the Spark job.\n", + "\n", + "### TODO\n", + "\n", + "1. 
Run the cell below to see the status of the Spark Job." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# TODO Run this cell. You should see at least one successful Time Series Spark job.\n", + "import requests\n", + "\n", + "response = requests.get('http://nexus-webapp:4040/api/v1/applications')\n", + "appId = response.json()[0]['id']\n", + "response = requests.get(\"http://nexus-webapp:4040/api/v1/applications/%s/jobs\" % appId)\n", + "for job in response.json():\n", + " print(job['name'])\n", + " print('\\t' + job['status'])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Congratulations!\n", + "\n", + "You have successfully started a NEXUS analysis cluster and verified that it is functional. Your EC2 instance is now running both the infrastructure and the analysis cluster:\n", + "\n", + "" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.1" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} http://git-wip-us.apache.org/repos/asf/incubator-sdap-nexus/blob/ff98fa34/esip-workshop/student-material/workshop2/4 - Ingestion.ipynb ---------------------------------------------------------------------- diff --git a/esip-workshop/student-material/workshop2/4 - Ingestion.ipynb b/esip-workshop/student-material/workshop2/4 - Ingestion.ipynb new file mode 100644 index 0000000..1545fda --- /dev/null +++ b/esip-workshop/student-material/workshop2/4 - Ingestion.ipynb @@ -0,0 +1,260 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "collapsed": true + }, + "source": [ + "# Ingesting More Data\n", + "\n", + "NEXUS uses [Spring 
XD](http://projects.spring.io/spring-xd/) to ingest new data into the system. Spring XD is a distributed runtime that allows for parallel ingestion of data into data stores of all types. It requires a few tools for administrative purposes, including Redis and a relational database management system (RDBMS).\n", + "\n", + "The Spring XD architecture also consists of a management application called XD Admin, which manages XD Containers. Spring XD utilizes Apache Zookeeper to keep track of the state of the cluster and also uses [Apache Kafka](https://kafka.apache.org/) to communicate between its components.\n", + "\n", + "\n", + "## Step 1: Start an Ingestion Cluster\n", + "\n", + "We can bring up an ingestion cluster by using `docker-compose`.\n", + "\n", + "### TODOs\n", + "\n", + "1. Navigate to the directory containing the `docker-compose.yml` file for the ingestion cluster\n", + "```bash\n", + "$ cd ~/nexus/esip-workshop/docker/ingest\n", + "```\n", + "\n", + "2. Use `docker-compose` to bring up the containers in the ingestion cluster\n", + "```bash\n", + "$ docker-compose up -d\n", + "```" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 2: Verify the Ingestion Cluster is Working\n", + "\n", + "Now that the cluster has started, we can use various commands to ensure that it is operational and monitor its status.\n", + "\n", + "### TODO\n", + "\n", + "1. 
List all running docker containers.\n", + "```bash\n", + "$ docker ps\n", + "```\n", + "The output should look similar to this:\n", + "<pre style=\"white-space: pre;\">\n", + "CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES\n", + "581a05925ea6 nexusjpl/ingest-container \"/usr/local/nexus-...\" 5 seconds ago Up 3 seconds 9393/tcp xd-container2\n", + "1af7ba346d31 nexusjpl/ingest-container \"/usr/local/nexus-...\" 5 seconds ago Up 3 seconds 9393/tcp xd-container3\n", + "0668e2a48c9a nexusjpl/ingest-container \"/usr/local/nexus-...\" 5 seconds ago Up 3 seconds 9393/tcp xd-container1\n", + "d717e6629b4a nexusjpl/ingest-admin \"/usr/local/nexus-...\" 5 seconds ago Up 4 seconds 9393/tcp xd-admin\n", + "a4dae8ca6757 nexusjpl/kafka \"kafka-server-star...\" 7 seconds ago Up 6 seconds kafka3\n", + "c29664cfae4a nexusjpl/kafka \"kafka-server-star...\" 7 seconds ago Up 6 seconds kafka2\n", + "623bdaa50207 nexusjpl/kafka \"kafka-server-star...\" 7 seconds ago Up 6 seconds kafka1\n", + "2266c2a54113 redis:3 \"docker-entrypoint...\" 7 seconds ago Up 5 seconds 6379/tcp redis\n", + "da3267942d5f mysql:8 \"docker-entrypoint...\" 7 seconds ago Up 6 seconds 3306/tcp mysqldb\n", + "e5589456a78a nexusjpl/nexus-webapp \"/tmp/docker-entry...\" 31 hours ago Up 31 hours 0.0.0.0:4040->4040/tcp, 0.0.0.0:8083->8083/tcp nexus-webapp\n", + "18e682b9af0e nexusjpl/spark-mesos-agent \"/tmp/docker-entry...\" 31 hours ago Up 31 hours mesos-agent1\n", + "8951841d1da6 nexusjpl/spark-mesos-agent \"/tmp/docker-entry...\" 31 hours ago Up 31 hours mesos-agent3\n", + "c0240926a4a2 nexusjpl/spark-mesos-agent \"/tmp/docker-entry...\" 31 hours ago Up 31 hours mesos-agent2\n", + "c97ad268833f nexusjpl/spark-mesos-master \"/bin/bash -c './b...\" 31 hours ago Up 31 hours 0.0.0.0:5050->5050/tcp mesos-master\n", + "90d370eb3a4e nexusjpl/jupyter \"tini -- start-not...\" 3 days ago Up 3 days 0.0.0.0:8000->8888/tcp jupyter\n", + "cd0f47fe303d nexusjpl/nexus-solr \"docker-entrypoint...\" 3 days ago Up 3 days 
8983/tcp solr2\n", + "8c0f5c8eeb45 nexusjpl/nexus-solr \"docker-entrypoint...\" 3 days ago Up 3 days 8983/tcp solr3\n", + "27e34d14c16e nexusjpl/nexus-solr \"docker-entrypoint...\" 3 days ago Up 3 days 8983/tcp solr1\n", + "247f807cb5ec cassandra:2.2.8 \"/docker-entrypoin...\" 3 days ago Up 3 days 7000-7001/tcp, 7199/tcp, 9042/tcp, 9160/tcp cassandra3\n", + "09cc86a27321 zookeeper \"/docker-entrypoin...\" 3 days ago Up 3 days 2181/tcp, 2888/tcp, 3888/tcp zk1\n", + "33e9d9b1b745 zookeeper \"/docker-entrypoin...\" 3 days ago Up 3 days 2181/tcp, 2888/tcp, 3888/tcp zk3\n", + "dd29e4d09124 cassandra:2.2.8 \"/docker-entrypoin...\" 3 days ago Up 3 days 7000-7001/tcp, 7199/tcp, 9042/tcp, 9160/tcp cassandra2\n", + "11e57e0c972f zookeeper \"/docker-entrypoin...\" 3 days ago Up 3 days 2181/tcp, 2888/tcp, 3888/tcp zk2\n", + "2292803d942d cassandra:2.2.8 \"/docker-entrypoin...\" 3 days ago Up 3 days 7000-7001/tcp, 7199/tcp, 9042/tcp, 9160/tcp cassandra1\n", + "</pre>\n", + "\n", + "2. View the log of the XD Admin container to verify it has started.\n", + "```bash\n", + "$ docker logs -f xd-admin\n", + "```" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 3: Ingest Some Data\n", + "\n", + "Now that the ingestion cluster has been started, we can ingest some new data into the system. Currently, there is AVHRR data ingested up through 2016. In this step you will ingest the remaining AVHRR data through July 2017. The source granules for AVHRR have already been copied to the EBS volume attached to your EC2 instance and mounted in the ingestion containers as `/usr/local/data/nexus/avhrr/2017`.\n", + "\n", + "In order to begin ingesting data, we need to deploy a new ingestion stream. The ingestion stream needs a few key parameters: the name of the dataset, where to look for the data files, the variable name to extract from the granules, and approximately how many tiles should be created per granule. 
These parameters can all be provided to the `nx-deploy-stream` shell script that is present in the `xd-admin` container.\n", + "\n", + "\n", + "### TODOs\n", + "\n", + "1. Deploy the stream to ingest the 2017 AVHRR data\n", + "```bash\n", + "$ docker exec -it xd-admin /usr/local/nx-deploy-stream.sh --datasetName AVHRR_OI_L4_GHRSST_NCEI --dataDirectory /usr/local/data/nexus/avhrr/2017 --variableName analysed_sst --tilesDesired 1296\n", + "```" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 4: Monitor the Ingestion\n", + "\n", + "Once the stream is deployed, the data will begin to flow into the system. Progress can be monitored by tailing the log files and monitoring the number of tiles and granules that have been ingested into the system.\n", + "\n", + "### TODOs\n", + "\n", + "1. Get a listing of granules and tiles per granule for AVHRR 2017\n", + "2. Get a count of the number of granules ingested for AVHRR 2017\n", + "3. Verify the dataset list shows that granules have been ingested through July 2017" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "# TODO Run this cell multiple times to watch as the granules are ingested into the system.\n", + "import requests\n", + "\n", + "dataset = 'AVHRR_OI_L4_GHRSST_NCEI'\n", + "year = 2017\n", + "\n", + "response = requests.get(\"http://solr1:8983/solr/nexustiles/query?q=granule_s:%d*&rows=0&fq=dataset_s:%s&facet.field=granule_s&facet=true&facet.mincount=1&facet.limit=-1&facet.sort=index\" % (year, dataset))\n", + "data = response.json()\n", + "for k in data['facet_counts'][\"facet_fields\"]['granule_s']:\n", + " print(k)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# TODO Run this cell to get a count of the number of AVHRR granules ingested for the year 2017.\n", + "# Ingestion is finished when the total reaches 187.\n", + 
"import requests\n", + "\n", + "dataset = 'AVHRR_OI_L4_GHRSST_NCEI'\n", + "year = 2017\n", + "\n", + "response = requests.get(\"http://solr1:8983/solr/nexustiles/query?q=granule_s:%d*&json.facet={granule_s:'unique(granule_s)'}&rows=0&fq=dataset_s:%s\" % (year, dataset))\n", + "data = response.json()\n", + "number_of_granules = data['facets']['granule_s'] if 'granule_s' in data['facets'] else 0\n", + "print(\"Number of granules for %s : %d\" % (dataset, number_of_granules))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "# TODO Run this cell to get a list of datasets available along with their start and end dates.\n", + "import nexuscli\n", + "# Target the nexus webapp server\n", + "nexuscli.set_target(\"http://nexus-webapp:8083\")\n", + "nexuscli.dataset_list()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 5: Run a Time Series With the new Data\n", + "\n", + "Once you have reached 187 total granules ingested for 2017 and see that AVHRR has data through July 2017, the ingestion has completed. You can now use the analytical functions on the new data.\n", + "\n", + "### TODOs\n", + "\n", + "1. Generate a Time Series using the new data." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "# TODO Run this cell to produce a Time Series plot using AVHRR data from 2017.\n", + "%matplotlib inline\n", + "import matplotlib.pyplot as plt\n", + "import time\n", + "import nexuscli\n", + "from datetime import datetime\n", + "\n", + "from shapely.geometry import box\n", + "\n", + "bbox = box(-150, 40, -120, 55)\n", + "datasets = [\"AVHRR_OI_L4_GHRSST_NCEI\"]\n", + "start_time = datetime(2017, 1, 1)\n", + "end_time = datetime(2017, 7, 6)\n", + "\n", + "start = time.perf_counter()\n", + "ts, = nexuscli.time_series(datasets, bbox, start_time, end_time, spark=True)\n", + "print(\"Time Series took {} seconds to generate\".format(time.perf_counter() - start))\n", + "\n", + "plt.figure(figsize=(10,5), dpi=100)\n", + "plt.plot(ts.time, ts.mean, 'b-', marker='|', markersize=2.0, mfc='b')\n", + "plt.grid(b=True, which='major', color='k', linestyle='-')\n", + "plt.xlabel(\"Time\")\n", + "plt.ylabel (\"Sea Surface Temperature (C)\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Congratulations!\n", + "\n", + "You have completed this workshop. You now have a completely functional NEXUS cluster with all containers started:\n", + "\n", + "\n", + "\n", + "If you would like, you can go back to the workshop 1 notebooks and verify they are still working. More information about NEXUS is available on our [GitHub](https://github.com/dataplumber/nexus).\n", + "\n", + "If you are interested in learning more about Docker, Nga Quach will be giving a presentation all about Docker on Thursday, July 27 during the [Free and Open Source Software (FOSS) and Technologies for the Cloud](http://sched.co/As75) session. 
\n", + "\n", + "If you are interested in learning more about Apache Spark, Joe Jacob will be giving a presentation all about Spark on Thursday, July 27 during the [Free and Open Source Software (FOSS) and Technologies for the Cloud](http://sched.co/As75) session. \n", + "\n", + "\n", + "Thank you for participating!" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.1" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} http://git-wip-us.apache.org/repos/asf/incubator-sdap-nexus/blob/ff98fa34/esip-workshop/student-material/workshop2/img/ec2-containers-analysis.png ---------------------------------------------------------------------- diff --git a/esip-workshop/student-material/workshop2/img/ec2-containers-analysis.png b/esip-workshop/student-material/workshop2/img/ec2-containers-analysis.png new file mode 100644 index 0000000..3e2bfa9 Binary files /dev/null and b/esip-workshop/student-material/workshop2/img/ec2-containers-analysis.png differ http://git-wip-us.apache.org/repos/asf/incubator-sdap-nexus/blob/ff98fa34/esip-workshop/student-material/workshop2/img/ec2-containers-infrastructure.png ---------------------------------------------------------------------- diff --git a/esip-workshop/student-material/workshop2/img/ec2-containers-infrastructure.png b/esip-workshop/student-material/workshop2/img/ec2-containers-infrastructure.png new file mode 100644 index 0000000..743a2f8 Binary files /dev/null and b/esip-workshop/student-material/workshop2/img/ec2-containers-infrastructure.png differ http://git-wip-us.apache.org/repos/asf/incubator-sdap-nexus/blob/ff98fa34/esip-workshop/student-material/workshop2/img/ec2-containers.png 
---------------------------------------------------------------------- diff --git a/esip-workshop/student-material/workshop2/img/ec2-containers.png b/esip-workshop/student-material/workshop2/img/ec2-containers.png new file mode 100644 index 0000000..5942038 Binary files /dev/null and b/esip-workshop/student-material/workshop2/img/ec2-containers.png differ http://git-wip-us.apache.org/repos/asf/incubator-sdap-nexus/blob/ff98fa34/nexus-ingest/.gitignore ---------------------------------------------------------------------- diff --git a/nexus-ingest/.gitignore b/nexus-ingest/.gitignore new file mode 100644 index 0000000..0b58aaa --- /dev/null +++ b/nexus-ingest/.gitignore @@ -0,0 +1,4 @@ +.DS_Store + +.idea +*.iml http://git-wip-us.apache.org/repos/asf/incubator-sdap-nexus/blob/ff98fa34/nexus-ingest/README.md ---------------------------------------------------------------------- diff --git a/nexus-ingest/README.md b/nexus-ingest/README.md new file mode 100644 index 0000000..87cf31c --- /dev/null +++ b/nexus-ingest/README.md @@ -0,0 +1,3 @@ +# nexus-ingest + +This folder contains all of the custom code needed to ingest data into the nexus system. 
\ No newline at end of file http://git-wip-us.apache.org/repos/asf/incubator-sdap-nexus/blob/ff98fa34/nexus-ingest/dataset-tiler/.gitignore ---------------------------------------------------------------------- diff --git a/nexus-ingest/dataset-tiler/.gitignore b/nexus-ingest/dataset-tiler/.gitignore new file mode 100644 index 0000000..965dacc --- /dev/null +++ b/nexus-ingest/dataset-tiler/.gitignore @@ -0,0 +1,28 @@ +.gradle/ +.idea/ +gradlew.bat + +.DS_Store +*.log + + +build/* +!build/reports + +build/reports/* +!build/reports/license +!build/reports/project + +#Idea files +*.iml +*.ipr +*.iws + +*.class + +# Package Files # +*.war +*.ear + +# virtual machine crash logs, see http://www.java.com/en/download/help/error_hotspot.xml +hs_err_pid* \ No newline at end of file http://git-wip-us.apache.org/repos/asf/incubator-sdap-nexus/blob/ff98fa34/nexus-ingest/dataset-tiler/README.md ---------------------------------------------------------------------- diff --git a/nexus-ingest/dataset-tiler/README.md b/nexus-ingest/dataset-tiler/README.md new file mode 100644 index 0000000..043daca --- /dev/null +++ b/nexus-ingest/dataset-tiler/README.md @@ -0,0 +1,11 @@ +# dataset-tiler + +[Spring-XD Module](http://docs.spring.io/spring-xd/docs/current/reference/html/#modules) that creates SectionSpecs that can be used to read the data from the dataset in tiles. 
+ +The project can be built by running + +`./gradlew clean build` + +The module can then be uploaded to Spring XD by running the following command in an XD Shell + +`module upload --type processor --name dataset-tiler --file dataset-tiler/build/libs/dataset-tiler-VERSION.jar` \ No newline at end of file http://git-wip-us.apache.org/repos/asf/incubator-sdap-nexus/blob/ff98fa34/nexus-ingest/dataset-tiler/build.gradle ---------------------------------------------------------------------- diff --git a/nexus-ingest/dataset-tiler/build.gradle b/nexus-ingest/dataset-tiler/build.gradle new file mode 100644 index 0000000..b818457 --- /dev/null +++ b/nexus-ingest/dataset-tiler/build.gradle @@ -0,0 +1,89 @@ +buildscript { + repositories { + maven { + url "http://repo.spring.io/plugins-snapshot" + } + maven { + url 'http://repo.spring.io/plugins-release' + } + maven { + url "http://repo.spring.io/release" + } + maven { + url "http://repo.spring.io/milestone" + } + maven { + url "http://repo.spring.io/snapshot" + } + jcenter() + mavenCentral() + } + dependencies { + classpath("org.springframework.xd:spring-xd-module-plugin:1.3.1.RELEASE") + } +} + +ext { + springXdVersion = '1.3.1.RELEASE' + springIntegrationDslVersion = '1.1.2.RELEASE' + netcdfJavaVersion = '4.6.3' +} + +apply plugin: 'java' +apply plugin: 'groovy' +apply plugin: 'idea' +apply plugin: 'maven' +apply plugin: 'spring-xd-module' +apply plugin: 'project-report' + +group = 'org.nasa.jpl.nexus.ingest' +version = '1.0.0.BUILD-SNAPSHOT' +mainClassName = '' + +sourceCompatibility = 1.8 +targetCompatibility = 1.8 + +repositories { + maven { + url "http://repo.spring.io/release" + } + mavenCentral() + jcenter() + maven { + url "http://repo.spring.io/snapshot" + } + maven { + url "http://repo.spring.io/milestone" + } + maven { + url "https://artifacts.unidata.ucar.edu/content/repositories/unidata-releases/" + } +} + +sourceSets { + main { + groovy { + // override the default locations, rather than adding additional ones + 
srcDirs = ['src/main/groovy', 'src/main/java'] + } + java { + srcDirs = [] // don't compile Java code twice + } + } +} + +dependencies { + compile("org.springframework.integration:spring-integration-java-dsl:${springIntegrationDslVersion}") + compile "edu.ucar:cdm:${netcdfJavaVersion}" + compile("org.springframework.boot:spring-boot-starter-integration") + compile('org.codehaus.groovy:groovy') + + provided "javax.validation:validation-api" + + testCompile("org.springframework.boot:spring-boot-starter-test") +} + + +task wrapper(type: Wrapper) { + gradleVersion = '2.12' +}
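The ingestion notebook above passes `--tilesDesired 1296` to `nx-deploy-stream.sh`, and the dataset-tiler module computes the SectionSpecs that split each granule. A rough sketch of the arithmetic behind that number, assuming a 0.25-degree global granule (720 latitude x 1440 longitude cells) like AVHRR OI — the hypothetical `tile_shape` helper below is illustrative only; the actual dataset-tiler may choose tile shapes differently:

```python
import math

# Hypothetical sketch: split an n_lat x n_lon grid into roughly tiles_desired
# tiles by using a square tile grid (sqrt(tiles_desired) tiles per axis).
def tile_shape(n_lat, n_lon, tiles_desired):
    per_axis = int(math.sqrt(tiles_desired))  # 1296 -> 36 x 36 tile grid
    # Cells per tile along each axis, rounded up so the grid is fully covered.
    return math.ceil(n_lat / per_axis), math.ceil(n_lon / per_axis)

lat_cells, lon_cells = tile_shape(720, 1440, 1296)
print(lat_cells, lon_cells)  # 20 40 -> each tile covers 5 x 10 degrees at 0.25-degree resolution
```

Under these assumptions, 1296 tiles per granule works out to tiles of 20 x 40 cells, which keeps each tile's floating point array small enough to store and query individually.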
