Kevin W Monroe has proposed merging lp:~bigdata-dev/charms/bundles/apache-hadoop-spark-notebook/trunk into lp:~charmers/charms/bundles/apache-hadoop-spark-notebook/bundle.
Requested reviews:
  Kevin W Monroe (kwmonroe)
Related bugs:
  Bug #1475634 in Juju Charms Collection: "bigdata solution: need a Apache Hadoop, SPark, and ipython notebook for SPark solution"
  https://bugs.launchpad.net/charms/+bug/1475634

For more details, see:
https://code.launchpad.net/~bigdata-dev/charms/bundles/apache-hadoop-spark-notebook/trunk/+merge/286952

Updates from bigdata-dev:

- version lock charms in bundle.yaml
- update bundle tests
- fix README formatting

--
Your team Juju Big Data Development is subscribed to branch lp:~bigdata-dev/charms/bundles/apache-hadoop-spark-notebook/trunk.
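For reviewers new to the version-lock change: each charm URL in bundle.yaml now carries an explicit charm store revision, so a deploy of this bundle always pulls the exact charm revisions the bundle was tested against. The pattern, excerpted from the spark service in the diff below, looks like:

    spark:
      charm: cs:trusty/apache-spark-6    # pinned to revision 6; previously cs:trusty/apache-spark
      num_units: 1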
=== modified file 'README.md'
--- README.md	2015-07-09 15:09:34 +0000
+++ README.md	2016-02-23 21:01:00 +0000
@@ -16,81 +16,81 @@
   - 1 Notebook (colocated on the Spark unit)
 
-## Usage
-Deploy this bundle using juju-quickstart:
-
-    juju quickstart u/bigdata-dev/apache-hadoop-spark-notebook
-
-See `juju quickstart --help` for deployment options, including machine
-constraints and how to deploy a locally modified version of the
-apache-hadoop-spark-notebook bundle.yaml.
-
-
-## Testing the deployment
-
-### Smoke test HDFS admin functionality
-Once the deployment is complete and the cluster is running, ssh to the HDFS
-Master unit:
-
-    juju ssh hdfs-master/0
-
-As the `ubuntu` user, create a temporary directory on the Hadoop file system.
-The steps below verify HDFS functionality:
-
-    hdfs dfs -mkdir -p /tmp/hdfs-test
-    hdfs dfs -chmod -R 777 /tmp/hdfs-test
-    hdfs dfs -ls /tmp # verify the newly created hdfs-test subdirectory exists
-    hdfs dfs -rm -R /tmp/hdfs-test
-    hdfs dfs -ls /tmp # verify the hdfs-test subdirectory has been removed
-    exit
-
-### Smoke test YARN and MapReduce
-Run the `terasort.sh` script from the Spark unit to generate and sort data. The
-steps below verify that Spark is communicating with the cluster via the plugin
-and that YARN and MapReduce are working as expected:
-
-    juju ssh spark/0
-    ~/terasort.sh
-    exit
-
-### Smoke test HDFS functionality from user space
-From the Spark unit, delete the MapReduce output previously generated by the
-`terasort.sh` script:
-
-    juju ssh spark/0
-    hdfs dfs -rm -R /user/ubuntu/tera_demo_out
-    exit
-
-### Smoke test Spark
-SSH to the Spark unit and run the SparkPi demo as follows:
-
-    juju ssh spark/0
-    ~/sparkpi.sh
-    exit
-
-### Access the IPython Notebook web interface
-Access the notebook web interface at
-http://{spark_unit_ip_address}:8880. The ip address can be found by running
-`juju status spark/0 | grep public-address`.
-
-
-## Scale Out Usage
-This bundle was designed to scale out. To increase the amount of Compute
-Slaves, you can add units to the compute-slave service. To add one unit:
-
-    juju add-unit compute-slave
-
-Or you can add multiple units at once:
-
-    juju add-unit -n4 compute-slave
-
-
-## Contact Information
-
-- <[email protected]>
-
-
-## Help
-
-- [Juju mailing list](https://lists.ubuntu.com/mailman/listinfo/juju)
-- [Juju community](https://jujucharms.com/community)
+## Usage
+Deploy this bundle using juju-quickstart:
+
+    juju quickstart apache-hadoop-spark-notebook
+
+See `juju quickstart --help` for deployment options, including machine
+constraints and how to deploy a locally modified version of the
+apache-hadoop-spark-notebook bundle.yaml.
+
+
+## Testing the deployment
+
+### Smoke test HDFS admin functionality
+Once the deployment is complete and the cluster is running, ssh to the HDFS
+Master unit:
+
+    juju ssh hdfs-master/0
+
+As the `ubuntu` user, create a temporary directory on the Hadoop file system.
+The steps below verify HDFS functionality:
+
+    hdfs dfs -mkdir -p /tmp/hdfs-test
+    hdfs dfs -chmod -R 777 /tmp/hdfs-test
+    hdfs dfs -ls /tmp # verify the newly created hdfs-test subdirectory exists
+    hdfs dfs -rm -R /tmp/hdfs-test
+    hdfs dfs -ls /tmp # verify the hdfs-test subdirectory has been removed
+    exit
+
+### Smoke test YARN and MapReduce
+Run the `terasort.sh` script from the Spark unit to generate and sort data. The
+steps below verify that Spark is communicating with the cluster via the plugin
+and that YARN and MapReduce are working as expected:
+
+    juju ssh spark/0
+    ~/terasort.sh
+    exit
+
+### Smoke test HDFS functionality from user space
+From the Spark unit, delete the MapReduce output previously generated by the
+`terasort.sh` script:
+
+    juju ssh spark/0
+    hdfs dfs -rm -R /user/ubuntu/tera_demo_out
+    exit
+
+### Smoke test Spark
+SSH to the Spark unit and run the SparkPi demo as follows:
+
+    juju ssh spark/0
+    ~/sparkpi.sh
+    exit
+
+### Access the IPython Notebook web interface
+Access the notebook web interface at
+http://{spark_unit_ip_address}:8880. The IP address can be found by running
+`juju status spark/0 | grep public-address`.
+
+
+## Scale Out Usage
+This bundle was designed to scale out. To increase the number of compute
+slaves, you can add units to the compute-slave service. To add one unit:
+
+    juju add-unit compute-slave
+
+Or you can add multiple units at once:
+
+    juju add-unit -n4 compute-slave
+
+
+## Contact Information
+
+- <[email protected]>
+
+
+## Help
+
+- [Juju mailing list](https://lists.ubuntu.com/mailman/listinfo/juju)
+- [Juju community](https://jujucharms.com/community)

=== modified file 'bundle.yaml'
--- bundle.yaml	2015-07-16 20:35:31 +0000
+++ bundle.yaml	2016-02-23 21:01:00 +0000
@@ -1,46 +1,46 @@
 services:
   compute-slave:
-    charm: cs:trusty/apache-hadoop-compute-slave
+    charm: cs:trusty/apache-hadoop-compute-slave-9
     num_units: 3
     annotations:
       gui-x: "300"
       gui-y: "200"
-    constraints: mem=3G
+    constraints: mem=7G
   hdfs-master:
-    charm: cs:trusty/apache-hadoop-hdfs-master
+    charm: cs:trusty/apache-hadoop-hdfs-master-9
     num_units: 1
     annotations:
       gui-x: "600"
       gui-y: "350"
     constraints: mem=7G
   plugin:
-    charm: cs:trusty/apache-hadoop-plugin
+    charm: cs:trusty/apache-hadoop-plugin-10
     annotations:
       gui-x: "900"
       gui-y: "200"
   secondary-namenode:
-    charm: cs:trusty/apache-hadoop-hdfs-secondary
+    charm: cs:trusty/apache-hadoop-hdfs-secondary-7
     num_units: 1
     annotations:
       gui-x: "600"
       gui-y: "600"
     constraints: mem=7G
   spark:
-    charm: cs:trusty/apache-spark
+    charm: cs:trusty/apache-spark-6
     num_units: 1
     annotations:
       gui-x: "1200"
       gui-y: "200"
     constraints: mem=3G
   yarn-master:
-    charm: cs:trusty/apache-hadoop-yarn-master
+    charm: cs:trusty/apache-hadoop-yarn-master-7
     num_units: 1
     annotations:
       gui-x: "600"
       gui-y: "100"
     constraints: mem=7G
   notebook:
-    charm: cs:trusty/apache-spark-notebook
+    charm: cs:trusty/apache-spark-notebook-3
     annotations:
       gui-x: "1200"
       gui-y: "450"
@@ -53,4 +53,4 @@
   - [plugin, yarn-master]
   - [plugin, hdfs-master]
   - [spark, plugin]
-  - [notebook, spark]
+  - [spark, notebook]

=== removed file 'tests/00-setup'
--- tests/00-setup	2015-07-16 20:35:31 +0000
+++ tests/00-setup	1970-01-01 00:00:00 +0000
@@ -1,8 +0,0 @@
-#!/bin/bash
-
-if ! dpkg -s amulet &> /dev/null; then
-    echo Installing Amulet...
-    sudo add-apt-repository -y ppa:juju/stable
-    sudo apt-get update
-    sudo apt-get -y install amulet
-fi

=== modified file 'tests/01-bundle.py'
--- tests/01-bundle.py	2015-07-16 20:35:31 +0000
+++ tests/01-bundle.py	2016-02-23 21:01:00 +0000
@@ -1,61 +1,32 @@
 #!/usr/bin/env python3
 
 import os
-import time
 import unittest
 
 import yaml
 import amulet
 
 
-class Base(object):
-    """
-    Base class for tests for Apache Hadoop Bundle.
-    """
+class TestBundle(unittest.TestCase):
     bundle_file = os.path.join(os.path.dirname(__file__), '..', 'bundle.yaml')
-    profile_name = None
 
     @classmethod
-    def deploy(cls):
-        # classmethod inheritance doesn't work quite right with
-        # setUpClass / tearDownClass, so subclasses have to manually call this
+    def setUpClass(cls):
         cls.d = amulet.Deployment(series='trusty')
         with open(cls.bundle_file) as f:
             bun = f.read()
-        profiles = yaml.safe_load(bun)
-        # amulet always selects the first profile, so we have to fudge it here
-        profile = {cls.profile_name: profiles[cls.profile_name]}
-        cls.d.load(profile)
-        cls.d.setup(timeout=9000)
-        cls.d.sentry.wait()
-        cls.hdfs = cls.d.sentry.unit['hdfs-master/0']
-        cls.yarn = cls.d.sentry.unit['yarn-master/0']
-        cls.slave = cls.d.sentry.unit['compute-slave/0']
-        cls.secondary = cls.d.sentry.unit['secondary-namenode/0']
-        cls.plugin = cls.d.sentry.unit['plugin/0']
-        cls.client = cls.d.sentry.unit['client/0']
-
-    @classmethod
-    def reset_env(cls):
-        # classmethod inheritance doesn't work quite right with
-        # setUpClass / tearDownClass, so subclasses have to manually call this
-        juju_env = amulet.helpers.default_environment()
-        services = ['hdfs-master', 'yarn-master', 'compute-slave', 'secondary-namenode', 'plugin', 'client']
-
-        def check_env_clear():
-            state = amulet.waiter.state(juju_env=juju_env)
-            for service in services:
-                if state.get(service, {}) != {}:
-                    return False
-            return True
-
-        for service in services:
-            cls.d.remove(service)
-        with amulet.helpers.timeout(300):
-            while not check_env_clear():
-                time.sleep(5)
-
-    def test_hadoop_components(self):
+        bundle = yaml.safe_load(bun)
+        cls.d.load(bundle)
+        cls.d.setup(timeout=1800)
+        cls.d.sentry.wait_for_messages({'notebook': 'Ready'}, timeout=1800)
+        cls.hdfs = cls.d.sentry['hdfs-master'][0]
+        cls.yarn = cls.d.sentry['yarn-master'][0]
+        cls.slave = cls.d.sentry['compute-slave'][0]
+        cls.secondary = cls.d.sentry['secondary-namenode'][0]
+        cls.spark = cls.d.sentry['spark'][0]
+        cls.notebook = cls.d.sentry['notebook'][0]
+
+    def test_components(self):
         """
         Confirm that all of the required components are up and running.
         """
@@ -63,17 +34,48 @@
         yarn, retcode = self.yarn.run("pgrep -a java")
         slave, retcode = self.slave.run("pgrep -a java")
         secondary, retcode = self.secondary.run("pgrep -a java")
-        client, retcode = self.client.run("pgrep -a java")
+        spark, retcode = self.spark.run("pgrep -a java")
+        notebook, retcode = self.spark.run("pgrep -a python")
 
         # .NameNode needs the . to differentiate it from SecondaryNameNode
         assert '.NameNode' in hdfs, "NameNode not started"
+        assert '.NameNode' not in yarn, "NameNode should not be running on yarn-master"
+        assert '.NameNode' not in slave, "NameNode should not be running on compute-slave"
+        assert '.NameNode' not in secondary, "NameNode should not be running on secondary-namenode"
+        assert '.NameNode' not in spark, "NameNode should not be running on spark"
+
         assert 'ResourceManager' in yarn, "ResourceManager not started"
+        assert 'ResourceManager' not in hdfs, "ResourceManager should not be running on hdfs-master"
+        assert 'ResourceManager' not in slave, "ResourceManager should not be running on compute-slave"
+        assert 'ResourceManager' not in secondary, "ResourceManager should not be running on secondary-namenode"
+        assert 'ResourceManager' not in spark, "ResourceManager should not be running on spark"
+
         assert 'JobHistoryServer' in yarn, "JobHistoryServer not started"
+        assert 'JobHistoryServer' not in hdfs, "JobHistoryServer should not be running on hdfs-master"
+        assert 'JobHistoryServer' not in slave, "JobHistoryServer should not be running on compute-slave"
+        assert 'JobHistoryServer' not in secondary, "JobHistoryServer should not be running on secondary-namenode"
+        assert 'JobHistoryServer' not in spark, "JobHistoryServer should not be running on spark"
+
         assert 'NodeManager' in slave, "NodeManager not started"
+        assert 'NodeManager' not in yarn, "NodeManager should not be running on yarn-master"
+        assert 'NodeManager' not in hdfs, "NodeManager should not be running on hdfs-master"
+        assert 'NodeManager' not in secondary, "NodeManager should not be running on secondary-namenode"
+        assert 'NodeManager' not in spark, "NodeManager should not be running on spark"
+
         assert 'DataNode' in slave, "DataNode not started"
+        assert 'DataNode' not in yarn, "DataNode should not be running on yarn-master"
+        assert 'DataNode' not in hdfs, "DataNode should not be running on hdfs-master"
+        assert 'DataNode' not in secondary, "DataNode should not be running on secondary-namenode"
+        assert 'DataNode' not in spark, "DataNode should not be running on spark"
+
         assert 'SecondaryNameNode' in secondary, "SecondaryNameNode not started"
+        assert 'SecondaryNameNode' not in yarn, "SecondaryNameNode should not be running on yarn-master"
+        assert 'SecondaryNameNode' not in hdfs, "SecondaryNameNode should not be running on hdfs-master"
+        assert 'SecondaryNameNode' not in slave, "SecondaryNameNode should not be running on compute-slave"
+        assert 'SecondaryNameNode' not in spark, "SecondaryNameNode should not be running on spark"
 
-        return hdfs, yarn, slave, secondary, client  # allow subclasses to do additional checks
+        assert 'spark' in spark, 'Spark should be running on spark'
+        assert 'notebook' in notebook, 'Notebook should be running on spark'
 
     def test_hdfs_dir(self):
         """
@@ -84,11 +86,11 @@
 
         NB: These are order-dependent, so must be done as part of a single
        test case.
         """
-        output, retcode = self.client.run("su hdfs -c 'hdfs dfs -mkdir -p /user/ubuntu'")
+        output, retcode = self.spark.run("su hdfs -c 'hdfs dfs -mkdir -p /user/ubuntu'")
         assert retcode == 0, "Creating a user directory on hdfs FAILED:\n{}".format(output)
-        output, retcode = self.client.run("su hdfs -c 'hdfs dfs -chown ubuntu:ubuntu /user/ubuntu'")
+        output, retcode = self.spark.run("su hdfs -c 'hdfs dfs -chown ubuntu:ubuntu /user/ubuntu'")
         assert retcode == 0, "Assigning an owner to hdfs directory FAILED:\n{}".format(output)
-        output, retcode = self.client.run("su hdfs -c 'hdfs dfs -chmod -R 755 /user/ubuntu'")
+        output, retcode = self.spark.run("su hdfs -c 'hdfs dfs -chmod -R 755 /user/ubuntu'")
         assert retcode == 0, "Setting directory permission on hdfs FAILED:\n{}".format(output)
 
     def test_yarn_mapreduce_exe(self):
@@ -112,59 +114,15 @@
             ('cleanup', "su hdfs -c 'hdfs dfs -rm -r /user/ubuntu/teragenout'"),
         ]
         for name, step in test_steps:
-            output, retcode = self.client.run(step)
+            output, retcode = self.spark.run(step)
             assert retcode == 0, "{} FAILED:\n{}".format(name, output)
 
-
-class TestScalable(unittest.TestCase, Base):
-    profile_name = 'apache-core-batch-processing'
-
-    @classmethod
-    def setUpClass(cls):
-        cls.deploy()
-
-    @classmethod
-    def tearDownClass(cls):
-        cls.reset_env()
-
-    def test_hadoop_components(self):
-        """
-        In addition to testing that the components are running where they
-        are supposed to be, confirm that none of them are also running where
-        they shouldn't be.
-        """
-        hdfs, yarn, slave, secondary, client = super(TestScalable, self).test_hadoop_components()
-
-        # .NameNode needs the . to differentiate it from SecondaryNameNode
-        assert '.NameNode' not in yarn, "NameNode should not be running on yarn-master"
-        assert '.NameNode' not in slave, "NameNode should not be running on compute-slave"
-        assert '.NameNode' not in secondary, "NameNode should not be running on secondary-namenode"
-        assert '.NameNode' not in client, "NameNode should not be running on client"
-
-        assert 'ResourceManager' not in hdfs, "ResourceManager should not be running on hdfs-master"
-        assert 'ResourceManager' not in slave, "ResourceManager should not be running on compute-slave"
-        assert 'ResourceManager' not in secondary, "ResourceManager should not be running on secondary-namenode"
-        assert 'ResourceManager' not in client, "ResourceManager should not be running on client"
-
-        assert 'JobHistoryServer' not in hdfs, "JobHistoryServer should not be running on hdfs-master"
-        assert 'JobHistoryServer' not in slave, "JobHistoryServer should not be running on compute-slave"
-        assert 'JobHistoryServer' not in secondary, "JobHistoryServer should not be running on secondary-namenode"
-        assert 'JobHistoryServer' not in client, "JobHistoryServer should not be running on client"
-
-        assert 'NodeManager' not in yarn, "NodeManager should not be running on yarn-master"
-        assert 'NodeManager' not in hdfs, "NodeManager should not be running on hdfs-master"
-        assert 'NodeManager' not in secondary, "NodeManager should not be running on secondary-namenode"
-        assert 'NodeManager' not in client, "NodeManager should not be running on client"
-
-        assert 'DataNode' not in yarn, "DataNode should not be running on yarn-master"
-        assert 'DataNode' not in hdfs, "DataNode should not be running on hdfs-master"
-        assert 'DataNode' not in secondary, "DataNode should not be running on secondary-namenode"
-        assert 'DataNode' not in client, "DataNode should not be running on client"
-
-        assert 'SecondaryNameNode' not in yarn, "SecondaryNameNode should not be running on yarn-master"
-        assert 'SecondaryNameNode' not in hdfs, "SecondaryNameNode should not be running on hdfs-master"
-        assert 'SecondaryNameNode' not in slave, "SecondaryNameNode should not be running on compute-slave"
-        assert 'SecondaryNameNode' not in client, "SecondaryNameNode should not be running on client"
+    def test_spark(self):
+        output, retcode = self.spark.run("su ubuntu -c 'bash -lc /home/ubuntu/sparkpi.sh 2>&1'")
+        assert 'Pi is roughly' in output, 'SparkPi test failed: %s' % output
+
+    def test_notebook(self):
+        pass  # requires JavaScript; how to test?
 
 
 if __name__ == '__main__':

=== added file 'tests/tests.yaml'
--- tests/tests.yaml	1970-01-01 00:00:00 +0000
+++ tests/tests.yaml	2016-02-23 21:01:00 +0000
@@ -0,0 +1,3 @@
+reset: false
+packages:
+  - amulet

