Repository: bigtop Updated Branches: refs/heads/master df28f220b -> 7eb710b27
BIGTOP-2560: Spark charm failing automated tests (closes #154) Signed-off-by: Kevin W Monroe <[email protected]> Project: http://git-wip-us.apache.org/repos/asf/bigtop/repo Commit: http://git-wip-us.apache.org/repos/asf/bigtop/commit/7eb710b2 Tree: http://git-wip-us.apache.org/repos/asf/bigtop/tree/7eb710b2 Diff: http://git-wip-us.apache.org/repos/asf/bigtop/diff/7eb710b2 Branch: refs/heads/master Commit: 7eb710b27e7bb4850be0cadf58391b085f2a2f29 Parents: df28f22 Author: Konstantinos Tsakalozos <[email protected]> Authored: Thu Oct 27 16:17:35 2016 +0300 Committer: Kevin W Monroe <[email protected]> Committed: Mon Nov 7 09:27:48 2016 -0600 ---------------------------------------------------------------------- .../src/charm/spark/layer-spark/README.md | 358 ++++++++++--------- .../src/charm/spark/layer-spark/actions.yaml | 20 +- .../layer-spark/actions/connectedcomponent | 1 + .../spark/layer-spark/actions/decisiontree | 1 + .../src/charm/spark/layer-spark/actions/kmeans | 1 + .../spark/layer-spark/actions/linearregression | 1 + .../src/charm/spark/layer-spark/actions/pca | 1 + .../spark/layer-spark/actions/pregeloperation | 1 + .../spark/layer-spark/actions/shortestpaths | 1 + .../charm/spark/layer-spark/actions/sparkbench | 77 ++-- .../charm/spark/layer-spark/actions/streaming | 1 - .../actions/stronglyconnectedcomponent | 1 + .../spark/layer-spark/actions/trianglecount | 1 - .../src/charm/spark/layer-spark/config.yaml | 4 +- .../src/charm/spark/layer-spark/layer.yaml | 8 +- .../lib/charms/layer/bigtop_spark.py | 84 ++++- .../charm/spark/layer-spark/reactive/spark.py | 7 +- .../layer-spark/tests/01-basic-deployment.py | 23 +- .../spark/layer-spark/tests/02-smoke-test.py | 34 +- .../layer-spark/tests/03-scale-standalone.py | 48 +-- .../charm/spark/layer-spark/tests/10-test-ha.py | 62 ++-- 21 files changed, 446 insertions(+), 289 deletions(-) ---------------------------------------------------------------------- 
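The diffstat above adds each new benchmark action (connectedcomponent, kmeans, pca, and so on) as a one-line symlink pointing at the shared `sparkbench` script, which then branches on the name it was invoked through. A minimal sketch of that symlink-dispatch pattern, using a hypothetical `bench` script in a scratch directory rather than the charm's `actions/` tree:

```shell
#!/bin/sh
# One script serves many action names: each action is a symlink, and the
# script branches on basename "$0" (the pattern actions/sparkbench uses).
set -e
workdir=$(mktemp -d)
cat > "$workdir/bench" <<'EOF'
#!/bin/sh
NAME=$(basename "$0")          # name of the symlink we were invoked through
case "$NAME" in
    kmeans)   echo "KMeans" ;;
    pagerank) echo "PageRank" ;;
    *)        echo "unknown" ;;
esac
EOF
chmod +x "$workdir/bench"
ln -s bench "$workdir/kmeans"  # analogous to actions/kmeans -> sparkbench
"$workdir/kmeans"              # prints: KMeans
```

This is why most of the new action files in the diffstat are a single line: they all resolve to `sparkbench`, which maps the lowercase action name back to the CamelCase Spark Bench benchmark name.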
http://git-wip-us.apache.org/repos/asf/bigtop/blob/7eb710b2/bigtop-packages/src/charm/spark/layer-spark/README.md ---------------------------------------------------------------------- diff --git a/bigtop-packages/src/charm/spark/layer-spark/README.md b/bigtop-packages/src/charm/spark/layer-spark/README.md index 6de8de6..da88676 100644 --- a/bigtop-packages/src/charm/spark/layer-spark/README.md +++ b/bigtop-packages/src/charm/spark/layer-spark/README.md @@ -14,10 +14,11 @@ See the License for the specific language governing permissions and limitations under the License. --> -## Overview +# Overview -Apache Spark™ is a fast and general purpose engine for large-scale data -processing. Key features: +Apache Spark is a fast and general purpose engine for large-scale data +processing. This charm deploys the Spark component of the [Apache Bigtop][] +platform. Key features: * **Speed** @@ -28,135 +29,206 @@ processing. Key features: * **Ease of Use** Write applications quickly in Java, Scala or Python. Spark offers over 80 - high-level operators that make it easy to build parallel apps, and you can use - it interactively from the Scala and Python shells. + high-level operators that make it easy to build parallel apps for use + interactively from the Scala and Python shells. * **General Purpose Engine** Combine SQL, streaming, and complex analytics. Spark powers a stack of high-level tools including Shark for SQL, MLlib for machine learning, GraphX, - and Spark Streaming. You can combine these frameworks seamlessly in the same + and Spark Streaming. Combine these frameworks seamlessly in the same application. +[Apache Bigtop]: http://bigtop.apache.org/ -## Deployment -This charm deploys the Spark component of the Apache Bigtop platform and -supports running Spark in a variety of modes: +# Deploying - * **Standalone** +A working Juju installation is assumed to be present. 
If Juju is not yet set +up, please follow the [getting-started][] instructions prior to deploying this +charm. - In this mode Spark units form a cluster that you can scale to match your needs. - Starting with a single node: +This charm supports running Spark in a variety of modes: + +### Standalone +In this mode, Spark units form a cluster that can be scaled as needed. +Starting with a single node: juju deploy spark - juju deploy openjdk - juju add-relation spark openjdk - You can scale the cluster by adding more spark units: +Scale the cluster by adding more spark units: juju add-unit spark - When in standalone mode, Juju ensures a single Spark master is appointed. - The status of the unit acting as master reads "ready (standalone - master)", - while the rest of the units display a status of "ready (standalone)". - If you remove the master, Juju will appoint a new one. However, if a master - fails in standalone mode, running jobs and job history will be lost. - - * **Standalone HA** +When in standalone mode, Juju ensures a single Spark master is appointed. +The status of the unit acting as master reads `ready (standalone - master)`, +while the rest of the units display a status of `ready (standalone)`. +If the master is removed, Juju will appoint a new one. However, if a master +fails in standalone mode, running jobs and job history will be lost. - To enable High Availability for a Spark cluster, you need to add Zookeeper to - the deployment. To ensure a Zookeeper quorum, it is recommended that you - deploy 3 units of the zookeeper application. For instance: +### Standalone HA +To enable High Availability for a Spark cluster, simply add Zookeeper to +the deployment. To ensure a Zookeeper quorum, 3 units of the zookeeper +application are recommended. 
For instance: - juju deploy apache-zookeeper zookeeper -n 3 + juju deploy zookeeper -n 3 juju add-relation spark zookeeper - In this mode, you can again scale your cluster to match your needs by - adding/removing units. Spark units report "ready (standalone HA)" in their - status. If you need to identify the node acting as master, query Zookeeper - as follows: +In this mode, the cluster can again be scaled as needed by adding/removing +units. Spark units report `ready (standalone HA)` in their status. To identify +the unit acting as master, query Zookeeper as follows: juju run --unit zookeeper/0 'echo "get /spark/master_status" | /usr/lib/zookeeper/bin/zkCli.sh' - * **Yarn-client and Yarn-cluster** +### YARN-client and YARN-cluster +This charm leverages our pluggable Hadoop model with the `hadoop-plugin` +interface. This means that this charm can be related to a base Apache Hadoop +cluster to run Spark jobs there. The suggested deployment method is to use the +[hadoop-processing][] bundle and add a relation between spark and the plugin. - This charm leverages our pluggable Hadoop model with the `hadoop-plugin` - interface. This means that you can relate this charm to a base Apache Hadoop cluster - to run Spark jobs there. The suggested deployment method is to use the - [hadoop-processing](https://jujucharms.com/hadoop-processing/) - bundle and add a relation between spark and the plugin: juju deploy hadoop-processing juju add-relation plugin spark +> **Note**: The above assumes Juju 2.0 or greater. If using an earlier version +of Juju, use [juju-quickstart][] with the following syntax: `juju quickstart +hadoop-processing`. 
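As the standalone section above notes, the master unit reports `ready (standalone - master)` while the others report `ready (standalone)`. Those documented status strings make it easy to pick out the master programmatically; a sketch over an illustrative two-column sample, not literal `juju status` output:

```shell
# Find the unit whose status message marks it as standalone master.
# The messages mirror the strings this README documents.
cat > /tmp/spark-status.txt <<'EOF'
spark/0  ready (standalone)
spark/1  ready (standalone - master)
spark/2  ready (standalone)
EOF
awk '/standalone - master/ {print $1}' /tmp/spark-status.txt   # prints: spark/1
```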
-Note: To switch to a different execution mode, set the -`spark_execution_mode` config variable: +To switch among the above execution modes, set the `spark_execution_mode` +config variable: - juju set spark spark_execution_mode=<new_mode> + juju config spark spark_execution_mode=<new_mode> -See the **Configuration** section below for supported mode options. +> **Note**: The above assumes Juju 2.0 or greater. If using an earlier version +of Juju, the syntax is `juju set spark spark_execution_mode=<new_mode>`. +See the **Configuring** section below for supported mode options. -## Usage +## Network-Restricted Environments +Charms can be deployed in environments with limited network access. To deploy +in this environment, configure a Juju model with appropriate proxy and/or +mirror options. See [Configuring Models][] for more information. -Once deployment is complete, you can manually load and run Spark batch or -streaming jobs in a variety of ways: +[getting-started]: https://jujucharms.com/docs/stable/getting-started +[hadoop-processing]: https://jujucharms.com/hadoop-processing/ +[juju-quickstart]: https://launchpad.net/juju-quickstart +[Configuring Models]: https://jujucharms.com/docs/stable/models-config - * **Spark shell** -Spark's shell provides a simple way to learn the API, as well as a powerful -tool to analyse data interactively. 
It is available in either Scala or Python -and can be run from the Spark unit as follows: +# Verifying - juju ssh spark/0 - spark-shell # for interaction using scala - pyspark # for interaction using python +## Status +Apache Bigtop charms provide extended status reporting to indicate when they +are ready: - * **Command line** + juju status -SSH to the Spark unit and manually run a spark-submit job, for example: +This is particularly useful when combined with `watch` to track the on-going +progress of the deployment: - juju ssh spark/0 - spark-submit --class org.apache.spark.examples.SparkPi \ - --master yarn-client /usr/lib/spark/lib/spark-examples*.jar 10 + watch -n 2 juju status - * **Apache Zeppelin visual service** +The message column will provide information about a given unit's state. +This charm is ready for use once the status message indicates that it is +ready. -Deploy Apache Zeppelin and relate it to the Spark unit: +## Smoke Test +This charm provides a `smoke-test` action that can be used to verify the +application is functioning as expected. Run the action as follows: - juju deploy apache-zeppelin zeppelin - juju add-relation spark zeppelin + juju run-action spark/0 smoke-test + +> **Note**: The above assumes Juju 2.0 or greater. If using an earlier version +of Juju, the syntax is `juju action do spark/0 smoke-test`. + +Watch the progress of the smoke test actions with: + + watch -n 2 juju show-action-status + +> **Note**: The above assumes Juju 2.0 or greater. If using an earlier version +of Juju, the syntax is `juju action status`. + +Eventually, the action should settle to `status: completed`. If it +reports `status: failed`, the application is not working as expected. Get +more information about a specific smoke test with: + + juju show-action-output <action-id> + +> **Note**: The above assumes Juju 2.0 or greater. If using an earlier version +of Juju, the syntax is `juju action fetch <action-id>`. 
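Action results come back as YAML, and the top-level `status` field is what the smoke-test instructions key on. A sketch that extracts it from saved output; the sample mirrors the `status: completed` shape shown in this README, trimmed down:

```shell
# Save action output once, then pull the top-level "status" field from it.
cat > /tmp/action-output.yaml <<'EOF'
results:
  duration:
    units: secs
    value: "200.754000"
status: completed
EOF
awk -F': ' '$1 == "status" {print $2}' /tmp/action-output.yaml   # prints: completed
```

Matching on the whole first field skips the indented `units:`/`value:` lines, so only the top-level status is printed.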
+ +## Spark Master web UI +Spark provides a web console that can be used to verify information about +the cluster. To access it, find the `PUBLIC-ADDRESS` of the spark application +and expose it: -Once the relation has been made, access the web interface at -`http://{spark_unit_ip_address}:9090` + juju status spark + juju expose spark - * **IPyNotebook for Spark** +The web interface will be available at the following URL: + + http://SPARK_PUBLIC_IP:8080 + +## Spark Job History web UI +The Job History server shows all active and finished spark jobs submitted. +As mentioned above, expose the spark application and note the public IP +address. The job history web interface will be available at the following URL: + + http://SPARK_PUBLIC_IP:18080 + + +# Using + +Once deployment is verified, Spark batch or streaming jobs can be run in a +variety of ways: + +### Spark shell +Spark shell provides a simple way to learn the API, as well as a powerful +tool to analyze data interactively. It is available in either Scala or Python +and can be run from the Spark unit as follows: + + juju ssh spark/0 + spark-shell # for interaction using scala + pyspark # for interaction using python + +### Command line +SSH to the Spark unit and manually run a spark-submit job, for example: + + juju ssh spark/0 + spark-submit --class org.apache.spark.examples.SparkPi \ + --master yarn-client /usr/lib/spark/lib/spark-examples*.jar 10 + +### Apache Zeppelin +Apache Zeppelin is a web-based notebook that enables interactive data +analytics. Make beautiful data-driven, interactive, and collaborative documents +with SQL, Scala and more. Deploy Zeppelin and relate it to Spark: + + juju deploy zeppelin + juju add-relation spark zeppelin -The IPython Notebook is an interactive computational environment, in which you -can combine code execution, rich text, mathematics, plots and rich media. 
-Deploy IPython Notebook for Spark and relate it to the Spark unit: +To access the web console, find the `PUBLIC-ADDRESS` of the zeppelin +application and expose it: - juju deploy apache-spark-notebook notebook - juju add-relation spark notebook + juju status zeppelin + juju expose zeppelin -Once the relation has been made, access the web interface at -`http://{spark_unit_ip_address}:8880` +The web interface will be available at the following URL: + http://ZEPPELIN_PUBLIC_IP:9080 -## Configuration -### `spark_bench_enabled` +# Configuring + +## spark_bench_enabled Install the SparkBench benchmarking suite. If `true` (the default), this charm will download spark bench from the URL specified by `spark_bench_ppc64le` or `spark_bench_x86_64`, depending on the unit's architecture. -### `spark_execution_mode` +## spark_execution_mode Spark has four modes of execution: local, standalone, yarn-client, and -yarn-cluster. The default mode is `yarn-client` and can be changed by setting +yarn-cluster. The default mode is `standalone` and can be changed by setting the `spark_execution_mode` config variable. * **Local** @@ -171,12 +243,12 @@ the `spark_execution_mode` config variable. * `local[K]` Run Spark locally with K worker threads (ideally, set this to the number - of cores on your machine). + of cores on the deployed machine). * `local[*]` - Run Spark locally with as many worker threads as logical cores on your - machine. + Run Spark locally with as many worker threads as logical cores on the + deployed machine. * **Standalone** @@ -186,7 +258,7 @@ the `spark_execution_mode` config variable. * **YARN-client** - In `yarn-client` mode, the driver runs in the client process, and the + In `yarn-client` mode, the Spark driver runs in the client process, and the application master is only used for requesting resources from YARN. * **YARN-cluster** @@ -196,118 +268,70 @@ the `spark_execution_mode` config variable. after initiating the application. 
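The execution modes above (plus the `local[K]` and `local[*]` variants) are the only values `spark_execution_mode` accepts. A small client-side guard one might run before calling `juju config`; a sketch only, since the charm validates the value itself:

```shell
# Accept only the execution modes documented for this charm.
valid_mode() {
    case "$1" in
        local|local\[[0-9]*\]|local\[\*\]|standalone|yarn-client|yarn-cluster)
            return 0 ;;
        *)
            return 1 ;;
    esac
}
valid_mode yarn-client && echo "ok: yarn-client"   # prints: ok: yarn-client
```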
-## Verify the deployment - -### Status and Smoke Test - -The services provide extended status reporting to indicate when they are ready: - - juju status --format=tabular - -This is particularly useful when combined with `watch` to track the on-going -progress of the deployment: - - watch -n 0.5 juju status --format=tabular - -The message for each unit will provide information about that unit's state. -Once they all indicate that they are ready, you can perform a "smoke test" -to verify that Spark is working as expected using the built-in `smoke-test` -action: - - juju run-action spark/0 smoke-test - -_**Note**: The above assumes Juju 2.0 or greater. If using an earlier version -of Juju, the syntax is `juju action do spark/0 smoke-test`._ - - -After a minute or so, you can check the results of the smoke test: - - juju show-action-status - -_**Note**: The above assumes Juju 2.0 or greater. If using an earlier version -of Juju, the syntax is `juju action status`._ - -You will see `status: completed` if the smoke test was successful, or -`status: failed` if it was not. You can get more information on why it failed -via: - - juju show-action-output <action-id> - -_**Note**: The above assumes Juju 2.0 or greater. If using an earlier version -of Juju, the syntax is `juju action fetch <action-id>`._ - - -### Verify Job History - -The Job History server shows all active and finished spark jobs submitted. -To view the Job History server you need to expose spark (`juju expose spark`) -and navigate to `http://{spark_master_unit_ip_address}:18080` of the -unit acting as master. - - -## Benchmarking - -This charm provides several benchmarks, including the -[Spark Bench](https://github.com/SparkTC/spark-bench) benchmarking -suite (if enabled), to gauge the performance of your environment. - -The easiest way to run the benchmarks on this service is to relate it to the -[Benchmark GUI][]. 
You will likely also want to relate it to the -[Benchmark Collector][] to have machine-level information collected during the -benchmark, for a more complete picture of how the machine performed. - -[Benchmark GUI]: https://jujucharms.com/benchmark-gui/ -[Benchmark Collector]: https://jujucharms.com/benchmark-collector/ - -However, each benchmark is also an action that can be called manually: - - $ juju action do spark/0 pagerank - Action queued with id: 88de9367-45a8-4a4b-835b-7660f467a45e - $ juju action fetch --wait 0 88de9367-45a8-4a4b-835b-7660f467a45e +# Benchmarking + +This charm provides several benchmarks, including the [Spark Bench][] +benchmarking suite (if enabled), to gauge the performance of the environment. +Each benchmark is an action that can be run with `juju run-action`: + + $ juju actions spark | grep Bench + connectedcomponent Run the Spark Bench ConnectedComponent benchmark. + decisiontree Run the Spark Bench DecisionTree benchmark. + kmeans Run the Spark Bench KMeans benchmark. + linearregression Run the Spark Bench LinearRegression benchmark. + logisticregression Run the Spark Bench LogisticRegression benchmark. + matrixfactorization Run the Spark Bench MatrixFactorization benchmark. + pagerank Run the Spark Bench PageRank benchmark. + pca Run the Spark Bench PCA benchmark. + pregeloperation Run the Spark Bench PregelOperation benchmark. + shortestpaths Run the Spark Bench ShortestPaths benchmark. + sql Run the Spark Bench SQL benchmark. + stronglyconnectedcomponent Run the Spark Bench StronglyConnectedComponent benchmark. + svdplusplus Run the Spark Bench SVDPlusPlus benchmark. + svm Run the Spark Bench SVM benchmark. 
+ + $ juju run-action spark/0 svdplusplus + Action queued with id: 339cec1f-e903-4ee7-85ca-876fb0c3d28e + + $ juju show-action-output 339cec1f-e903-4ee7-85ca-876fb0c3d28e results: meta: composite: direction: asc units: secs - value: "77.939000" + value: "200.754000" raw: | - PageRank,2015-12-10-23:41:57,77.939000,71.888079,.922363,0,PageRank-MLlibConfig,,,,,10,12,,200000,4.0,1.3,0.15 - start: 2015-12-10T23:41:34Z - stop: 2015-12-10T23:43:16Z + SVDPlusPlus,2016-11-02-03:08:26,200.754000,85.974071,.428255,0,SVDPlusPlus-MLlibConfig,,,,,10,,,50000,4.0,1.3, + start: 2016-11-02T03:08:26Z + stop: 2016-11-02T03:11:47Z results: duration: direction: asc units: secs - value: "77.939000" + value: "200.754000" throughput: direction: desc units: x/sec - value: ".922363" + value: ".428255" status: completed timing: - completed: 2015-12-10 23:43:59 +0000 UTC - enqueued: 2015-12-10 23:42:10 +0000 UTC - started: 2015-12-10 23:42:15 +0000 UTC - -Valid action names at this time are: + completed: 2016-11-02 03:11:48 +0000 UTC + enqueued: 2016-11-02 03:08:21 +0000 UTC + started: 2016-11-02 03:08:26 +0000 UTC - * logisticregression - * matrixfactorization - * pagerank - * sql - * streaming - * svdplusplus - * svm - * trianglecount - * sparkpi +[Spark Bench]: https://github.com/SparkTC/spark-bench -## Contact Information +# Contact Information - <[email protected]> -## Help +# Resources +- [Apache Bigtop](http://bigtop.apache.org/) home page +- [Apache Bigtop issue tracking](http://bigtop.apache.org/issue-tracking.html) +- [Apache Bigtop mailing lists](http://bigtop.apache.org/mail-lists.html) +- [Juju Bigtop charms](https://jujucharms.com/q/apache/bigtop) - [Juju mailing list](https://lists.ubuntu.com/mailman/listinfo/juju) - [Juju community](https://jujucharms.com/community) http://git-wip-us.apache.org/repos/asf/bigtop/blob/7eb710b2/bigtop-packages/src/charm/spark/layer-spark/actions.yaml ---------------------------------------------------------------------- diff --git 
a/bigtop-packages/src/charm/spark/layer-spark/actions.yaml b/bigtop-packages/src/charm/spark/layer-spark/actions.yaml index 869de8f..6564b1c 100644 --- a/bigtop-packages/src/charm/spark/layer-spark/actions.yaml +++ b/bigtop-packages/src/charm/spark/layer-spark/actions.yaml @@ -7,22 +7,34 @@ sparkpi: description: Number of partitions to use for the SparkPi job type: string default: "10" +connectedcomponent: + description: Run the Spark Bench ConnectedComponent benchmark. +decisiontree: + description: Run the Spark Bench DecisionTree benchmark. +kmeans: + description: Run the Spark Bench KMeans benchmark. +linearregression: + description: Run the Spark Bench LinearRegression benchmark. logisticregression: description: Run the Spark Bench LogisticRegression benchmark. matrixfactorization: description: Run the Spark Bench MatrixFactorization benchmark. pagerank: description: Run the Spark Bench PageRank benchmark. +pca: + description: Run the Spark Bench PCA benchmark. +pregeloperation: + description: Run the Spark Bench PregelOperation benchmark. +shortestpaths: + description: Run the Spark Bench ShortestPaths benchmark. sql: description: Run the Spark Bench SQL benchmark. -streaming: - description: Run the Spark Bench Streaming benchmark. +stronglyconnectedcomponent: + description: Run the Spark Bench StronglyConnectedComponent benchmark. svdplusplus: description: Run the Spark Bench SVDPlusPlus benchmark. svm: description: Run the Spark Bench SVM benchmark. -trianglecount: - description: Run the Spark Bench TriangleCount benchmark. restart-spark-job-history-server: description: Restart the Spark job history server. 
start-spark-job-history-server: http://git-wip-us.apache.org/repos/asf/bigtop/blob/7eb710b2/bigtop-packages/src/charm/spark/layer-spark/actions/connectedcomponent ---------------------------------------------------------------------- diff --git a/bigtop-packages/src/charm/spark/layer-spark/actions/connectedcomponent b/bigtop-packages/src/charm/spark/layer-spark/actions/connectedcomponent new file mode 120000 index 0000000..9e15049 --- /dev/null +++ b/bigtop-packages/src/charm/spark/layer-spark/actions/connectedcomponent @@ -0,0 +1 @@ +sparkbench \ No newline at end of file http://git-wip-us.apache.org/repos/asf/bigtop/blob/7eb710b2/bigtop-packages/src/charm/spark/layer-spark/actions/decisiontree ---------------------------------------------------------------------- diff --git a/bigtop-packages/src/charm/spark/layer-spark/actions/decisiontree b/bigtop-packages/src/charm/spark/layer-spark/actions/decisiontree new file mode 120000 index 0000000..9e15049 --- /dev/null +++ b/bigtop-packages/src/charm/spark/layer-spark/actions/decisiontree @@ -0,0 +1 @@ +sparkbench \ No newline at end of file http://git-wip-us.apache.org/repos/asf/bigtop/blob/7eb710b2/bigtop-packages/src/charm/spark/layer-spark/actions/kmeans ---------------------------------------------------------------------- diff --git a/bigtop-packages/src/charm/spark/layer-spark/actions/kmeans b/bigtop-packages/src/charm/spark/layer-spark/actions/kmeans new file mode 120000 index 0000000..9e15049 --- /dev/null +++ b/bigtop-packages/src/charm/spark/layer-spark/actions/kmeans @@ -0,0 +1 @@ +sparkbench \ No newline at end of file http://git-wip-us.apache.org/repos/asf/bigtop/blob/7eb710b2/bigtop-packages/src/charm/spark/layer-spark/actions/linearregression ---------------------------------------------------------------------- diff --git a/bigtop-packages/src/charm/spark/layer-spark/actions/linearregression b/bigtop-packages/src/charm/spark/layer-spark/actions/linearregression new file mode 120000 index 
0000000..9e15049 --- /dev/null +++ b/bigtop-packages/src/charm/spark/layer-spark/actions/linearregression @@ -0,0 +1 @@ +sparkbench \ No newline at end of file http://git-wip-us.apache.org/repos/asf/bigtop/blob/7eb710b2/bigtop-packages/src/charm/spark/layer-spark/actions/pca ---------------------------------------------------------------------- diff --git a/bigtop-packages/src/charm/spark/layer-spark/actions/pca b/bigtop-packages/src/charm/spark/layer-spark/actions/pca new file mode 120000 index 0000000..9e15049 --- /dev/null +++ b/bigtop-packages/src/charm/spark/layer-spark/actions/pca @@ -0,0 +1 @@ +sparkbench \ No newline at end of file http://git-wip-us.apache.org/repos/asf/bigtop/blob/7eb710b2/bigtop-packages/src/charm/spark/layer-spark/actions/pregeloperation ---------------------------------------------------------------------- diff --git a/bigtop-packages/src/charm/spark/layer-spark/actions/pregeloperation b/bigtop-packages/src/charm/spark/layer-spark/actions/pregeloperation new file mode 120000 index 0000000..9e15049 --- /dev/null +++ b/bigtop-packages/src/charm/spark/layer-spark/actions/pregeloperation @@ -0,0 +1 @@ +sparkbench \ No newline at end of file http://git-wip-us.apache.org/repos/asf/bigtop/blob/7eb710b2/bigtop-packages/src/charm/spark/layer-spark/actions/shortestpaths ---------------------------------------------------------------------- diff --git a/bigtop-packages/src/charm/spark/layer-spark/actions/shortestpaths b/bigtop-packages/src/charm/spark/layer-spark/actions/shortestpaths new file mode 120000 index 0000000..9e15049 --- /dev/null +++ b/bigtop-packages/src/charm/spark/layer-spark/actions/shortestpaths @@ -0,0 +1 @@ +sparkbench \ No newline at end of file http://git-wip-us.apache.org/repos/asf/bigtop/blob/7eb710b2/bigtop-packages/src/charm/spark/layer-spark/actions/sparkbench ---------------------------------------------------------------------- diff --git a/bigtop-packages/src/charm/spark/layer-spark/actions/sparkbench 
b/bigtop-packages/src/charm/spark/layer-spark/actions/sparkbench index bc66c70..906a30e 100755 --- a/bigtop-packages/src/charm/spark/layer-spark/actions/sparkbench +++ b/bigtop-packages/src/charm/spark/layer-spark/actions/sparkbench @@ -27,6 +27,22 @@ BENCHMARK=`basename $0` # Juju actions have an annoying lowercase alphanum restriction, so translate # that into the sparkbench name. case "${BENCHMARK}" in + connectedcomponent) + BENCHMARK="ConnectedComponent" + RESULT_KEY="ConnectedComponent" + ;; + decisiontree) + BENCHMARK="DecisionTree" + RESULT_KEY="DecisionTree" + ;; + kmeans) + BENCHMARK="KMeans" + RESULT_KEY="KMeans" + ;; + linearregression) + BENCHMARK="LinearRegression" + RESULT_KEY="LinearRegression" + ;; logisticregression) BENCHMARK="LogisticRegression" RESULT_KEY="LogisticRegression" @@ -39,13 +55,25 @@ case "${BENCHMARK}" in BENCHMARK="PageRank" RESULT_KEY="PageRank" ;; + pca) + BENCHMARK="PCA" + RESULT_KEY="PCA" + ;; + pregeloperation) + BENCHMARK="PregelOperation" + RESULT_KEY="PregelOperation" + ;; + shortestpaths) + BENCHMARK="ShortestPaths" + RESULT_KEY="ShortestPaths" + ;; sql) BENCHMARK="SQL" RESULT_KEY="sql" ;; - streaming) - BENCHMARK="Streaming" - RESULT_KEY="streaming" + stronglyconnectedcomponent) + BENCHMARK="StronglyConnectedComponent" + RESULT_KEY="StronglyConnectedComponent" ;; svdplusplus) BENCHMARK="SVDPlusPlus" @@ -55,13 +83,9 @@ case "${BENCHMARK}" in BENCHMARK="SVM" RESULT_KEY="SVM" ;; - trianglecount) - BENCHMARK="TriangleCount" - RESULT_KEY="TriangleCount" - ;; esac -SB_HOME=/home/ubuntu/spark-bench +SB_HOME="/home/ubuntu/SparkBench" SB_APPS="${SB_HOME}/bin/applications.lst" if [ -f "${SB_APPS}" ]; then VALID_TEST=`grep -c ^${BENCHMARK} ${SB_HOME}/bin/applications.lst` @@ -74,22 +98,31 @@ if [ -f "${SB_APPS}" ]; then mkdir -p ${RESULT_DIR} chown -R ubuntu:ubuntu ${RESULT_DIR} - # generate data to be used for benchmarking. this must be run as the ubuntu - # user to make sure we pick up correct spark environment. 
- echo 'generating data' - su ubuntu << EOF - . /etc/environment - ~/spark-bench/${BENCHMARK}/bin/gen_data.sh &> /dev/null -EOF + # user running the benchmark (spark for local modes; ubuntu for yarn-*) + SB_USER="spark" + + # make sure our report file is writable by user + group members + SB_REPORT="${SB_HOME}/num/bench-report.dat" + if [ -f "${SB_REPORT}" ]; then + chmod 664 "${SB_REPORT}" + fi + + # Benchmark input data is packed into our sparkbench.tgz, which makes + # it available on all spark units. In yarn mode, however, the nodemanagers + # act as the spark workers and will not have access to this local data. + # In yarn mode, generate our own input data (stored in hdfs) so + # nodemanagers can access it. + MODE=`config-get spark_execution_mode` + if [[ $MODE == "yarn"* ]]; then + SB_USER="ubuntu" + echo 'generating data' + sudo -u ${SB_USER} ${SB_HOME}/${BENCHMARK}/bin/gen_data.sh + fi - # run the benchmark. this must be run as the ubuntu - # user to make sure we pick up correct spark environment. + # run the benchmark echo 'running benchmark' benchmark-start - su ubuntu << EOF - . 
/etc/environment - ~/spark-bench/${BENCHMARK}/bin/run.sh &> /dev/null -EOF + sudo -u ${SB_USER} ${SB_HOME}/${BENCHMARK}/bin/run.sh benchmark-finish # collect our data (the last line in our bench-report.dat file) @@ -99,7 +132,7 @@ EOF # send data points and composite score benchmark-data 'duration' "${DURATION}" 'secs' 'asc' - benchmark-data 'throughput' "${THROUGHPUT}" 'x/sec' 'desc' + benchmark-data 'throughput' "${THROUGHPUT}" 'MB/sec' 'desc' benchmark-composite "${DURATION}" 'secs' 'asc' # send raw data (benchmark-raw takes a file) http://git-wip-us.apache.org/repos/asf/bigtop/blob/7eb710b2/bigtop-packages/src/charm/spark/layer-spark/actions/streaming ---------------------------------------------------------------------- diff --git a/bigtop-packages/src/charm/spark/layer-spark/actions/streaming b/bigtop-packages/src/charm/spark/layer-spark/actions/streaming deleted file mode 120000 index 9e15049..0000000 --- a/bigtop-packages/src/charm/spark/layer-spark/actions/streaming +++ /dev/null @@ -1 +0,0 @@ -sparkbench \ No newline at end of file http://git-wip-us.apache.org/repos/asf/bigtop/blob/7eb710b2/bigtop-packages/src/charm/spark/layer-spark/actions/stronglyconnectedcomponent ---------------------------------------------------------------------- diff --git a/bigtop-packages/src/charm/spark/layer-spark/actions/stronglyconnectedcomponent b/bigtop-packages/src/charm/spark/layer-spark/actions/stronglyconnectedcomponent new file mode 120000 index 0000000..9e15049 --- /dev/null +++ b/bigtop-packages/src/charm/spark/layer-spark/actions/stronglyconnectedcomponent @@ -0,0 +1 @@ +sparkbench \ No newline at end of file http://git-wip-us.apache.org/repos/asf/bigtop/blob/7eb710b2/bigtop-packages/src/charm/spark/layer-spark/actions/trianglecount ---------------------------------------------------------------------- diff --git a/bigtop-packages/src/charm/spark/layer-spark/actions/trianglecount b/bigtop-packages/src/charm/spark/layer-spark/actions/trianglecount deleted file mode 
120000 index 9e15049..0000000 --- a/bigtop-packages/src/charm/spark/layer-spark/actions/trianglecount +++ /dev/null @@ -1 +0,0 @@ -sparkbench \ No newline at end of file http://git-wip-us.apache.org/repos/asf/bigtop/blob/7eb710b2/bigtop-packages/src/charm/spark/layer-spark/config.yaml ---------------------------------------------------------------------- diff --git a/bigtop-packages/src/charm/spark/layer-spark/config.yaml b/bigtop-packages/src/charm/spark/layer-spark/config.yaml index fb53096..2a88752 100644 --- a/bigtop-packages/src/charm/spark/layer-spark/config.yaml +++ b/bigtop-packages/src/charm/spark/layer-spark/config.yaml @@ -16,7 +16,7 @@ options: preserved. spark_bench_ppc64le: type: string - default: 'https://s3.amazonaws.com/jujubigdata/ibm/noarch/spark-bench-2.0-20151214-ffb72f23.tgz#sha256=ffb72f233eaafccef4dda6d4516f23e043d1b14b9d63734211f4d1968db86a3c' + default: 'https://s3.amazonaws.com/jujubigdata/ibm/noarch/SparkBench-2.0-20161101.tgz#sha256=2a34150dc3ad4a1469ca09c202f4db4ee995e2932b8a633d8c006d46c1f61e9f' description: | URL (including hash) of a ppc64le tarball of SparkBench. By default, this points to a pre-built SparkBench binary based on @@ -24,7 +24,7 @@ options: 'spark_bench_enabled' is 'true'. spark_bench_x86_64: type: string - default: 'https://s3.amazonaws.com/jujubigdata/ibm/noarch/spark-bench-2.0-20151214-ffb72f23.tgz#sha256=ffb72f233eaafccef4dda6d4516f23e043d1b14b9d63734211f4d1968db86a3c' + default: 'https://s3.amazonaws.com/jujubigdata/ibm/noarch/SparkBench-2.0-20161101.tgz#sha256=2a34150dc3ad4a1469ca09c202f4db4ee995e2932b8a633d8c006d46c1f61e9f' description: | URL (including hash) of an x86_64 tarball of SparkBench. 
By default, this points to a pre-built SparkBench binary based on http://git-wip-us.apache.org/repos/asf/bigtop/blob/7eb710b2/bigtop-packages/src/charm/spark/layer-spark/layer.yaml ---------------------------------------------------------------------- diff --git a/bigtop-packages/src/charm/spark/layer-spark/layer.yaml b/bigtop-packages/src/charm/spark/layer-spark/layer.yaml index b67a34e..11fc135 100644 --- a/bigtop-packages/src/charm/spark/layer-spark/layer.yaml +++ b/bigtop-packages/src/charm/spark/layer-spark/layer.yaml @@ -1,4 +1,4 @@ -repo: [email protected]:juju-solutions/layer-apache-bigtop-spark.git +repo: https://github.com/apache/bigtop/tree/master/bigtop-packages/src/charm/spark/layer-spark includes: - 'layer:apache-bigtop-base' - 'layer:hadoop-client' @@ -8,6 +8,11 @@ includes: - 'interface:zookeeper' - 'interface:spark-quorum' options: + basic: + # bc and libgfortran3 are needed by spark-bench + packages: + - 'bc' + - 'libgfortran3' hadoop-client: groups: - 'hadoop' @@ -25,4 +30,3 @@ options: spark-webui: port: 8080 exposed_on: 'spark' - silent: True http://git-wip-us.apache.org/repos/asf/bigtop/blob/7eb710b2/bigtop-packages/src/charm/spark/layer-spark/lib/charms/layer/bigtop_spark.py ---------------------------------------------------------------------- diff --git a/bigtop-packages/src/charm/spark/layer-spark/lib/charms/layer/bigtop_spark.py b/bigtop-packages/src/charm/spark/layer-spark/lib/charms/layer/bigtop_spark.py index 52f5e97..dc2e373 100755 --- a/bigtop-packages/src/charm/spark/layer-spark/lib/charms/layer/bigtop_spark.py +++ b/bigtop-packages/src/charm/spark/layer-spark/lib/charms/layer/bigtop_spark.py @@ -12,6 +12,7 @@ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. 
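The updated `spark_bench_*` config URLs above embed their expected checksum as a `#sha256=` fragment, which the fetch handler uses to validate the download. Below is a minimal sketch of that parse-and-verify pattern; the URL and payload are hypothetical, and the charm itself delegates this work to charmhelpers' `ArchiveUrlFetchHandler`:

```python
import hashlib
from urllib.parse import urlparse

def parse_checksum(url):
    """Split a '#<algo>=<hexdigest>' fragment off a resource URL."""
    bare = url.split('#', 1)[0]
    fragment = urlparse(url).fragment
    if '=' in fragment:
        algo, digest = fragment.split('=', 1)
        return bare, algo, digest
    return bare, None, None

def verify(data, algo, digest):
    """Check fetched bytes against the expected digest."""
    return hashlib.new(algo, data).hexdigest() == digest

# Hypothetical payload standing in for the SparkBench tarball bytes:
payload = b'SparkBench tarball contents'
url = ('https://example.com/SparkBench-2.0.tgz#sha256='
       + hashlib.sha256(payload).hexdigest())
bare, algo, digest = parse_checksum(url)
print(bare, algo, verify(payload, algo, digest))
```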
+import os from jujubigdata import utils from path import Path @@ -59,9 +60,20 @@ class Spark(object): return roles def install_benchmark(self): + """ + Install and configure SparkBench. + + If config[spark_bench_enabled], fetch, install, and configure + SparkBench on initial invocation. Subsequent invocations will skip the + fetch/install, but will reconfigure SparkBench since we may need to + adjust the data dir (eg: benchmark data is stored in hdfs when spark + is in yarn mode; locally in all other execution modes). + """ install_sb = hookenv.config()['spark_bench_enabled'] - sb_dir = '/home/ubuntu/spark-bench' + sb_dir = '/home/ubuntu/SparkBench' if install_sb: + # Fetch/install on our first go-round, then set unit data so we + # don't reinstall every time this function is called. if not unitdata.kv().get('spark_bench.installed', False): if utils.cpu_arch() == 'ppc64le': sb_url = hookenv.config()['spark_bench_ppc64le'] @@ -73,16 +85,68 @@ class Spark(object): au = ArchiveUrlFetchHandler() au.install(sb_url, '/home/ubuntu') + # NB: This block is unused when using one of our sb tgzs. It + # may come in handy if people want a tgz that does not expand + # to our expected sb_dir. # ##### # Handle glob if we use a .tgz that doesn't expand to sb_dir - # sb_archive_dir = glob('/home/ubuntu/spark-bench-*')[0] - # SparkBench expects to live in ~/spark-bench, so put it there + # sb_archive_dir = glob('/home/ubuntu/SparkBench*')[0] + # SparkBench expects to live in ~/SparkBench, so put it there # Path(sb_archive_dir).rename(sb_dir) # ##### + # Ensure users in the spark group can write to any subdirectory + # of sb_dir (spark needs to write benchmark output there when + # running in local modes). + host.chownr(Path(sb_dir), 'ubuntu', 'spark', chowntopdir=True) + for r, d, f in os.walk(sb_dir): + os.chmod(r, 0o2775) + unitdata.kv().set('spark_bench.installed', True) unitdata.kv().flush(True) + + # Configure the SB env every time this function is called. 
+ sb_conf = '{}/conf'.format(sb_dir) + sb_env = Path(sb_conf) / 'env.sh' + if not sb_env.exists(): + (Path(sb_conf) / 'env.sh.template').copy(sb_env) + + # NB: A few notes on configuring SparkBench: + # 1. Input data has been pregenerated and packed into the tgz. All + # spark cluster members will have this data locally, which enables + # us to execute benchmarks in the absence of HDFS. When spark is in + # yarn mode, we'll need to generate and store this data in HDFS + # so nodemanagers can access it (NMs obviously won't have SB + # installed locally). Set DATA_HDFS to a local dir or common HDFS + # location depending on our spark execution mode. + # + # 2. SB tries to SSH to spark workers to purge vmem caches. This + # isn't possible in containers, nor is it possible in our env + # because we don't distribute ssh keys among cluster members. + # Set MC_LIST to an empty string to prevent this behavior. + # + # 3. Throughout SB, HADOOP_HOME/bin is used as the prefix for the + # hdfs command. Bigtop's hdfs lives at /usr/bin/hdfs, so set the + # SB HADOOP_HOME accordingly (it's not used for anything else). + # + # 4. Use our MASTER envar to set the SparkBench SPARK_MASTER url. + # It is updated every time we (re)configure spark.
+ mode = hookenv.config()['spark_execution_mode'] + if mode.startswith('yarn'): + sb_data_dir = "hdfs:///user/ubuntu/SparkBench" + else: + sb_data_dir = "file://{}".format(sb_dir) + + utils.re_edit_in_place(sb_env, { + r'^DATA_HDFS *=.*': 'DATA_HDFS="{}"'.format(sb_data_dir), + r'^DATASET_DIR *=.*': 'DATASET_DIR="{}/dataset"'.format(sb_dir), + r'^MC_LIST *=.*': 'MC_LIST=""', + r'.*HADOOP_HOME *=.*': 'HADOOP_HOME="/usr"', + r'.*SPARK_HOME *=.*': 'SPARK_HOME="/usr/lib/spark"', + r'^SPARK_MASTER *=.*': 'SPARK_MASTER="$MASTER"', + }) else: + # config[spark_bench_enabled] is false; remove it Path(sb_dir).rmtree_p() unitdata.kv().set('spark_bench.installed', False) unitdata.kv().flush(True) @@ -99,6 +163,7 @@ class Spark(object): events_dir = dc.path('spark_events') events_dir = 'hdfs://{}'.format(events_dir) utils.run_as('hdfs', 'hdfs', 'dfs', '-mkdir', '-p', events_dir) + utils.run_as('hdfs', 'hdfs', 'dfs', '-chmod', '1777', events_dir) utils.run_as('hdfs', 'hdfs', 'dfs', '-chown', '-R', 'ubuntu:spark', events_dir) return events_dir @@ -124,8 +189,6 @@ class Spark(object): self.setup() unitdata.kv().set('spark.bootstrapped', True) - self.install_benchmark() - master_ip = utils.resolve_private_address(available_hosts['spark-master']) hosts = { 'spark': master_ip, @@ -172,11 +235,18 @@ class Spark(object): if 'namenode' not in available_hosts: # Local event dir (not in HDFS) needs to be 777 so non-spark # users can write job history there. It needs to be g+s so - # spark (in the spark group) can read non-spark user entries. - dc.path('spark_events').chmod(0o2777) + # all entries will be readable by spark (in the spark group). + # It needs to be +t so users cannot remove files they don't own. 
+ dc.path('spark_events').chmod(0o3777) self.patch_worker_master_url(master_ip) + # SparkBench looks for the spark master in /etc/environment + with utils.environment_edit_in_place('/etc/environment') as env: + env['MASTER'] = self.get_master_url(master_ip) + # Install SB (subsequent calls will reconfigure existing install) + self.install_benchmark() + def patch_worker_master_url(self, master_ip): ''' Patch the worker startup script to use the full master url instead of contracting it. http://git-wip-us.apache.org/repos/asf/bigtop/blob/7eb710b2/bigtop-packages/src/charm/spark/layer-spark/reactive/spark.py ---------------------------------------------------------------------- diff --git a/bigtop-packages/src/charm/spark/layer-spark/reactive/spark.py b/bigtop-packages/src/charm/spark/layer-spark/reactive/spark.py index bb36401..99b2101 100644 --- a/bigtop-packages/src/charm/spark/layer-spark/reactive/spark.py +++ b/bigtop-packages/src/charm/spark/layer-spark/reactive/spark.py @@ -13,13 +13,12 @@ # See the License for the specific language governing permissions and # limitations under the License.
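The `install_benchmark` changes above rewrite SparkBench's `env.sh` with `utils.re_edit_in_place`, mapping anchored regexes to replacement lines. The following is a simplified, self-contained stand-in for that helper, applied to a hypothetical `env.sh` fragment, to show the effect of the `DATA_HDFS`, `MC_LIST`, and `SPARK_MASTER` edits:

```python
import os
import re
import tempfile

def re_edit_in_place(path, subs):
    """Apply {regex: replacement} edits line by line; a simplified
    stand-in for the jujubigdata utils.re_edit_in_place helper the
    charm uses on SparkBench's env.sh."""
    with open(path) as f:
        lines = f.read().splitlines()
    edited = []
    for line in lines:
        for pattern, repl in subs.items():
            line = re.sub(pattern, repl, line)
        edited.append(line)
    with open(path, 'w') as f:
        f.write('\n'.join(edited) + '\n')

# Hypothetical env.sh fragment standing in for the SparkBench template:
fd, path = tempfile.mkstemp(suffix='.sh')
with os.fdopen(fd, 'w') as f:
    f.write('DATA_HDFS=/tmp/data\n'
            'MC_LIST="host1 host2"\n'
            'SPARK_MASTER=spark://old-master:7077\n')

re_edit_in_place(path, {
    r'^DATA_HDFS *=.*': 'DATA_HDFS="hdfs:///user/ubuntu/SparkBench"',
    r'^MC_LIST *=.*': 'MC_LIST=""',
    r'^SPARK_MASTER *=.*': 'SPARK_MASTER="$MASTER"',
})
result = open(path).read()
os.remove(path)
print(result)
```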
from charms.reactive import RelationBase, when, when_not, is_state, set_state, remove_state, when_any -from charms.layer.apache_bigtop_base import get_fqdn +from charms.layer.apache_bigtop_base import get_fqdn, get_package_version from charms.layer.bigtop_spark import Spark from charmhelpers.core import hookenv from charms import leadership from charms.reactive.helpers import data_changed from jujubigdata import utils -from charms import layer def set_deployment_mode_state(state): @@ -27,9 +26,11 @@ def set_deployment_mode_state(state): remove_state('spark.yarn.installed') if is_state('spark.standalone.installed'): remove_state('spark.standalone.installed') - remove_state('spark.ready.to.install') set_state('spark.started') set_state(state) + # set app version string for juju status output + spark_version = get_package_version('spark-core') or 'unknown' + hookenv.application_version_set(spark_version) def report_status(): http://git-wip-us.apache.org/repos/asf/bigtop/blob/7eb710b2/bigtop-packages/src/charm/spark/layer-spark/tests/01-basic-deployment.py ---------------------------------------------------------------------- diff --git a/bigtop-packages/src/charm/spark/layer-spark/tests/01-basic-deployment.py b/bigtop-packages/src/charm/spark/layer-spark/tests/01-basic-deployment.py index f4022e1..f387c7b 100755 --- a/bigtop-packages/src/charm/spark/layer-spark/tests/01-basic-deployment.py +++ b/bigtop-packages/src/charm/spark/layer-spark/tests/01-basic-deployment.py @@ -1,4 +1,5 @@ #!/usr/bin/env python3 + # Licensed to the Apache Software Foundation (ASF) under one or more # contributor license agreements. See the NOTICE file distributed with # this work for additional information regarding copyright ownership. @@ -14,21 +15,27 @@ # See the License for the specific language governing permissions and # limitations under the License. -import unittest import amulet +import unittest class TestDeploy(unittest.TestCase): """ - Trivial deployment test for Apache Spark. 
+ Simple deployment test for Apache Bigtop Spark. """ + @classmethod + def setUpClass(cls): + cls.d = amulet.Deployment(series='xenial') + cls.d.add('spark', 'cs:xenial/spark') + cls.d.setup(timeout=1800) + cls.d.sentry.wait(timeout=1800) + cls.unit = cls.d.sentry['spark'][0] - def test_deploy(self): - self.d = amulet.Deployment(series='trusty') - self.d.add('spark', 'spark') - self.d.setup(timeout=900) - self.d.sentry.wait(timeout=1800) - self.unit = self.d.sentry['spark'][0] + def test_deployed(self): + """ + Validate Spark deployed successfully. + """ + self.assertTrue(self.d.deployed) if __name__ == '__main__': http://git-wip-us.apache.org/repos/asf/bigtop/blob/7eb710b2/bigtop-packages/src/charm/spark/layer-spark/tests/02-smoke-test.py ---------------------------------------------------------------------- diff --git a/bigtop-packages/src/charm/spark/layer-spark/tests/02-smoke-test.py b/bigtop-packages/src/charm/spark/layer-spark/tests/02-smoke-test.py index bb179a2..4f81cc7 100755 --- a/bigtop-packages/src/charm/spark/layer-spark/tests/02-smoke-test.py +++ b/bigtop-packages/src/charm/spark/layer-spark/tests/02-smoke-test.py @@ -1,4 +1,5 @@ #!/usr/bin/env python3 + # Licensed to the Apache Software Foundation (ASF) under one or more # contributor license agreements. See the NOTICE file distributed with # this work for additional information regarding copyright ownership. @@ -14,31 +15,32 @@ # See the License for the specific language governing permissions and # limitations under the License. -import unittest import amulet +import re +import unittest class TestDeploy(unittest.TestCase): """ - Deployment and smoke-test of Apache Spark. + Smoke test for Apache Bigtop Spark. 
""" @classmethod def setUpClass(cls): - cls.d = amulet.Deployment(series='trusty') - cls.d.add('openjdk', 'openjdk') - cls.d.add('spark', 'spark') - cls.d.relate('spark:java', 'openjdk:java') - cls.d.setup(timeout=900) - cls.d.sentry.wait(timeout=1800) + cls.d = amulet.Deployment(series='xenial') + cls.d.add('spark', 'cs:xenial/spark') + cls.d.setup(timeout=1800) + cls.d.sentry.wait_for_messages({'spark': re.compile('ready')}, timeout=1800) + cls.spark = cls.d.sentry['spark'][0] - def test_deploy(self): - self.d.sentry.wait_for_messages({ - "spark": "ready (standalone - master)", - }) - spark = self.d.sentry['spark'][0] - smk_uuid = spark.action_do("smoke-test") - output = self.d.action_fetch(smk_uuid, full_output=True) - assert "completed" in output['status'] + def test_spark(self): + """ + Validate Spark by running the smoke-test action. + """ + uuid = self.spark.run_action('smoke-test') + result = self.d.action_fetch(uuid, full_output=True) + # action status=completed on success + if (result['status'] != "completed"): + self.fail('Spark smoke-test failed: %s' % result) if __name__ == '__main__': http://git-wip-us.apache.org/repos/asf/bigtop/blob/7eb710b2/bigtop-packages/src/charm/spark/layer-spark/tests/03-scale-standalone.py ---------------------------------------------------------------------- diff --git a/bigtop-packages/src/charm/spark/layer-spark/tests/03-scale-standalone.py b/bigtop-packages/src/charm/spark/layer-spark/tests/03-scale-standalone.py index 38f1290..7d8f4cd 100755 --- a/bigtop-packages/src/charm/spark/layer-spark/tests/03-scale-standalone.py +++ b/bigtop-packages/src/charm/spark/layer-spark/tests/03-scale-standalone.py @@ -1,4 +1,5 @@ #!/usr/bin/env python3 + # Licensed to the Apache Software Foundation (ASF) under one or more # contributor license agreements. See the NOTICE file distributed with # this work for additional information regarding copyright ownership. 
@@ -14,9 +15,9 @@ # See the License for the specific language governing permissions and # limitations under the License. -import unittest import amulet import time +import unittest class TestScaleStandalone(unittest.TestCase): @@ -25,16 +26,15 @@ class TestScaleStandalone(unittest.TestCase): """ @classmethod def setUpClass(cls): - cls.d = amulet.Deployment(series='trusty') - cls.d.add('sparkscale', 'spark', units=3) - cls.d.add('openjdk', 'openjdk') - cls.d.relate('openjdk:java', 'sparkscale:java') - cls.d.setup(timeout=1800) - cls.d.sentry.wait(timeout=1800) + cls.d = amulet.Deployment(series='xenial') + cls.d.add('spark-test-scale', 'cs:xenial/spark', units=3) + cls.d.setup(timeout=3600) + cls.d.sentry.wait(timeout=3600) - @classmethod - def tearDownClass(cls): - cls.d.remove_service('sparkscale') + # Disable tearDown until amulet supports it + # @classmethod + # def tearDownClass(cls): + # cls.d.remove_service('spark-test-scale') def test_scaleup(self): """ @@ -43,16 +43,16 @@ class TestScaleStandalone(unittest.TestCase): Check that all units agree on the same new master. 
""" print("Waiting for units to become ready.") - self.d.sentry.wait_for_messages({"sparkscale": ["ready (standalone - master)", - "ready (standalone)", - "ready (standalone)"]}, timeout=900) + self.d.sentry.wait_for_messages({"spark-test-scale": ["ready (standalone - master)", + "ready (standalone)", + "ready (standalone)"]}, timeout=900) print("Waiting for units to agree on master.") - time.sleep(60) + time.sleep(120) - spark0_unit = self.d.sentry['sparkscale'][0] - spark1_unit = self.d.sentry['sparkscale'][1] - spark2_unit = self.d.sentry['sparkscale'][2] + spark0_unit = self.d.sentry['spark-test-scale'][0] + spark1_unit = self.d.sentry['spark-test-scale'][1] + spark2_unit = self.d.sentry['spark-test-scale'][2] (stdout0, errcode0) = spark0_unit.run('grep spark.master /etc/spark/conf/spark-defaults.conf') (stdout1, errcode1) = spark1_unit.run('grep spark.master /etc/spark/conf/spark-defaults.conf') (stdout2, errcode2) = spark2_unit.run('grep spark.master /etc/spark/conf/spark-defaults.conf') @@ -61,22 +61,22 @@ class TestScaleStandalone(unittest.TestCase): assert stdout1 == stdout2 master_name = '' - for unit in self.d.sentry['sparkscale']: + for unit in self.d.sentry['spark-test-scale']: (stdout, stderr) = unit.run("pgrep -f \"[M]aster\"") lines = len(stdout.split('\n')) if lines > 0: master_name = unit.info['unit_name'] - print("Killin master {}".format(master_name)) + print("Killing master {}".format(master_name)) self.d.remove_unit(master_name) break print("Waiting for the cluster to select a new master.") - time.sleep(60) - self.d.sentry.wait_for_messages({"sparkscale": ["ready (standalone - master)", - "ready (standalone)"]}, timeout=900) + time.sleep(120) + self.d.sentry.wait_for_messages({"spark-test-scale": ["ready (standalone - master)", + "ready (standalone)"]}, timeout=900) - spark1_unit = self.d.sentry['sparkscale'][0] - spark2_unit = self.d.sentry['sparkscale'][1] + spark1_unit = self.d.sentry['spark-test-scale'][0] + spark2_unit = 
self.d.sentry['spark-test-scale'][1] (stdout1, errcode1) = spark1_unit.run('grep spark.master /etc/spark/conf/spark-defaults.conf') (stdout2, errcode2) = spark2_unit.run('grep spark.master /etc/spark/conf/spark-defaults.conf') # ensure units agree on the master http://git-wip-us.apache.org/repos/asf/bigtop/blob/7eb710b2/bigtop-packages/src/charm/spark/layer-spark/tests/10-test-ha.py ---------------------------------------------------------------------- diff --git a/bigtop-packages/src/charm/spark/layer-spark/tests/10-test-ha.py b/bigtop-packages/src/charm/spark/layer-spark/tests/10-test-ha.py index 6f5c6ff..f8a09a6 100755 --- a/bigtop-packages/src/charm/spark/layer-spark/tests/10-test-ha.py +++ b/bigtop-packages/src/charm/spark/layer-spark/tests/10-test-ha.py @@ -1,4 +1,5 @@ #!/usr/bin/python3 + # Licensed to the Apache Software Foundation (ASF) under one or more # contributor license agreements. See the NOTICE file distributed with # this work for additional information regarding copyright ownership. @@ -15,49 +16,46 @@ # limitations under the License. import amulet +import requests import time import unittest -import requests class TestDeployment(unittest.TestCase): - + """ + Test scaling of Apache Spark in HA mode. + """ @classmethod def setUpClass(cls): - cls.d = amulet.Deployment(series='trusty') - cls.d.add('sparkha', 'spark', units=3) - cls.d.add('openjdk', 'openjdk') - cls.d.add('zk', 'zookeeper') - cls.d.expose('sparkha') - cls.d.relate('openjdk:java', 'sparkha:java') - cls.d.relate('zk:zookeeper', 'sparkha:zookeeper') - try: - cls.d.relate('zk:java', 'openjdk:java') - except ValueError: - # No need to related older versions of the zookeeper charm - # to java. 
- pass - cls.d.setup(timeout=1800) - cls.d.sentry.wait(timeout=1800) + cls.d = amulet.Deployment(series='xenial') + cls.d.add('spark-test-ha', 'cs:xenial/spark', units=3) + cls.d.add('zk-test', 'cs:xenial/zookeeper') + cls.d.relate('zk-test:zookeeper', 'spark-test-ha:zookeeper') + cls.d.expose('spark-test-ha') + cls.d.setup(timeout=3600) + cls.d.sentry.wait(timeout=3600) - @classmethod - def tearDownClass(cls): - cls.d.remove_service('sparkha') + # Disable tearDown until amulet supports it + # @classmethod + # def tearDownClass(cls): + # cls.d.remove_service('spark-test-ha') def test_master_selected(self): - ''' - Wait for all three spark units to agree on a master leader. + """ + Wait for all three spark-test-ha units to agree on a master leader. Remove the leader unit. Check that a new leader is elected. - ''' - self.d.sentry.wait_for_messages({"sparkha": ["ready (standalone - HA)", - "ready (standalone - HA)", - "ready (standalone - HA)"]}, timeout=900) - # Give some slack for the spark units to elect a master - time.sleep(60) + """ + self.d.sentry.wait_for_messages({"spark-test-ha": ["ready (standalone - HA)", + "ready (standalone - HA)", + "ready (standalone - HA)"]}, timeout=900) + + print("Waiting for units to agree on master.") + time.sleep(120) + master = '' masters_count = 0 - for unit in self.d.sentry['sparkha']: + for unit in self.d.sentry['spark-test-ha']: ip = unit.info['public-address'] url = 'http://{}:8080'.format(ip) homepage = requests.get(url) @@ -73,11 +71,11 @@ class TestDeployment(unittest.TestCase): self.d.remove_unit(master) time.sleep(120) - self.d.sentry.wait_for_messages({"sparkha": ["ready (standalone - HA)", - "ready (standalone - HA)"]}, timeout=900) + self.d.sentry.wait_for_messages({"spark-test-ha": ["ready (standalone - HA)", + "ready (standalone - HA)"]}, timeout=900) masters_count = 0 - for unit in self.d.sentry['sparkha']: + for unit in self.d.sentry['spark-test-ha']: ip = unit.info['public-address'] url = 
'http://{}:8080'.format(ip) homepage = requests.get(url)
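This commit leans on two special directory modes: `0o2775` on the SparkBench tree (so members of the `spark` group inherit group ownership and can write benchmark output) and `0o3777` on the local `spark_events` dir (world-writable job history that only owners can delete). A small sketch, assuming a POSIX filesystem, confirming which bits those octal modes set:

```python
import os
import stat
import tempfile

# Mode bits used by the charm:
#   0o2775 (SparkBench tree): setgid, so new files inherit the 'spark'
#           group and stay group-writable.
#   0o3777 (local spark_events): setgid plus the sticky bit, so any
#           user can write job history but only owners can delete.
d = tempfile.mkdtemp()
os.chmod(d, 0o3777)
mode = os.stat(d).st_mode
print(bool(mode & stat.S_ISGID), bool(mode & stat.S_ISVTX),
      oct(stat.S_IMODE(mode)))
os.rmdir(d)
```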
