BIGTOP-2477: Add Juju charm for spark component This closes #117
Signed-off-by: Kevin W Monroe <[email protected]> Project: http://git-wip-us.apache.org/repos/asf/bigtop/repo Commit: http://git-wip-us.apache.org/repos/asf/bigtop/commit/f5e89f4e Tree: http://git-wip-us.apache.org/repos/asf/bigtop/tree/f5e89f4e Diff: http://git-wip-us.apache.org/repos/asf/bigtop/diff/f5e89f4e Branch: refs/heads/master Commit: f5e89f4e17487b6893b31e987bf67cea2585a5a6 Parents: f1f5619 Author: Cory Johns <[email protected]> Authored: Tue May 24 17:49:54 2016 -0400 Committer: Kevin W Monroe <[email protected]> Committed: Tue Oct 4 18:06:46 2016 -0500 ---------------------------------------------------------------------- .../src/charm/spark/layer-spark/README.md | 313 +++++++ .../src/charm/spark/layer-spark/actions.yaml | 74 ++ .../charm/spark/layer-spark/actions/list-jobs | 24 + .../layer-spark/actions/logisticregression | 1 + .../layer-spark/actions/matrixfactorization | 1 + .../charm/spark/layer-spark/actions/pagerank | 1 + .../charm/spark/layer-spark/actions/remove-job | 23 + .../actions/restart-spark-job-history-server | 36 + .../charm/spark/layer-spark/actions/smoke-test | 1 + .../charm/spark/layer-spark/actions/sparkbench | 115 +++ .../src/charm/spark/layer-spark/actions/sparkpi | 99 +++ .../src/charm/spark/layer-spark/actions/sql | 1 + .../actions/start-spark-job-history-server | 36 + .../actions/stop-spark-job-history-server | 36 + .../charm/spark/layer-spark/actions/streaming | 1 + .../src/charm/spark/layer-spark/actions/submit | 57 ++ .../charm/spark/layer-spark/actions/svdplusplus | 1 + .../src/charm/spark/layer-spark/actions/svm | 1 + .../spark/layer-spark/actions/trianglecount | 1 + .../src/charm/spark/layer-spark/config.yaml | 38 + .../src/charm/spark/layer-spark/copyright | 16 + .../src/charm/spark/layer-spark/icon.svg | 843 +++++++++++++++++++ .../src/charm/spark/layer-spark/layer.yaml | 28 + .../lib/charms/layer/bigtop_spark.py | 255 ++++++ .../src/charm/spark/layer-spark/metadata.yaml | 19 + 
.../charm/spark/layer-spark/reactive/spark.py | 171 ++++ .../charm/spark/layer-spark/scripts/sparkpi.sh | 20 + .../layer-spark/tests/01-basic-deployment.py | 35 + .../spark/layer-spark/tests/02-smoke-test.py | 45 + .../layer-spark/tests/03-scale-standalone.py | 87 ++ .../charm/spark/layer-spark/tests/10-test-ha.py | 94 +++ .../charm/spark/layer-spark/tests/tests.yaml | 3 + .../src/charm/spark/layer-spark/wheelhouse.txt | 1 + 33 files changed, 2477 insertions(+) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/bigtop/blob/f5e89f4e/bigtop-packages/src/charm/spark/layer-spark/README.md ---------------------------------------------------------------------- diff --git a/bigtop-packages/src/charm/spark/layer-spark/README.md b/bigtop-packages/src/charm/spark/layer-spark/README.md new file mode 100644 index 0000000..6de8de6 --- /dev/null +++ b/bigtop-packages/src/charm/spark/layer-spark/README.md @@ -0,0 +1,313 @@ +<!-- + Licensed to the Apache Software Foundation (ASF) under one or more + contributor license agreements. See the NOTICE file distributed with + this work for additional information regarding copyright ownership. + The ASF licenses this file to You under the Apache License, Version 2.0 + (the "License"); you may not use this file except in compliance with + the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. +--> +## Overview + +Apache Spark™ is a fast and general purpose engine for large-scale data +processing. Key features: + + * **Speed** + + Run programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster + on disk. 
Spark has an advanced DAG execution engine that supports cyclic data + flow and in-memory computing. + + * **Ease of Use** + + Write applications quickly in Java, Scala or Python. Spark offers over 80 + high-level operators that make it easy to build parallel apps, and you can use + it interactively from the Scala and Python shells. + + * **General Purpose Engine** + + Combine SQL, streaming, and complex analytics. Spark powers a stack of + high-level tools including Shark for SQL, MLlib for machine learning, GraphX, + and Spark Streaming. You can combine these frameworks seamlessly in the same + application. + + +## Deployment + +This charm deploys the Spark component of the Apache Bigtop platform and +supports running Spark in a variety of modes: + + * **Standalone** + + In this mode Spark units form a cluster that you can scale to match your needs. + Starting with a single node: + + juju deploy spark + juju deploy openjdk + juju add-relation spark openjdk + + You can scale the cluster by adding more spark units: + + juju add-unit spark + + When in standalone mode, Juju ensures a single Spark master is appointed. + The status of the unit acting as master reads "ready (standalone - master)", + while the rest of the units display a status of "ready (standalone)". + If you remove the master, Juju will appoint a new one. However, if a master + fails in standalone mode, running jobs and job history will be lost. + + * **Standalone HA** + + To enable High Availability for a Spark cluster, you need to add Zookeeper to + the deployment. To ensure a Zookeeper quorum, it is recommended that you + deploy 3 units of the zookeeper application. For instance: + + juju deploy apache-zookeeper zookeeper -n 3 + juju add-relation spark zookeeper + + In this mode, you can again scale your cluster to match your needs by + adding/removing units. Spark units report "ready (standalone HA)" in their + status. 
If you need to identify the node acting as master, query Zookeeper + as follows: + + juju run --unit zookeeper/0 'echo "get /spark/master_status" | /usr/lib/zookeeper/bin/zkCli.sh' + + * **Yarn-client and Yarn-cluster** + + This charm leverages our pluggable Hadoop model with the `hadoop-plugin` + interface. This means that you can relate this charm to a base Apache Hadoop cluster + to run Spark jobs there. The suggested deployment method is to use the + [hadoop-processing](https://jujucharms.com/hadoop-processing/) + bundle and add a relation between spark and the plugin: + + juju deploy hadoop-processing + juju add-relation plugin spark + + +Note: To switch to a different execution mode, set the +`spark_execution_mode` config variable: + + juju set spark spark_execution_mode=<new_mode> + +See the **Configuration** section below for supported mode options. + + +## Usage + +Once deployment is complete, you can manually load and run Spark batch or +streaming jobs in a variety of ways: + + * **Spark shell** + +Spark's shell provides a simple way to learn the API, as well as a powerful +tool to analyse data interactively. 
It is available in either Scala or Python +and can be run from the Spark unit as follows: + + juju ssh spark/0 + spark-shell # for interaction using scala + pyspark # for interaction using python + + * **Command line** + +SSH to the Spark unit and manually run a spark-submit job, for example: + + juju ssh spark/0 + spark-submit --class org.apache.spark.examples.SparkPi \ + --master yarn-client /usr/lib/spark/lib/spark-examples*.jar 10 + + * **Apache Zeppelin visual service** + +Deploy Apache Zeppelin and relate it to the Spark unit: + + juju deploy apache-zeppelin zeppelin + juju add-relation spark zeppelin + +Once the relation has been made, access the web interface at +`http://{spark_unit_ip_address}:9090` + + * **IPyNotebook for Spark** + +The IPython Notebook is an interactive computational environment, in which you +can combine code execution, rich text, mathematics, plots and rich media. +Deploy IPython Notebook for Spark and relate it to the Spark unit: + + juju deploy apache-spark-notebook notebook + juju add-relation spark notebook + +Once the relation has been made, access the web interface at +`http://{spark_unit_ip_address}:8880` + + +## Configuration + +### `spark_bench_enabled` + +Install the SparkBench benchmarking suite. If `true` (the default), this charm +will download spark bench from the URL specified by `spark_bench_ppc64le` +or `spark_bench_x86_64`, depending on the unit's architecture. + +### `spark_execution_mode` + +Spark has four modes of execution: local, standalone, yarn-client, and +yarn-cluster. The default mode is `yarn-client` and can be changed by setting +the `spark_execution_mode` config variable. + + * **Local** + + In Local mode, Spark processes jobs locally without any cluster resources. + There are 3 ways to specify 'local' mode: + + * `local` + + Run Spark locally with one worker thread (i.e. no parallelism at all). 
+ + * `local[K]` + + Run Spark locally with K worker threads (ideally, set this to the number + of cores on your machine). + + * `local[*]` + + Run Spark locally with as many worker threads as logical cores on your + machine. + + * **Standalone** + + In `standalone` mode, Spark launches a Master and Worker daemon on the Spark + unit. This mode is useful for simulating a distributed cluster environment + without actually setting up a cluster. + + * **YARN-client** + + In `yarn-client` mode, the driver runs in the client process, and the + application master is only used for requesting resources from YARN. + + * **YARN-cluster** + + In `yarn-cluster` mode, the Spark driver runs inside an application master + process which is managed by YARN on the cluster, and the client can go away + after initiating the application. + + +## Verify the deployment + +### Status and Smoke Test + +The services provide extended status reporting to indicate when they are ready: + + juju status --format=tabular + +This is particularly useful when combined with `watch` to track the on-going +progress of the deployment: + + watch -n 0.5 juju status --format=tabular + +The message for each unit will provide information about that unit's state. +Once they all indicate that they are ready, you can perform a "smoke test" +to verify that Spark is working as expected using the built-in `smoke-test` +action: + + juju run-action spark/0 smoke-test + +_**Note**: The above assumes Juju 2.0 or greater. If using an earlier version +of Juju, the syntax is `juju action do spark/0 smoke-test`._ + + +After a minute or so, you can check the results of the smoke test: + + juju show-action-status + +_**Note**: The above assumes Juju 2.0 or greater. If using an earlier version +of Juju, the syntax is `juju action status`._ + +You will see `status: completed` if the smoke test was successful, or +`status: failed` if it was not. 
You can get more information on why it failed +via: + + juju show-action-output <action-id> + +_**Note**: The above assumes Juju 2.0 or greater. If using an earlier version +of Juju, the syntax is `juju action fetch <action-id>`._ + + +### Verify Job History + +The Job History server shows all active and finished Spark jobs that have +been submitted. To view the Job History server, expose spark +(`juju expose spark`) and navigate to +`http://{spark_master_unit_ip_address}:18080`, where the address is that of +the unit acting as master. + + +## Benchmarking + +This charm provides several benchmarks, including the +[Spark Bench](https://github.com/SparkTC/spark-bench) benchmarking +suite (if enabled), to gauge the performance of your environment. + +The easiest way to run the benchmarks on this service is to relate it to the +[Benchmark GUI][]. You will likely also want to relate it to the +[Benchmark Collector][] to have machine-level information collected during the +benchmark, for a more complete picture of how the machine performed. 
+ +[Benchmark GUI]: https://jujucharms.com/benchmark-gui/ +[Benchmark Collector]: https://jujucharms.com/benchmark-collector/ + +However, each benchmark is also an action that can be called manually: + + $ juju action do spark/0 pagerank + Action queued with id: 88de9367-45a8-4a4b-835b-7660f467a45e + $ juju action fetch --wait 0 88de9367-45a8-4a4b-835b-7660f467a45e + results: + meta: + composite: + direction: asc + units: secs + value: "77.939000" + raw: | + PageRank,2015-12-10-23:41:57,77.939000,71.888079,.922363,0,PageRank-MLlibConfig,,,,,10,12,,200000,4.0,1.3,0.15 + start: 2015-12-10T23:41:34Z + stop: 2015-12-10T23:43:16Z + results: + duration: + direction: asc + units: secs + value: "77.939000" + throughput: + direction: desc + units: x/sec + value: ".922363" + status: completed + timing: + completed: 2015-12-10 23:43:59 +0000 UTC + enqueued: 2015-12-10 23:42:10 +0000 UTC + started: 2015-12-10 23:42:15 +0000 UTC + +Valid action names at this time are: + + * logisticregression + * matrixfactorization + * pagerank + * sql + * streaming + * svdplusplus + * svm + * trianglecount + * sparkpi + + +## Contact Information + +- <[email protected]> + + +## Help + +- [Juju mailing list](https://lists.ubuntu.com/mailman/listinfo/juju) +- [Juju community](https://jujucharms.com/community) http://git-wip-us.apache.org/repos/asf/bigtop/blob/f5e89f4e/bigtop-packages/src/charm/spark/layer-spark/actions.yaml ---------------------------------------------------------------------- diff --git a/bigtop-packages/src/charm/spark/layer-spark/actions.yaml b/bigtop-packages/src/charm/spark/layer-spark/actions.yaml new file mode 100644 index 0000000..869de8f --- /dev/null +++ b/bigtop-packages/src/charm/spark/layer-spark/actions.yaml @@ -0,0 +1,74 @@ +smoke-test: + description: Verify that Spark is working by calculating pi. 
+sparkpi: + description: Calculate Pi + params: + partitions: + description: Number of partitions to use for the SparkPi job + type: string + default: "10" +logisticregression: + description: Run the Spark Bench LogisticRegression benchmark. +matrixfactorization: + description: Run the Spark Bench MatrixFactorization benchmark. +pagerank: + description: Run the Spark Bench PageRank benchmark. +sql: + description: Run the Spark Bench SQL benchmark. +streaming: + description: Run the Spark Bench Streaming benchmark. +svdplusplus: + description: Run the Spark Bench SVDPlusPlus benchmark. +svm: + description: Run the Spark Bench SVM benchmark. +trianglecount: + description: Run the Spark Bench TriangleCount benchmark. +restart-spark-job-history-server: + description: Restart the Spark job history server. +start-spark-job-history-server: + description: Start the Spark job history server. +stop-spark-job-history-server: + description: Stop the Spark job history server. +submit: + description: Submit a job to Spark. + required: ['job'] + params: + job: + description: > + URL to a JAR or Python file. This can be any URL supported by + spark-submit, such as a remote URL, an hdfs:// path (if + connected to HDFS), etc. + type: string + class: + description: > + If a JAR is given, this should be the name of the class within + the JAR to run. + type: string + job-args: + description: Arguments for the job. + packages: + description: Comma-separated list of packages to include. + type: string + py-files: + description: Comma-separated list of Python packages to include. + type: string + extra-params: + description: > + Additional params to pass to spark-submit. + For example: "--executor-memory 1000M --supervise" + type: string + cron: + description: > + Schedule the job to be run periodically, according to the + given cron rule. For example: "*/5 * * * *" will run the + job every 5 minutes. + type: string +list-jobs: + description: List scheduled periodic jobs. 
+remove-job: + description: Remove a job previously scheduled for repeated execution. + required: ['action-id'] + params: + action-id: + type: string + description: The ID returned by the action that scheduled the job. http://git-wip-us.apache.org/repos/asf/bigtop/blob/f5e89f4e/bigtop-packages/src/charm/spark/layer-spark/actions/list-jobs ---------------------------------------------------------------------- diff --git a/bigtop-packages/src/charm/spark/layer-spark/actions/list-jobs b/bigtop-packages/src/charm/spark/layer-spark/actions/list-jobs new file mode 100755 index 0000000..b6fdf18 --- /dev/null +++ b/bigtop-packages/src/charm/spark/layer-spark/actions/list-jobs @@ -0,0 +1,24 @@ +#!/bin/bash +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+set -e + +crontab -lu ubuntu | grep '# mapreduce job: ' | while IFS= read -r line; do + if [[ -n "$line" ]]; then + action_id=$(echo "$line" | sed -e 's/.* # mapreduce job: //') + job_code=$(echo "$line" | sed -e 's/ # mapreduce job: .*//') + action-set job.$action_id="$job_code" + fi +done http://git-wip-us.apache.org/repos/asf/bigtop/blob/f5e89f4e/bigtop-packages/src/charm/spark/layer-spark/actions/logisticregression ---------------------------------------------------------------------- diff --git a/bigtop-packages/src/charm/spark/layer-spark/actions/logisticregression b/bigtop-packages/src/charm/spark/layer-spark/actions/logisticregression new file mode 120000 index 0000000..9e15049 --- /dev/null +++ b/bigtop-packages/src/charm/spark/layer-spark/actions/logisticregression @@ -0,0 +1 @@ +sparkbench \ No newline at end of file http://git-wip-us.apache.org/repos/asf/bigtop/blob/f5e89f4e/bigtop-packages/src/charm/spark/layer-spark/actions/matrixfactorization ---------------------------------------------------------------------- diff --git a/bigtop-packages/src/charm/spark/layer-spark/actions/matrixfactorization b/bigtop-packages/src/charm/spark/layer-spark/actions/matrixfactorization new file mode 120000 index 0000000..9e15049 --- /dev/null +++ b/bigtop-packages/src/charm/spark/layer-spark/actions/matrixfactorization @@ -0,0 +1 @@ +sparkbench \ No newline at end of file http://git-wip-us.apache.org/repos/asf/bigtop/blob/f5e89f4e/bigtop-packages/src/charm/spark/layer-spark/actions/pagerank ---------------------------------------------------------------------- diff --git a/bigtop-packages/src/charm/spark/layer-spark/actions/pagerank b/bigtop-packages/src/charm/spark/layer-spark/actions/pagerank new file mode 120000 index 0000000..9e15049 --- /dev/null +++ b/bigtop-packages/src/charm/spark/layer-spark/actions/pagerank @@ -0,0 +1 @@ +sparkbench \ No newline at end of file 
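Editor's note: the `list-jobs` script above (and `remove-job` below) both key off the `# mapreduce job: <action-id>` comment marker appended to each scheduled crontab entry. A minimal sketch of that marker parsing, using the same sed expressions as `list-jobs` (the crontab line itself is a hypothetical example; the real entry is written by the `submit` action):

```shell
#!/bin/bash
# Hypothetical crontab entry as the submit action might schedule it.
# The job command and action id are illustrative; the "# mapreduce job:"
# marker is the one the list-jobs/remove-job actions grep for.
line='*/5 * * * * spark-submit /tmp/job.py # mapreduce job: 88de9367'

# Everything after the marker is the action id...
action_id=$(echo "$line" | sed -e 's/.* # mapreduce job: //')
# ...and everything before the marker is the scheduled job code.
job_code=$(echo "$line" | sed -e 's/ # mapreduce job: .*//')

echo "$action_id"   # -> 88de9367
echo "$job_code"    # -> */5 * * * * spark-submit /tmp/job.py
```

Because both halves are recovered from a single line, `remove-job` only needs the action id to filter the matching entry back out of the crontab.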
http://git-wip-us.apache.org/repos/asf/bigtop/blob/f5e89f4e/bigtop-packages/src/charm/spark/layer-spark/actions/remove-job ---------------------------------------------------------------------- diff --git a/bigtop-packages/src/charm/spark/layer-spark/actions/remove-job b/bigtop-packages/src/charm/spark/layer-spark/actions/remove-job new file mode 100755 index 0000000..280ca05 --- /dev/null +++ b/bigtop-packages/src/charm/spark/layer-spark/actions/remove-job @@ -0,0 +1,23 @@ +#!/bin/bash +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+set -e + +action_id="$(action-get action-id)" +if crontab -lu ubuntu | grep -q "$action_id"; then + crontab -lu ubuntu | grep -v "$action_id" | crontab -u ubuntu - +else + action-fail "Job not found: $action_id" +fi http://git-wip-us.apache.org/repos/asf/bigtop/blob/f5e89f4e/bigtop-packages/src/charm/spark/layer-spark/actions/restart-spark-job-history-server ---------------------------------------------------------------------- diff --git a/bigtop-packages/src/charm/spark/layer-spark/actions/restart-spark-job-history-server b/bigtop-packages/src/charm/spark/layer-spark/actions/restart-spark-job-history-server new file mode 100755 index 0000000..411c335 --- /dev/null +++ b/bigtop-packages/src/charm/spark/layer-spark/actions/restart-spark-job-history-server @@ -0,0 +1,36 @@ +#!/usr/bin/env python3 +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+import sys + + +try: + from charmhelpers.core import host, hookenv, unitdata + from jujubigdata import utils + charm_ready = True +except ImportError: + charm_ready = False + +if not charm_ready: + from subprocess import call + call(['action-fail', 'Spark service not yet ready']) + sys.exit(1) + +if not host.service_available('spark-history-server'): + from subprocess import call + call(['action-fail', 'Spark history service not available']) + sys.exit(1) + +host.service_restart('spark-history-server') http://git-wip-us.apache.org/repos/asf/bigtop/blob/f5e89f4e/bigtop-packages/src/charm/spark/layer-spark/actions/smoke-test ---------------------------------------------------------------------- diff --git a/bigtop-packages/src/charm/spark/layer-spark/actions/smoke-test b/bigtop-packages/src/charm/spark/layer-spark/actions/smoke-test new file mode 120000 index 0000000..79ccf46 --- /dev/null +++ b/bigtop-packages/src/charm/spark/layer-spark/actions/smoke-test @@ -0,0 +1 @@ +sparkpi \ No newline at end of file http://git-wip-us.apache.org/repos/asf/bigtop/blob/f5e89f4e/bigtop-packages/src/charm/spark/layer-spark/actions/sparkbench ---------------------------------------------------------------------- diff --git a/bigtop-packages/src/charm/spark/layer-spark/actions/sparkbench b/bigtop-packages/src/charm/spark/layer-spark/actions/sparkbench new file mode 100755 index 0000000..bc66c70 --- /dev/null +++ b/bigtop-packages/src/charm/spark/layer-spark/actions/sparkbench @@ -0,0 +1,115 @@ +#!/bin/bash +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. 
You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +set -ex + +if ! charms.reactive is_state 'spark.started'; then + action-fail 'Spark not yet ready' + exit +fi + +# Do not call this script directly. Call it via one of the symlinks. The +# symlink name determines the benchmark to run. +BENCHMARK=`basename $0` + +# Juju actions have an annoying lowercase alphanum restriction, so translate +# that into the sparkbench name. +case "${BENCHMARK}" in + logisticregression) + BENCHMARK="LogisticRegression" + RESULT_KEY="LogisticRegression" + ;; + matrixfactorization) + BENCHMARK="MatrixFactorization" + RESULT_KEY="MF" + ;; + pagerank) + BENCHMARK="PageRank" + RESULT_KEY="PageRank" + ;; + sql) + BENCHMARK="SQL" + RESULT_KEY="sql" + ;; + streaming) + BENCHMARK="Streaming" + RESULT_KEY="streaming" + ;; + svdplusplus) + BENCHMARK="SVDPlusPlus" + RESULT_KEY="SVDPlusPlus" + ;; + svm) + BENCHMARK="SVM" + RESULT_KEY="SVM" + ;; + trianglecount) + BENCHMARK="TriangleCount" + RESULT_KEY="TriangleCount" + ;; +esac + +SB_HOME=/home/ubuntu/spark-bench +SB_APPS="${SB_HOME}/bin/applications.lst" +if [ -f "${SB_APPS}" ]; then + VALID_TEST=`grep -c ^${BENCHMARK} ${SB_HOME}/bin/applications.lst` + + if [ ${VALID_TEST} -gt 0 ]; then + # create dir to store results + RUN=`date +%s` + RESULT_DIR=/opt/sparkbench-results/${BENCHMARK} + RESULT_LOG=${RESULT_DIR}/${RUN}.log + mkdir -p ${RESULT_DIR} + chown -R ubuntu:ubuntu ${RESULT_DIR} + + # generate data to be used for benchmarking. this must be run as the ubuntu + # user to make sure we pick up correct spark environment. + echo 'generating data' + su ubuntu << EOF + . 
/etc/environment + ~/spark-bench/${BENCHMARK}/bin/gen_data.sh &> /dev/null +EOF + + # run the benchmark. this must be run as the ubuntu + # user to make sure we pick up correct spark environment. + echo 'running benchmark' + benchmark-start + su ubuntu << EOF + . /etc/environment + ~/spark-bench/${BENCHMARK}/bin/run.sh &> /dev/null +EOF + benchmark-finish + + # collect our data (the last line in our bench-report.dat file) + DATA=`grep ${RESULT_KEY} ${SB_HOME}/num/bench-report.dat | tail -1` + DURATION=`echo ${DATA} | awk -F, '{print $3}'` + THROUGHPUT=`echo ${DATA} | awk -F, '{print $5}'` + + # send data points and composite score + benchmark-data 'duration' "${DURATION}" 'secs' 'asc' + benchmark-data 'throughput' "${THROUGHPUT}" 'x/sec' 'desc' + benchmark-composite "${DURATION}" 'secs' 'asc' + + # send raw data (benchmark-raw takes a file) + echo ${DATA} > ${RESULT_LOG} + benchmark-raw ${RESULT_LOG} + else + echo "ERROR: Invalid benchmark (${BENCHMARK})" + exit 1 + fi +else + echo "ERROR: Could not find SparkBench application list" + exit 1 +fi http://git-wip-us.apache.org/repos/asf/bigtop/blob/f5e89f4e/bigtop-packages/src/charm/spark/layer-spark/actions/sparkpi ---------------------------------------------------------------------- diff --git a/bigtop-packages/src/charm/spark/layer-spark/actions/sparkpi b/bigtop-packages/src/charm/spark/layer-spark/actions/sparkpi new file mode 100755 index 0000000..9afceaf --- /dev/null +++ b/bigtop-packages/src/charm/spark/layer-spark/actions/sparkpi @@ -0,0 +1,99 @@ +#!/usr/bin/env python3 +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. 
You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import sys +sys.path.append('lib') + +from path import Path +from time import time +import subprocess + +from charmhelpers.contrib.benchmark import Benchmark +from charmhelpers.core import hookenv +from charms.reactive import is_state + + +def fail(msg, output): + hookenv.action_set({'output': output}) + hookenv.action_fail(msg) + sys.exit(1) + + +def main(): + bench = Benchmark() + + if not is_state('spark.started'): + fail('Spark not yet ready', 'error') + + num_partitions = hookenv.action_get('partitions') or '' + + # create dir to store results + run = int(time()) + result_dir = Path('/opt/sparkpi-results') + result_log = result_dir / '{}.log'.format(run) + if not result_dir.exists(): + result_dir.mkdir() + result_dir.chown('ubuntu', 'ubuntu') + + bench.start() + start = int(time()) + + hookenv.log("values: {} {}".format(num_partitions, result_log)) + + print('calculating pi') + + with open(result_log, 'w') as log_file: + arg_list = [ + 'spark-submit', + '--class', + 'org.apache.spark.examples.SparkPi', + '/usr/lib/spark/lib/spark-examples.jar' + ] + if num_partitions: + # This is always blank. TODO: figure out what it was + # supposed to do. 
+ arg_list.append(num_partitions) + + try: + subprocess.check_call(arg_list, stdout=log_file, + stderr=subprocess.STDOUT) + except subprocess.CalledProcessError as e: + print('smoke test command failed: ') + print('{}'.format(' '.join(arg_list))) + fail('spark-submit failed: {}'.format(e), 'error') + + stop = int(time()) + bench.finish() + + duration = stop - start + bench.set_composite_score(duration, 'secs') + subprocess.check_call(['benchmark-raw', result_log]) + + with open(result_log) as log: + success = False + for line in log.readlines(): + if 'Pi is roughly 3.1' in line: + success = True + break + + if not success: + fail('spark-submit did not calculate pi', 'error') + + hookenv.action_set({'output': {'status': 'completed'}}) + + +if __name__ == '__main__': + main() http://git-wip-us.apache.org/repos/asf/bigtop/blob/f5e89f4e/bigtop-packages/src/charm/spark/layer-spark/actions/sql ---------------------------------------------------------------------- diff --git a/bigtop-packages/src/charm/spark/layer-spark/actions/sql b/bigtop-packages/src/charm/spark/layer-spark/actions/sql new file mode 120000 index 0000000..9e15049 --- /dev/null +++ b/bigtop-packages/src/charm/spark/layer-spark/actions/sql @@ -0,0 +1 @@ +sparkbench \ No newline at end of file http://git-wip-us.apache.org/repos/asf/bigtop/blob/f5e89f4e/bigtop-packages/src/charm/spark/layer-spark/actions/start-spark-job-history-server ---------------------------------------------------------------------- diff --git a/bigtop-packages/src/charm/spark/layer-spark/actions/start-spark-job-history-server b/bigtop-packages/src/charm/spark/layer-spark/actions/start-spark-job-history-server new file mode 100755 index 0000000..9677a38 --- /dev/null +++ b/bigtop-packages/src/charm/spark/layer-spark/actions/start-spark-job-history-server @@ -0,0 +1,36 @@ +#!/usr/bin/env python3 +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. 
See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +import sys + + +try: + from charmhelpers.core import host, hookenv, unitdata + from jujubigdata import utils + charm_ready = True +except ImportError: + charm_ready = False + +if not charm_ready: + from subprocess import call + call(['action-fail', 'Spark service not yet ready']) + sys.exit(1) + +if not host.service_available('spark-history-server'): + from subprocess import call + call(['action-fail', 'Spark history service not available']) + sys.exit(1) + +host.service_start('spark-history-server') http://git-wip-us.apache.org/repos/asf/bigtop/blob/f5e89f4e/bigtop-packages/src/charm/spark/layer-spark/actions/stop-spark-job-history-server ---------------------------------------------------------------------- diff --git a/bigtop-packages/src/charm/spark/layer-spark/actions/stop-spark-job-history-server b/bigtop-packages/src/charm/spark/layer-spark/actions/stop-spark-job-history-server new file mode 100755 index 0000000..fbe41cf --- /dev/null +++ b/bigtop-packages/src/charm/spark/layer-spark/actions/stop-spark-job-history-server @@ -0,0 +1,36 @@ +#!/usr/bin/env python3 +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. 
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import sys
+
+
+try:
+    from charmhelpers.core import host, hookenv, unitdata
+    from jujubigdata import utils
+    charm_ready = True
+except ImportError:
+    charm_ready = False
+
+if not charm_ready:
+    from subprocess import call
+    call(['action-fail', 'Spark service not yet ready'])
+    sys.exit(1)
+
+if not host.service_available('spark-history-server'):
+    from subprocess import call
+    call(['action-fail', 'Spark history service not available'])
+    sys.exit(1)
+
+host.service_stop('spark-history-server')

http://git-wip-us.apache.org/repos/asf/bigtop/blob/f5e89f4e/bigtop-packages/src/charm/spark/layer-spark/actions/streaming
----------------------------------------------------------------------
diff --git a/bigtop-packages/src/charm/spark/layer-spark/actions/streaming b/bigtop-packages/src/charm/spark/layer-spark/actions/streaming
new file mode 120000
index 0000000..9e15049
--- /dev/null
+++ b/bigtop-packages/src/charm/spark/layer-spark/actions/streaming
@@ -0,0 +1 @@
+sparkbench
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/bigtop/blob/f5e89f4e/bigtop-packages/src/charm/spark/layer-spark/actions/submit
----------------------------------------------------------------------
diff --git a/bigtop-packages/src/charm/spark/layer-spark/actions/submit b/bigtop-packages/src/charm/spark/layer-spark/actions/submit
new file mode 100755
index 0000000..a25a7af
--- /dev/null
+++ b/bigtop-packages/src/charm/spark/layer-spark/actions/submit
@@ -0,0 +1,57 @@
+#!/bin/bash
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+set -e
+
+if ! charms.reactive is_state 'spark.started'; then
+    action-fail 'Spark not yet ready'
+    exit
+fi
+
+py_files="$(action-get py-files)"
+packages="$(action-get packages)"
+extra_params="$(action-get extra-params)"
+class="$(action-get class)"
+job="$(action-get job)"
+job_args="$(action-get job-args)"
+cron="$(action-get cron)"
+
+submit_args='--deploy-mode cluster'
+if [[ -n "$packages" ]]; then
+    submit_args="$submit_args --packages $packages"
+fi
+if [[ -n "$py_files" ]]; then
+    submit_args="$submit_args --py-files $py_files"
+fi
+if [[ -n "$extra_params" ]]; then
+    submit_args="$submit_args $extra_params"
+fi
+if [[ -n "$class" ]]; then
+    submit_args="$submit_args --class $class"
+fi
+submit_args="$submit_args $job"
+
+job_code=". /etc/environment ; spark-submit ${submit_args} ${job_args}"
+action-set job-code="$job_code"
+
+if [[ -z "$cron" ]]; then
+    su ubuntu -c "$job_code"
+else
+    juju-log "Scheduling job with ID $JUJU_ACTION_UUID"
+    action-set action-id="$JUJU_ACTION_UUID"
+    job_line="$cron $job_code # $JUJU_ACTION_UUID"
+    crontab -lu ubuntu > /dev/null || echo -n | crontab -u ubuntu -
+    (crontab -lu ubuntu; echo "$job_line") | crontab -u ubuntu -
+fi

http://git-wip-us.apache.org/repos/asf/bigtop/blob/f5e89f4e/bigtop-packages/src/charm/spark/layer-spark/actions/svdplusplus
----------------------------------------------------------------------
diff --git a/bigtop-packages/src/charm/spark/layer-spark/actions/svdplusplus b/bigtop-packages/src/charm/spark/layer-spark/actions/svdplusplus
new file mode 120000
index 0000000..9e15049
--- /dev/null
+++ b/bigtop-packages/src/charm/spark/layer-spark/actions/svdplusplus
@@ -0,0 +1 @@
+sparkbench
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/bigtop/blob/f5e89f4e/bigtop-packages/src/charm/spark/layer-spark/actions/svm
----------------------------------------------------------------------
diff --git a/bigtop-packages/src/charm/spark/layer-spark/actions/svm b/bigtop-packages/src/charm/spark/layer-spark/actions/svm
new file mode 120000
index 0000000..9e15049
--- /dev/null
+++ b/bigtop-packages/src/charm/spark/layer-spark/actions/svm
@@ -0,0 +1 @@
+sparkbench
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/bigtop/blob/f5e89f4e/bigtop-packages/src/charm/spark/layer-spark/actions/trianglecount
----------------------------------------------------------------------
diff --git a/bigtop-packages/src/charm/spark/layer-spark/actions/trianglecount b/bigtop-packages/src/charm/spark/layer-spark/actions/trianglecount
new file mode 120000
index 0000000..9e15049
--- /dev/null
+++ b/bigtop-packages/src/charm/spark/layer-spark/actions/trianglecount
@@ -0,0 +1 @@
+sparkbench
\ No newline at end of file
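As a reference for reviewers, the conditional flag assembly performed by the submit action above can be sketched in Python (illustrative only; this helper is not part of the charm, and the parameter names simply mirror the action's `action-get` keys):

```python
def build_submit_args(job, packages='', py_files='', extra_params='',
                      class_name='', job_args=''):
    """Mirror the submit action's argument assembly: start with cluster
    deploy mode, append each optional flag only when its parameter is
    non-empty, then add the job and its arguments."""
    args = ['--deploy-mode', 'cluster']
    if packages:
        args += ['--packages', packages]
    if py_files:
        args += ['--py-files', py_files]
    if extra_params:
        args += extra_params.split()
    if class_name:
        args += ['--class', class_name]
    args.append(job)
    if job_args:
        args += job_args.split()
    return ['spark-submit'] + args


# prints: spark-submit --deploy-mode cluster --py-files deps.zip pi.py
print(' '.join(build_submit_args('pi.py', py_files='deps.zip')))
```

The action itself does the same thing with string concatenation in bash, then either runs the resulting command as the ubuntu user or, when `cron` is set, appends it to the ubuntu user's crontab tagged with the action UUID.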
http://git-wip-us.apache.org/repos/asf/bigtop/blob/f5e89f4e/bigtop-packages/src/charm/spark/layer-spark/config.yaml
----------------------------------------------------------------------
diff --git a/bigtop-packages/src/charm/spark/layer-spark/config.yaml b/bigtop-packages/src/charm/spark/layer-spark/config.yaml
new file mode 100644
index 0000000..fb53096
--- /dev/null
+++ b/bigtop-packages/src/charm/spark/layer-spark/config.yaml
@@ -0,0 +1,38 @@
+options:
+  resources_mirror:
+    type: string
+    default: ''
+    description: |
+      URL used to fetch resources (e.g., Hadoop binaries) instead of the
+      location specified in resources.yaml.
+  spark_bench_enabled:
+    type: boolean
+    default: true
+    description: |
+      When set to 'true' (the default), this charm will download and
+      install the SparkBench benchmark suite from the configured URLs.
+      When set to 'false', SparkBench will be removed from the unit,
+      though any data stored in hdfs:///user/ubuntu/spark-bench will be
+      preserved.
+  spark_bench_ppc64le:
+    type: string
+    default: 'https://s3.amazonaws.com/jujubigdata/ibm/noarch/spark-bench-2.0-20151214-ffb72f23.tgz#sha256=ffb72f233eaafccef4dda6d4516f23e043d1b14b9d63734211f4d1968db86a3c'
+    description: |
+      URL (including hash) of a ppc64le tarball of SparkBench. By
+      default, this points to a pre-built SparkBench binary based on
+      sources in the upstream repository. This option is only valid when
+      'spark_bench_enabled' is 'true'.
+  spark_bench_x86_64:
+    type: string
+    default: 'https://s3.amazonaws.com/jujubigdata/ibm/noarch/spark-bench-2.0-20151214-ffb72f23.tgz#sha256=ffb72f233eaafccef4dda6d4516f23e043d1b14b9d63734211f4d1968db86a3c'
+    description: |
+      URL (including hash) of an x86_64 tarball of SparkBench. By
+      default, this points to a pre-built SparkBench binary based on
+      sources in the upstream repository. This option is only valid when
+      'spark_bench_enabled' is 'true'.
+  spark_execution_mode:
+    type: string
+    default: 'standalone'
+    description: |
+      Options are "local", "standalone", "yarn-client", and
+      "yarn-cluster". Consult the readme for details on these options.

http://git-wip-us.apache.org/repos/asf/bigtop/blob/f5e89f4e/bigtop-packages/src/charm/spark/layer-spark/copyright
----------------------------------------------------------------------
diff --git a/bigtop-packages/src/charm/spark/layer-spark/copyright b/bigtop-packages/src/charm/spark/layer-spark/copyright
new file mode 100644
index 0000000..52de50a
--- /dev/null
+++ b/bigtop-packages/src/charm/spark/layer-spark/copyright
@@ -0,0 +1,16 @@
+Format: http://dep.debian.net/deps/dep5/
+
+Files: *
+Copyright: Copyright 2015, Canonical Ltd., All Rights Reserved, The Apache Software Foundation
+License: Apache License 2.0
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+ .
+ http://www.apache.org/licenses/LICENSE-2.0
+ .
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
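A note for reviewers: the SparkBench config options above embed each tarball's checksum in the URL as a `#sha256=` fragment. Splitting such a value into its URL and hash parts can be sketched as follows (illustrative only; this is not the charm's actual fetch code, and the function name is hypothetical):

```python
def split_resource_url(url):
    """Split a 'url#sha256=hash' resource string into (url, hash).
    Returns (url, None) when no sha256 fragment is present."""
    base, _, fragment = url.partition('#')
    if fragment.startswith('sha256='):
        return base, fragment[len('sha256='):]
    return base, None
```

A fetcher would then download `base` (or the same path under `resources_mirror` when that option is set) and verify the payload against the extracted digest before unpacking.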
