BIGTOP-2477: Add Juju charm for spark component This closes #117
Signed-off-by: Kevin W Monroe <[email protected]> Project: http://git-wip-us.apache.org/repos/asf/bigtop/repo Commit: http://git-wip-us.apache.org/repos/asf/bigtop/commit/f5e89f4e Tree: http://git-wip-us.apache.org/repos/asf/bigtop/tree/f5e89f4e Diff: http://git-wip-us.apache.org/repos/asf/bigtop/diff/f5e89f4e Branch: refs/heads/master Commit: f5e89f4e17487b6893b31e987bf67cea2585a5a6 Parents: f1f5619 Author: Cory Johns <[email protected]> Authored: Tue May 24 17:49:54 2016 -0400 Committer: Kevin W Monroe <[email protected]> Committed: Tue Oct 4 18:06:46 2016 -0500 ---------------------------------------------------------------------- .../src/charm/spark/layer-spark/README.md | 313 +++++++ .../src/charm/spark/layer-spark/actions.yaml | 74 ++ .../charm/spark/layer-spark/actions/list-jobs | 24 + .../layer-spark/actions/logisticregression | 1 + .../layer-spark/actions/matrixfactorization | 1 + .../charm/spark/layer-spark/actions/pagerank | 1 + .../charm/spark/layer-spark/actions/remove-job | 23 + .../actions/restart-spark-job-history-server | 36 + .../charm/spark/layer-spark/actions/smoke-test | 1 + .../charm/spark/layer-spark/actions/sparkbench | 115 +++ .../src/charm/spark/layer-spark/actions/sparkpi | 99 +++ .../src/charm/spark/layer-spark/actions/sql | 1 + .../actions/start-spark-job-history-server | 36 + .../actions/stop-spark-job-history-server | 36 + .../charm/spark/layer-spark/actions/streaming | 1 + .../src/charm/spark/layer-spark/actions/submit | 57 ++ .../charm/spark/layer-spark/actions/svdplusplus | 1 + .../src/charm/spark/layer-spark/actions/svm | 1 + .../spark/layer-spark/actions/trianglecount | 1 + .../src/charm/spark/layer-spark/config.yaml | 38 + .../src/charm/spark/layer-spark/copyright | 16 + .../src/charm/spark/layer-spark/icon.svg | 843 +++++++++++++++++++ .../src/charm/spark/layer-spark/layer.yaml | 28 + .../lib/charms/layer/bigtop_spark.py | 255 ++++++ .../src/charm/spark/layer-spark/metadata.yaml | 19 + 
.../charm/spark/layer-spark/reactive/spark.py | 171 ++++ .../charm/spark/layer-spark/scripts/sparkpi.sh | 20 + .../layer-spark/tests/01-basic-deployment.py | 35 + .../spark/layer-spark/tests/02-smoke-test.py | 45 + .../layer-spark/tests/03-scale-standalone.py | 87 ++ .../charm/spark/layer-spark/tests/10-test-ha.py | 94 +++ .../charm/spark/layer-spark/tests/tests.yaml | 3 + .../src/charm/spark/layer-spark/wheelhouse.txt | 1 + 33 files changed, 2477 insertions(+) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/bigtop/blob/f5e89f4e/bigtop-packages/src/charm/spark/layer-spark/README.md ---------------------------------------------------------------------- diff --git a/bigtop-packages/src/charm/spark/layer-spark/README.md b/bigtop-packages/src/charm/spark/layer-spark/README.md new file mode 100644 index 0000000..6de8de6 --- /dev/null +++ b/bigtop-packages/src/charm/spark/layer-spark/README.md @@ -0,0 +1,313 @@ +<!-- + Licensed to the Apache Software Foundation (ASF) under one or more + contributor license agreements. See the NOTICE file distributed with + this work for additional information regarding copyright ownership. + The ASF licenses this file to You under the Apache License, Version 2.0 + (the "License"); you may not use this file except in compliance with + the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. +--> +## Overview + +Apache Spark™ is a fast and general purpose engine for large-scale data +processing. Key features: + + * **Speed** + + Run programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster + on disk. 
Spark has an advanced DAG execution engine that supports cyclic data + flow and in-memory computing. + + * **Ease of Use** + + Write applications quickly in Java, Scala or Python. Spark offers over 80 + high-level operators that make it easy to build parallel apps, and you can use + it interactively from the Scala and Python shells. + + * **General Purpose Engine** + + Combine SQL, streaming, and complex analytics. Spark powers a stack of + high-level tools including Shark for SQL, MLlib for machine learning, GraphX, + and Spark Streaming. You can combine these frameworks seamlessly in the same + application. + + +## Deployment + +This charm deploys the Spark component of the Apache Bigtop platform and +supports running Spark in a variety of modes: + + * **Standalone** + + In this mode Spark units form a cluster that you can scale to match your needs. + Starting with a single node: + + juju deploy spark + juju deploy openjdk + juju add-relation spark openjdk + + You can scale the cluster by adding more spark units: + + juju add-unit spark + + When in standalone mode, Juju ensures a single Spark master is appointed. + The status of the unit acting as master reads "ready (standalone - master)", + while the rest of the units display a status of "ready (standalone)". + If you remove the master, Juju will appoint a new one. However, if a master + fails in standalone mode, running jobs and job history will be lost. + + * **Standalone HA** + + To enable High Availability for a Spark cluster, you need to add Zookeeper to + the deployment. To ensure a Zookeeper quorum, it is recommended that you + deploy 3 units of the zookeeper application. For instance: + + juju deploy apache-zookeeper zookeeper -n 3 + juju add-relation spark zookeeper + + In this mode, you can again scale your cluster to match your needs by + adding/removing units. Spark units report "ready (standalone HA)" in their + status. 
If you need to identify the node acting as master, query Zookeeper + as follows: + + juju run --unit zookeeper/0 'echo "get /spark/master_status" | /usr/lib/zookeeper/bin/zkCli.sh' + + * **Yarn-client and Yarn-cluster** + + This charm leverages our pluggable Hadoop model with the `hadoop-plugin` + interface. This means that you can relate this charm to a base Apache Hadoop cluster + to run Spark jobs there. The suggested deployment method is to use the + [hadoop-processing](https://jujucharms.com/hadoop-processing/) + bundle and add a relation between spark and the plugin: + + juju deploy hadoop-processing + juju add-relation plugin spark + + +Note: To switch to a different execution mode, set the +`spark_execution_mode` config variable: + + juju set spark spark_execution_mode=<new_mode> + +See the **Configuration** section below for supported mode options. + + +## Usage + +Once deployment is complete, you can manually load and run Spark batch or +streaming jobs in a variety of ways: + + * **Spark shell** + +Spark's shell provides a simple way to learn the API, as well as a powerful +tool to analyse data interactively. 
It is available in either Scala or Python +and can be run from the Spark unit as follows: + + juju ssh spark/0 + spark-shell # for interaction using scala + pyspark # for interaction using python + + * **Command line** + +SSH to the Spark unit and manually run a spark-submit job, for example: + + juju ssh spark/0 + spark-submit --class org.apache.spark.examples.SparkPi \ + --master yarn-client /usr/lib/spark/lib/spark-examples*.jar 10 + + * **Apache Zeppelin visual service** + +Deploy Apache Zeppelin and relate it to the Spark unit: + + juju deploy apache-zeppelin zeppelin + juju add-relation spark zeppelin + +Once the relation has been made, access the web interface at +`http://{spark_unit_ip_address}:9090` + + * **IPyNotebook for Spark** + +The IPython Notebook is an interactive computational environment, in which you +can combine code execution, rich text, mathematics, plots and rich media. +Deploy IPython Notebook for Spark and relate it to the Spark unit: + + juju deploy apache-spark-notebook notebook + juju add-relation spark notebook + +Once the relation has been made, access the web interface at +`http://{spark_unit_ip_address}:8880` + + +## Configuration + +### `spark_bench_enabled` + +Install the SparkBench benchmarking suite. If `true` (the default), this charm +will download spark bench from the URL specified by `spark_bench_ppc64le` +or `spark_bench_x86_64`, depending on the unit's architecture. + +### `spark_execution_mode` + +Spark has four modes of execution: local, standalone, yarn-client, and +yarn-cluster. The default mode is `yarn-client` and can be changed by setting +the `spark_execution_mode` config variable. + + * **Local** + + In Local mode, Spark processes jobs locally without any cluster resources. + There are 3 ways to specify 'local' mode: + + * `local` + + Run Spark locally with one worker thread (i.e. no parallelism at all). 
+ + * `local[K]` + + Run Spark locally with K worker threads (ideally, set this to the number + of cores on your machine). + + * `local[*]` + + Run Spark locally with as many worker threads as logical cores on your + machine. + + * **Standalone** + + In `standalone` mode, Spark launches a Master and Worker daemon on the Spark + unit. This mode is useful for simulating a distributed cluster environment + without actually setting up a cluster. + + * **YARN-client** + + In `yarn-client` mode, the driver runs in the client process, and the + application master is only used for requesting resources from YARN. + + * **YARN-cluster** + + In `yarn-cluster` mode, the Spark driver runs inside an application master + process which is managed by YARN on the cluster, and the client can go away + after initiating the application. + + +## Verify the deployment + +### Status and Smoke Test + +The services provide extended status reporting to indicate when they are ready: + + juju status --format=tabular + +This is particularly useful when combined with `watch` to track the on-going +progress of the deployment: + + watch -n 0.5 juju status --format=tabular + +The message for each unit will provide information about that unit's state. +Once they all indicate that they are ready, you can perform a "smoke test" +to verify that Spark is working as expected using the built-in `smoke-test` +action: + + juju run-action spark/0 smoke-test + +_**Note**: The above assumes Juju 2.0 or greater. If using an earlier version +of Juju, the syntax is `juju action do spark/0 smoke-test`._ + + +After a minute or so, you can check the results of the smoke test: + + juju show-action-status + +_**Note**: The above assumes Juju 2.0 or greater. If using an earlier version +of Juju, the syntax is `juju action status`._ + +You will see `status: completed` if the smoke test was successful, or +`status: failed` if it was not. 
You can get more information on why it failed +via: + + juju show-action-output <action-id> + +_**Note**: The above assumes Juju 2.0 or greater. If using an earlier version +of Juju, the syntax is `juju action fetch <action-id>`._ + + +### Verify Job History + +The Job History server shows all active and finished Spark jobs that have +been submitted. To view the Job History server, expose spark +(`juju expose spark`) and navigate to +`http://{spark_master_unit_ip_address}:18080`, where the address is that of +the unit acting as master. + + +## Benchmarking + +This charm provides several benchmarks, including the +[Spark Bench](https://github.com/SparkTC/spark-bench) benchmarking +suite (if enabled), to gauge the performance of your environment. + +The easiest way to run the benchmarks on this service is to relate it to the +[Benchmark GUI][]. You will likely also want to relate it to the +[Benchmark Collector][] to have machine-level information collected during the +benchmark, for a more complete picture of how the machine performed. 
+ +[Benchmark GUI]: https://jujucharms.com/benchmark-gui/ +[Benchmark Collector]: https://jujucharms.com/benchmark-collector/ + +However, each benchmark is also an action that can be called manually: + + $ juju action do spark/0 pagerank + Action queued with id: 88de9367-45a8-4a4b-835b-7660f467a45e + $ juju action fetch --wait 0 88de9367-45a8-4a4b-835b-7660f467a45e + results: + meta: + composite: + direction: asc + units: secs + value: "77.939000" + raw: | + PageRank,2015-12-10-23:41:57,77.939000,71.888079,.922363,0,PageRank-MLlibConfig,,,,,10,12,,200000,4.0,1.3,0.15 + start: 2015-12-10T23:41:34Z + stop: 2015-12-10T23:43:16Z + results: + duration: + direction: asc + units: secs + value: "77.939000" + throughput: + direction: desc + units: x/sec + value: ".922363" + status: completed + timing: + completed: 2015-12-10 23:43:59 +0000 UTC + enqueued: 2015-12-10 23:42:10 +0000 UTC + started: 2015-12-10 23:42:15 +0000 UTC + +Valid action names at this time are: + + * logisticregression + * matrixfactorization + * pagerank + * sql + * streaming + * svdplusplus + * svm + * trianglecount + * sparkpi + + +## Contact Information + +- <[email protected]> + + +## Help + +- [Juju mailing list](https://lists.ubuntu.com/mailman/listinfo/juju) +- [Juju community](https://jujucharms.com/community) http://git-wip-us.apache.org/repos/asf/bigtop/blob/f5e89f4e/bigtop-packages/src/charm/spark/layer-spark/actions.yaml ---------------------------------------------------------------------- diff --git a/bigtop-packages/src/charm/spark/layer-spark/actions.yaml b/bigtop-packages/src/charm/spark/layer-spark/actions.yaml new file mode 100644 index 0000000..869de8f --- /dev/null +++ b/bigtop-packages/src/charm/spark/layer-spark/actions.yaml @@ -0,0 +1,74 @@ +smoke-test: + description: Verify that Spark is working by calculating pi. 
+sparkpi: + description: Calculate Pi + params: + partitions: + description: Number of partitions to use for the SparkPi job + type: string + default: "10" +logisticregression: + description: Run the Spark Bench LogisticRegression benchmark. +matrixfactorization: + description: Run the Spark Bench MatrixFactorization benchmark. +pagerank: + description: Run the Spark Bench PageRank benchmark. +sql: + description: Run the Spark Bench SQL benchmark. +streaming: + description: Run the Spark Bench Streaming benchmark. +svdplusplus: + description: Run the Spark Bench SVDPlusPlus benchmark. +svm: + description: Run the Spark Bench SVM benchmark. +trianglecount: + description: Run the Spark Bench TriangleCount benchmark. +restart-spark-job-history-server: + description: Restart the Spark job history server. +start-spark-job-history-server: + description: Start the Spark job history server. +stop-spark-job-history-server: + description: Stop the Spark job history server. +submit: + description: Submit a job to Spark. + required: ['job'] + params: + job: + description: > + URL to a JAR or Python file. This can be any URL supported by + spark-submit, such as a remote URL, an hdfs:// path (if + connected to HDFS), etc. + type: string + class: + description: > + If a JAR is given, this should be the name of the class within + the JAR to run. + type: string + job-args: + description: Arguments for the job. + packages: + description: Comma-separated list of packages to include. + type: string + py-files: + description: Comma-separated list of Python packages to include. + type: string + extra-params: + description: > + Additional params to pass to spark-submit. + For example: "--executor-memory 1000M --supervise" + type: string + cron: + description: > + Schedule the job to be run periodically, according to the + given cron rule. For example: "*/5 * * * *" will run the + job every 5 minutes. + type: string +list-jobs: + description: List scheduled periodic jobs. 
+remove-job: + description: Remove a job previously scheduled for repeated execution. + required: ['action-id'] + params: + action-id: + type: string + description: The ID returned by the action that scheduled the job. http://git-wip-us.apache.org/repos/asf/bigtop/blob/f5e89f4e/bigtop-packages/src/charm/spark/layer-spark/actions/list-jobs ---------------------------------------------------------------------- diff --git a/bigtop-packages/src/charm/spark/layer-spark/actions/list-jobs b/bigtop-packages/src/charm/spark/layer-spark/actions/list-jobs new file mode 100755 index 0000000..b6fdf18 --- /dev/null +++ b/bigtop-packages/src/charm/spark/layer-spark/actions/list-jobs @@ -0,0 +1,24 @@ +#!/bin/bash +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+set -e + +crontab -lu ubuntu | grep '# mapreduce job: ' | while IFS= read -r line; do + if [[ -n "$line" ]]; then + action_id=$(echo "$line" | sed -e 's/.* # mapreduce job: //') + job_code=$(echo "$line" | sed -e 's/ # mapreduce job: .*//') + action-set job.$action_id="$job_code" + fi +done http://git-wip-us.apache.org/repos/asf/bigtop/blob/f5e89f4e/bigtop-packages/src/charm/spark/layer-spark/actions/logisticregression ---------------------------------------------------------------------- diff --git a/bigtop-packages/src/charm/spark/layer-spark/actions/logisticregression b/bigtop-packages/src/charm/spark/layer-spark/actions/logisticregression new file mode 120000 index 0000000..9e15049 --- /dev/null +++ b/bigtop-packages/src/charm/spark/layer-spark/actions/logisticregression @@ -0,0 +1 @@ +sparkbench \ No newline at end of file http://git-wip-us.apache.org/repos/asf/bigtop/blob/f5e89f4e/bigtop-packages/src/charm/spark/layer-spark/actions/matrixfactorization ---------------------------------------------------------------------- diff --git a/bigtop-packages/src/charm/spark/layer-spark/actions/matrixfactorization b/bigtop-packages/src/charm/spark/layer-spark/actions/matrixfactorization new file mode 120000 index 0000000..9e15049 --- /dev/null +++ b/bigtop-packages/src/charm/spark/layer-spark/actions/matrixfactorization @@ -0,0 +1 @@ +sparkbench \ No newline at end of file http://git-wip-us.apache.org/repos/asf/bigtop/blob/f5e89f4e/bigtop-packages/src/charm/spark/layer-spark/actions/pagerank ---------------------------------------------------------------------- diff --git a/bigtop-packages/src/charm/spark/layer-spark/actions/pagerank b/bigtop-packages/src/charm/spark/layer-spark/actions/pagerank new file mode 120000 index 0000000..9e15049 --- /dev/null +++ b/bigtop-packages/src/charm/spark/layer-spark/actions/pagerank @@ -0,0 +1 @@ +sparkbench \ No newline at end of file 
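Editor's note: the `list-jobs` script above (and `remove-job` below) both key off the `# mapreduce job: <action-id>` comment marker appended to each scheduled crontab entry. A minimal sketch of that marker parsing, using the same sed expressions as `list-jobs` (the crontab line itself is a hypothetical example; the real entry is written by the `submit` action):

```shell
#!/bin/bash
# Hypothetical crontab entry as the submit action might schedule it.
# The job command and action id are illustrative; the "# mapreduce job:"
# marker is the one the list-jobs/remove-job actions grep for.
line='*/5 * * * * spark-submit /tmp/job.py # mapreduce job: 88de9367'

# Everything after the marker is the action id...
action_id=$(echo "$line" | sed -e 's/.* # mapreduce job: //')
# ...and everything before the marker is the scheduled job code.
job_code=$(echo "$line" | sed -e 's/ # mapreduce job: .*//')

echo "$action_id"   # -> 88de9367
echo "$job_code"    # -> */5 * * * * spark-submit /tmp/job.py
```

Because both halves are recovered from a single line, `remove-job` only needs the action id to filter the matching entry back out of the crontab.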
http://git-wip-us.apache.org/repos/asf/bigtop/blob/f5e89f4e/bigtop-packages/src/charm/spark/layer-spark/actions/remove-job ---------------------------------------------------------------------- diff --git a/bigtop-packages/src/charm/spark/layer-spark/actions/remove-job b/bigtop-packages/src/charm/spark/layer-spark/actions/remove-job new file mode 100755 index 0000000..280ca05 --- /dev/null +++ b/bigtop-packages/src/charm/spark/layer-spark/actions/remove-job @@ -0,0 +1,23 @@ +#!/bin/bash +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+set -e + +action_id="$(action-get action-id)" +if crontab -lu ubuntu | grep -q "$action_id"; then + crontab -lu ubuntu | grep -v "$action_id" | crontab -u ubuntu - +else + action-fail "Job not found: $action_id" +fi http://git-wip-us.apache.org/repos/asf/bigtop/blob/f5e89f4e/bigtop-packages/src/charm/spark/layer-spark/actions/restart-spark-job-history-server ---------------------------------------------------------------------- diff --git a/bigtop-packages/src/charm/spark/layer-spark/actions/restart-spark-job-history-server b/bigtop-packages/src/charm/spark/layer-spark/actions/restart-spark-job-history-server new file mode 100755 index 0000000..411c335 --- /dev/null +++ b/bigtop-packages/src/charm/spark/layer-spark/actions/restart-spark-job-history-server @@ -0,0 +1,36 @@ +#!/usr/bin/env python3 +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+import sys + + +try: + from charmhelpers.core import host, hookenv, unitdata + from jujubigdata import utils + charm_ready = True +except ImportError: + charm_ready = False + +if not charm_ready: + from subprocess import call + call(['action-fail', 'Spark service not yet ready']) + sys.exit(1) + +if not host.service_available('spark-history-server'): + from subprocess import call + call(['action-fail', 'Spark history service not available']) + sys.exit(1) + +host.service_restart('spark-history-server') http://git-wip-us.apache.org/repos/asf/bigtop/blob/f5e89f4e/bigtop-packages/src/charm/spark/layer-spark/actions/smoke-test ---------------------------------------------------------------------- diff --git a/bigtop-packages/src/charm/spark/layer-spark/actions/smoke-test b/bigtop-packages/src/charm/spark/layer-spark/actions/smoke-test new file mode 120000 index 0000000..79ccf46 --- /dev/null +++ b/bigtop-packages/src/charm/spark/layer-spark/actions/smoke-test @@ -0,0 +1 @@ +sparkpi \ No newline at end of file http://git-wip-us.apache.org/repos/asf/bigtop/blob/f5e89f4e/bigtop-packages/src/charm/spark/layer-spark/actions/sparkbench ---------------------------------------------------------------------- diff --git a/bigtop-packages/src/charm/spark/layer-spark/actions/sparkbench b/bigtop-packages/src/charm/spark/layer-spark/actions/sparkbench new file mode 100755 index 0000000..bc66c70 --- /dev/null +++ b/bigtop-packages/src/charm/spark/layer-spark/actions/sparkbench @@ -0,0 +1,115 @@ +#!/bin/bash +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. 
You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +set -ex + +if ! charms.reactive is_state 'spark.started'; then + action-fail 'Spark not yet ready' + exit +fi + +# Do not call this script directly. Call it via one of the symlinks. The +# symlink name determines the benchmark to run. +BENCHMARK=`basename $0` + +# Juju actions have an annoying lowercase alphanum restriction, so translate +# that into the sparkbench name. +case "${BENCHMARK}" in + logisticregression) + BENCHMARK="LogisticRegression" + RESULT_KEY="LogisticRegression" + ;; + matrixfactorization) + BENCHMARK="MatrixFactorization" + RESULT_KEY="MF" + ;; + pagerank) + BENCHMARK="PageRank" + RESULT_KEY="PageRank" + ;; + sql) + BENCHMARK="SQL" + RESULT_KEY="sql" + ;; + streaming) + BENCHMARK="Streaming" + RESULT_KEY="streaming" + ;; + svdplusplus) + BENCHMARK="SVDPlusPlus" + RESULT_KEY="SVDPlusPlus" + ;; + svm) + BENCHMARK="SVM" + RESULT_KEY="SVM" + ;; + trianglecount) + BENCHMARK="TriangleCount" + RESULT_KEY="TriangleCount" + ;; +esac + +SB_HOME=/home/ubuntu/spark-bench +SB_APPS="${SB_HOME}/bin/applications.lst" +if [ -f "${SB_APPS}" ]; then + VALID_TEST=`grep -c ^${BENCHMARK} ${SB_HOME}/bin/applications.lst` + + if [ ${VALID_TEST} -gt 0 ]; then + # create dir to store results + RUN=`date +%s` + RESULT_DIR=/opt/sparkbench-results/${BENCHMARK} + RESULT_LOG=${RESULT_DIR}/${RUN}.log + mkdir -p ${RESULT_DIR} + chown -R ubuntu:ubuntu ${RESULT_DIR} + + # generate data to be used for benchmarking. this must be run as the ubuntu + # user to make sure we pick up correct spark environment. + echo 'generating data' + su ubuntu << EOF + . 
/etc/environment + ~/spark-bench/${BENCHMARK}/bin/gen_data.sh &> /dev/null +EOF + + # run the benchmark. this must be run as the ubuntu + # user to make sure we pick up correct spark environment. + echo 'running benchmark' + benchmark-start + su ubuntu << EOF + . /etc/environment + ~/spark-bench/${BENCHMARK}/bin/run.sh &> /dev/null +EOF + benchmark-finish + + # collect our data (the last line in our bench-report.dat file) + DATA=`grep ${RESULT_KEY} ${SB_HOME}/num/bench-report.dat | tail -1` + DURATION=`echo ${DATA} | awk -F, '{print $3}'` + THROUGHPUT=`echo ${DATA} | awk -F, '{print $5}'` + + # send data points and composite score + benchmark-data 'duration' "${DURATION}" 'secs' 'asc' + benchmark-data 'throughput' "${THROUGHPUT}" 'x/sec' 'desc' + benchmark-composite "${DURATION}" 'secs' 'asc' + + # send raw data (benchmark-raw takes a file) + echo ${DATA} > ${RESULT_LOG} + benchmark-raw ${RESULT_LOG} + else + echo "ERROR: Invalid benchmark (${BENCHMARK})" + exit 1 + fi +else + echo "ERROR: Could not find SparkBench application list" + exit 1 +fi http://git-wip-us.apache.org/repos/asf/bigtop/blob/f5e89f4e/bigtop-packages/src/charm/spark/layer-spark/actions/sparkpi ---------------------------------------------------------------------- diff --git a/bigtop-packages/src/charm/spark/layer-spark/actions/sparkpi b/bigtop-packages/src/charm/spark/layer-spark/actions/sparkpi new file mode 100755 index 0000000..9afceaf --- /dev/null +++ b/bigtop-packages/src/charm/spark/layer-spark/actions/sparkpi @@ -0,0 +1,99 @@ +#!/usr/bin/env python3 +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. 
You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import sys +sys.path.append('lib') + +from path import Path +from time import time +import subprocess + +from charmhelpers.contrib.benchmark import Benchmark +from charmhelpers.core import hookenv +from charms.reactive import is_state + + +def fail(msg, output): + hookenv.action_set({'output': output}) + hookenv.action_fail(msg) + sys.exit(1) + + +def main(): + bench = Benchmark() + + if not is_state('spark.started'): + fail('Spark not yet ready', 'error') + + num_partitions = hookenv.action_get('partitions') or '' + + # create dir to store results + run = int(time()) + result_dir = Path('/opt/sparkpi-results') + result_log = result_dir / '{}.log'.format(run) + if not result_dir.exists(): + result_dir.mkdir() + result_dir.chown('ubuntu', 'ubuntu') + + bench.start() + start = int(time()) + + hookenv.log("values: {} {}".format(num_partitions, result_log)) + + print('calculating pi') + + with open(result_log, 'w') as log_file: + arg_list = [ + 'spark-submit', + '--class', + 'org.apache.spark.examples.SparkPi', + '/usr/lib/spark/lib/spark-examples.jar' + ] + if num_partitions: + # This is always blank. TODO: figure out what it was + # supposed to do. 
+ arg_list.append(num_partitions) + + try: + subprocess.check_call(arg_list, stdout=log_file, + stderr=subprocess.STDOUT) + except subprocess.CalledProcessError as e: + print('smoke test command failed: ') + print('{}'.format(' '.join(arg_list))) + fail('spark-submit failed: {}'.format(e), 'error') + + stop = int(time()) + bench.finish() + + duration = stop - start + bench.set_composite_score(duration, 'secs') + subprocess.check_call(['benchmark-raw', result_log]) + + with open(result_log) as log: + success = False + for line in log.readlines(): + if 'Pi is roughly 3.1' in line: + success = True + break + + if not success: + fail('spark-submit did not calculate pi', 'error') + + hookenv.action_set({'output': {'status': 'completed'}}) + + +if __name__ == '__main__': + main() http://git-wip-us.apache.org/repos/asf/bigtop/blob/f5e89f4e/bigtop-packages/src/charm/spark/layer-spark/actions/sql ---------------------------------------------------------------------- diff --git a/bigtop-packages/src/charm/spark/layer-spark/actions/sql b/bigtop-packages/src/charm/spark/layer-spark/actions/sql new file mode 120000 index 0000000..9e15049 --- /dev/null +++ b/bigtop-packages/src/charm/spark/layer-spark/actions/sql @@ -0,0 +1 @@ +sparkbench \ No newline at end of file http://git-wip-us.apache.org/repos/asf/bigtop/blob/f5e89f4e/bigtop-packages/src/charm/spark/layer-spark/actions/start-spark-job-history-server ---------------------------------------------------------------------- diff --git a/bigtop-packages/src/charm/spark/layer-spark/actions/start-spark-job-history-server b/bigtop-packages/src/charm/spark/layer-spark/actions/start-spark-job-history-server new file mode 100755 index 0000000..9677a38 --- /dev/null +++ b/bigtop-packages/src/charm/spark/layer-spark/actions/start-spark-job-history-server @@ -0,0 +1,36 @@ +#!/usr/bin/env python3 +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. 
See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +import sys + + +try: + from charmhelpers.core import host, hookenv, unitdata + from jujubigdata import utils + charm_ready = True +except ImportError: + charm_ready = False + +if not charm_ready: + from subprocess import call + call(['action-fail', 'Spark service not yet ready']) + sys.exit(1) + +if not host.service_available('spark-history-server'): + from subprocess import call + call(['action-fail', 'Spark history service not available']) + sys.exit(1) + +host.service_start('spark-history-server') http://git-wip-us.apache.org/repos/asf/bigtop/blob/f5e89f4e/bigtop-packages/src/charm/spark/layer-spark/actions/stop-spark-job-history-server ---------------------------------------------------------------------- diff --git a/bigtop-packages/src/charm/spark/layer-spark/actions/stop-spark-job-history-server b/bigtop-packages/src/charm/spark/layer-spark/actions/stop-spark-job-history-server new file mode 100755 index 0000000..fbe41cf --- /dev/null +++ b/bigtop-packages/src/charm/spark/layer-spark/actions/stop-spark-job-history-server @@ -0,0 +1,36 @@ +#!/usr/bin/env python3 +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. 
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import sys
+
+
+try:
+    from charmhelpers.core import host, hookenv, unitdata
+    from jujubigdata import utils
+    charm_ready = True
+except ImportError:
+    charm_ready = False
+
+if not charm_ready:
+    from subprocess import call
+    call(['action-fail', 'Spark service not yet ready'])
+    sys.exit(1)
+
+if not host.service_available('spark-history-server'):
+    from subprocess import call
+    call(['action-fail', 'Spark history service not available'])
+    sys.exit(1)
+
+host.service_stop('spark-history-server')

http://git-wip-us.apache.org/repos/asf/bigtop/blob/f5e89f4e/bigtop-packages/src/charm/spark/layer-spark/actions/streaming
----------------------------------------------------------------------
diff --git a/bigtop-packages/src/charm/spark/layer-spark/actions/streaming b/bigtop-packages/src/charm/spark/layer-spark/actions/streaming
new file mode 120000
index 0000000..9e15049
--- /dev/null
+++ b/bigtop-packages/src/charm/spark/layer-spark/actions/streaming
@@ -0,0 +1 @@
+sparkbench
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/bigtop/blob/f5e89f4e/bigtop-packages/src/charm/spark/layer-spark/actions/submit
----------------------------------------------------------------------
diff --git a/bigtop-packages/src/charm/spark/layer-spark/actions/submit b/bigtop-packages/src/charm/spark/layer-spark/actions/submit
new file mode 100755
index 0000000..a25a7af
--- /dev/null
+++ b/bigtop-packages/src/charm/spark/layer-spark/actions/submit
@@ -0,0 +1,57 @@
+#!/bin/bash
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+set -e
+
+if ! charms.reactive is_state 'spark.started'; then
+    action-fail 'Spark not yet ready'
+    exit
+fi
+
+py_files="$(action-get py-files)"
+packages="$(action-get packages)"
+extra_params="$(action-get extra-params)"
+class="$(action-get class)"
+job="$(action-get job)"
+job_args="$(action-get job-args)"
+cron="$(action-get cron)"
+
+submit_args='--deploy-mode cluster'
+if [[ -n "$packages" ]]; then
+    submit_args="$submit_args --packages $packages"
+fi
+if [[ -n "$py_files" ]]; then
+    submit_args="$submit_args --py-files $py_files"
+fi
+if [[ -n "$extra_params" ]]; then
+    submit_args="$submit_args $extra_params"
+fi
+if [[ -n "$class" ]]; then
+    submit_args="$submit_args --class $class"
+fi
+submit_args="$submit_args $job"
+
+job_code=". /etc/environment ; spark-submit ${submit_args} ${job_args}"
+action-set job-code="$job_code"
+
+if [[ -z "$cron" ]]; then
+    su ubuntu -c "$job_code"
+else
+    juju-log "Scheduling job with ID $JUJU_ACTION_UUID"
+    action-set action-id="$JUJU_ACTION_UUID"
+    job_line="$cron $job_code # $JUJU_ACTION_UUID"
+    crontab -lu ubuntu > /dev/null || echo -n | crontab -u ubuntu -
+    (crontab -lu ubuntu; echo "$job_line") | crontab -u ubuntu -
+fi

http://git-wip-us.apache.org/repos/asf/bigtop/blob/f5e89f4e/bigtop-packages/src/charm/spark/layer-spark/actions/svdplusplus
----------------------------------------------------------------------
diff --git a/bigtop-packages/src/charm/spark/layer-spark/actions/svdplusplus b/bigtop-packages/src/charm/spark/layer-spark/actions/svdplusplus
new file mode 120000
index 0000000..9e15049
--- /dev/null
+++ b/bigtop-packages/src/charm/spark/layer-spark/actions/svdplusplus
@@ -0,0 +1 @@
+sparkbench
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/bigtop/blob/f5e89f4e/bigtop-packages/src/charm/spark/layer-spark/actions/svm
----------------------------------------------------------------------
diff --git a/bigtop-packages/src/charm/spark/layer-spark/actions/svm b/bigtop-packages/src/charm/spark/layer-spark/actions/svm
new file mode 120000
index 0000000..9e15049
--- /dev/null
+++ b/bigtop-packages/src/charm/spark/layer-spark/actions/svm
@@ -0,0 +1 @@
+sparkbench
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/bigtop/blob/f5e89f4e/bigtop-packages/src/charm/spark/layer-spark/actions/trianglecount
----------------------------------------------------------------------
diff --git a/bigtop-packages/src/charm/spark/layer-spark/actions/trianglecount b/bigtop-packages/src/charm/spark/layer-spark/actions/trianglecount
new file mode 120000
index 0000000..9e15049
--- /dev/null
+++ b/bigtop-packages/src/charm/spark/layer-spark/actions/trianglecount
@@ -0,0 +1 @@
+sparkbench
\ No newline at end of file
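As a reference for reviewers, the conditional flag assembly performed by the submit action above can be sketched in Python (illustrative only; this helper is not part of the charm, and the parameter names simply mirror the action's `action-get` keys):

```python
def build_submit_args(job, packages='', py_files='', extra_params='',
                      class_name='', job_args=''):
    """Mirror the submit action's argument assembly: start with cluster
    deploy mode, append each optional flag only when its parameter is
    non-empty, then add the job and its arguments."""
    args = ['--deploy-mode', 'cluster']
    if packages:
        args += ['--packages', packages]
    if py_files:
        args += ['--py-files', py_files]
    if extra_params:
        args += extra_params.split()
    if class_name:
        args += ['--class', class_name]
    args.append(job)
    if job_args:
        args += job_args.split()
    return ['spark-submit'] + args


# prints: spark-submit --deploy-mode cluster --py-files deps.zip pi.py
print(' '.join(build_submit_args('pi.py', py_files='deps.zip')))
```

The action itself does the same thing with string concatenation in bash, then either runs the resulting command as the ubuntu user or, when `cron` is set, appends it to the ubuntu user's crontab tagged with the action UUID.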
http://git-wip-us.apache.org/repos/asf/bigtop/blob/f5e89f4e/bigtop-packages/src/charm/spark/layer-spark/config.yaml
----------------------------------------------------------------------
diff --git a/bigtop-packages/src/charm/spark/layer-spark/config.yaml b/bigtop-packages/src/charm/spark/layer-spark/config.yaml
new file mode 100644
index 0000000..fb53096
--- /dev/null
+++ b/bigtop-packages/src/charm/spark/layer-spark/config.yaml
@@ -0,0 +1,38 @@
+options:
+  resources_mirror:
+    type: string
+    default: ''
+    description: |
+      URL used to fetch resources (e.g., Hadoop binaries) instead of the
+      location specified in resources.yaml.
+  spark_bench_enabled:
+    type: boolean
+    default: true
+    description: |
+      When set to 'true' (the default), this charm will download and
+      install the SparkBench benchmark suite from the configured URLs.
+      When set to 'false', SparkBench will be removed from the unit,
+      though any data stored in hdfs:///user/ubuntu/spark-bench will be
+      preserved.
+  spark_bench_ppc64le:
+    type: string
+    default: 'https://s3.amazonaws.com/jujubigdata/ibm/noarch/spark-bench-2.0-20151214-ffb72f23.tgz#sha256=ffb72f233eaafccef4dda6d4516f23e043d1b14b9d63734211f4d1968db86a3c'
+    description: |
+      URL (including hash) of a ppc64le tarball of SparkBench. By
+      default, this points to a pre-built SparkBench binary based on
+      sources in the upstream repository. This option is only valid when
+      'spark_bench_enabled' is 'true'.
+  spark_bench_x86_64:
+    type: string
+    default: 'https://s3.amazonaws.com/jujubigdata/ibm/noarch/spark-bench-2.0-20151214-ffb72f23.tgz#sha256=ffb72f233eaafccef4dda6d4516f23e043d1b14b9d63734211f4d1968db86a3c'
+    description: |
+      URL (including hash) of an x86_64 tarball of SparkBench. By
+      default, this points to a pre-built SparkBench binary based on
+      sources in the upstream repository. This option is only valid when
+      'spark_bench_enabled' is 'true'.
+  spark_execution_mode:
+    type: string
+    default: 'standalone'
+    description: |
+      Options are "local", "standalone", "yarn-client", and
+      "yarn-cluster". Consult the readme for details on these options.

http://git-wip-us.apache.org/repos/asf/bigtop/blob/f5e89f4e/bigtop-packages/src/charm/spark/layer-spark/copyright
----------------------------------------------------------------------
diff --git a/bigtop-packages/src/charm/spark/layer-spark/copyright b/bigtop-packages/src/charm/spark/layer-spark/copyright
new file mode 100644
index 0000000..52de50a
--- /dev/null
+++ b/bigtop-packages/src/charm/spark/layer-spark/copyright
@@ -0,0 +1,16 @@
+Format: http://dep.debian.net/deps/dep5/
+
+Files: *
+Copyright: Copyright 2015, Canonical Ltd., All Rights Reserved, The Apache Software Foundation
+License: Apache License 2.0
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+ .
+ http://www.apache.org/licenses/LICENSE-2.0
+ .
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
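A note for reviewers: the SparkBench config options above embed each tarball's checksum in the URL as a `#sha256=` fragment. Splitting such a value into its URL and hash parts can be sketched as follows (illustrative only; this is not the charm's actual fetch code, and the function name is hypothetical):

```python
def split_resource_url(url):
    """Split a 'url#sha256=hash' resource string into (url, hash).
    Returns (url, None) when no sha256 fragment is present."""
    base, _, fragment = url.partition('#')
    if fragment.startswith('sha256='):
        return base, fragment[len('sha256='):]
    return base, None
```

A fetcher would then download `base` (or the same path under `resources_mirror` when that option is set) and verify the payload against the extracted digest before unpacking.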
