Repository: zeppelin
Updated Branches:
 refs/heads/master 16b320ff9 -> b96550329
[ZEPPELIN-1198][Spark Standalone] Documents for running zeppelin on production environments.

### What is this PR for?
This PR is for documentation for running zeppelin on production environments.

### What type of PR is it?
Documentation

### What is the Jira issue?
https://issues.apache.org/jira/browse/ZEPPELIN-1198

### Questions:
* Does the licenses files need update? no
* Is there breaking changes for older versions? no
* Does this needs documentation? no

Author: astroshim <[email protected]>

Closes #1227 from astroshim/ZEPPELIN-1198/standalone and squashes the following commits:

53a32f2 [astroshim] add 'via Docker'
61a0e5e [astroshim] add apache license header
83fdef6 [astroshim] doc for spark standalone

Project: http://git-wip-us.apache.org/repos/asf/zeppelin/repo
Commit: http://git-wip-us.apache.org/repos/asf/zeppelin/commit/b9655032
Tree: http://git-wip-us.apache.org/repos/asf/zeppelin/tree/b9655032
Diff: http://git-wip-us.apache.org/repos/asf/zeppelin/diff/b9655032

Branch: refs/heads/master
Commit: b965503291fd004f2044df1c8d257aa4c7b1c522
Parents: 16b320f
Author: astroshim <[email protected]>
Authored: Fri Jul 29 01:03:17 2016 +0900
Committer: Alexander Bezzubov <[email protected]>
Committed: Wed Aug 3 18:47:21 2016 +0900

----------------------------------------------------------------------
 docs/_includes/themes/zeppelin/_navigation.html |  5 +-
 .../themes/zeppelin/img/docs-img/spark_ui.png   | Bin 0 -> 206211 bytes
 .../zeppelin/img/docs-img/standalone_conf.png   | Bin 0 -> 184762 bytes
 docs/index.md                                   |  4 +-
 docs/install/spark_cluster_mode.md              | 74 +++++++++++++++++++
 .../spark_standalone/Dockerfile                 | 54 ++++++++++++++
 .../spark_standalone/entrypoint.sh              | 31 ++++++++
 7 files changed, 166 insertions(+), 2 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/zeppelin/blob/b9655032/docs/_includes/themes/zeppelin/_navigation.html
----------------------------------------------------------------------
diff --git a/docs/_includes/themes/zeppelin/_navigation.html b/docs/_includes/themes/zeppelin/_navigation.html
index 7756f23..e809b09 100644
--- a/docs/_includes/themes/zeppelin/_navigation.html
+++ b/docs/_includes/themes/zeppelin/_navigation.html
@@ -32,7 +32,6 @@
 <li><a href="{{BASE_PATH}}/manual/notebookashomepage.html">Customize Zeppelin Homepage</a></li>
 <li role="separator" class="divider"></li>
 <li class="title"><span><b>More</b><span></li>
- <li><a href="{{BASE_PATH}}/install/virtual_machine.html">Zeppelin on Vagrant VM</a></li>
 <li><a href="{{BASE_PATH}}/install/upgrade.html">Upgrade Zeppelin Version</a></li>
 </ul>
 </li>
@@ -103,6 +102,10 @@
 <li><a href="{{BASE_PATH}}/security/notebook_authorization.html">Notebook Authorization</a></li>
 <li><a href="{{BASE_PATH}}/security/datasource_authorization.html">Data Source Authorization</a></li>
 <li role="separator" class="divider"></li>
+ <li class="title"><span><b>Advanced</b><span></li>
+ <li><a href="{{BASE_PATH}}/install/virtual_machine.html">Zeppelin on Vagrant VM</a></li>
+ <li><a href="{{BASE_PATH}}/install/spark_cluster_mode.html#spark-standalone-mode">Zeppelin on Spark Cluster Mode (Standalone)</a></li>
+ <li role="separator" class="divider"></li>
 <li class="title"><span><b>Contibute</b><span></li>
 <li><a href="{{BASE_PATH}}/development/writingzeppelininterpreter.html">Writing Zeppelin Interpreter</a></li>
 <li><a href="{{BASE_PATH}}/development/writingzeppelinapplication.html">Writing Zeppelin Application (Experimental)</a></li>

http://git-wip-us.apache.org/repos/asf/zeppelin/blob/b9655032/docs/assets/themes/zeppelin/img/docs-img/spark_ui.png
----------------------------------------------------------------------
diff --git a/docs/assets/themes/zeppelin/img/docs-img/spark_ui.png b/docs/assets/themes/zeppelin/img/docs-img/spark_ui.png
new file mode 100644
index 0000000..ca91cf0
Binary files /dev/null and b/docs/assets/themes/zeppelin/img/docs-img/spark_ui.png differ

http://git-wip-us.apache.org/repos/asf/zeppelin/blob/b9655032/docs/assets/themes/zeppelin/img/docs-img/standalone_conf.png
----------------------------------------------------------------------
diff --git a/docs/assets/themes/zeppelin/img/docs-img/standalone_conf.png b/docs/assets/themes/zeppelin/img/docs-img/standalone_conf.png
new file mode 100644
index 0000000..908fc84
Binary files /dev/null and b/docs/assets/themes/zeppelin/img/docs-img/standalone_conf.png differ

http://git-wip-us.apache.org/repos/asf/zeppelin/blob/b9655032/docs/index.md
----------------------------------------------------------------------
diff --git a/docs/index.md b/docs/index.md
index 141e7f6..399393c 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -133,7 +133,6 @@ Join to our [Mailing list](https://zeppelin.apache.org/community.html) and repor
 * [Publish your Paragraph](./manual/publish.html) results into your external website
 * [Customize Zeppelin Homepage](./manual/notebookashomepage.html) with one of your notebooks
 * More
- * [Apache Zeppelin on Vagrant VM](./install/virtual_machine.html): a guide for installing Apache Zeppelin on Vagrant virtual machine
 * [Upgrade Apache Zeppelin Version](./install/upgrade.html): a manual procedure of upgrading Apache Zeppelin version
 
 ####Interpreter
@@ -168,6 +167,9 @@ Join to our [Mailing list](https://zeppelin.apache.org/community.html) and repor
 * [Shiro Authentication](./security/shiroauthentication.html)
 * [Notebook Authorization](./security/notebook_authorization.html)
 * [Data Source Authorization](./security/datasource_authorization.html)
+* Advanced
+ * [Apache Zeppelin on Vagrant VM](./install/virtual_machine.html)
+ * [Zeppelin on Spark Cluster Mode (Standalone via Docker)](./install/spark_cluster_mode.html#spark-standalone-mode)
 * Contribute
 * [Writing Zeppelin Interpreter](./development/writingzeppelininterpreter.html)
 * [Writing Zeppelin Application (Experimental)](./development/writingzeppelinapplication.html)

http://git-wip-us.apache.org/repos/asf/zeppelin/blob/b9655032/docs/install/spark_cluster_mode.md
----------------------------------------------------------------------
diff --git a/docs/install/spark_cluster_mode.md b/docs/install/spark_cluster_mode.md
new file mode 100644
index 0000000..d2517bd
--- /dev/null
+++ b/docs/install/spark_cluster_mode.md
@@ -0,0 +1,74 @@
+---
+layout: page
+title: "Apache Zeppelin on Spark cluster mode"
+description: ""
+group: install
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+{% include JB/setup %}
+
+# Apache Zeppelin on Spark Cluster Mode
+
+<div id="toc"></div>
+
+## Overview
+[Apache Spark](http://spark.apache.org/) has supported three cluster manager types([Standalone](http://spark.apache.org/docs/latest/spark-standalone.html), [Apache Mesos](http://spark.apache.org/docs/latest/running-on-mesos.html) and [Hadoop YARN](http://spark.apache.org/docs/latest/running-on-yarn.html)) so far.
+This document will guide you how you can build and configure the environment on 3 types of Spark cluster manager with Apache Zeppelin using [Docker](https://www.docker.com/) scripts.
+So [install docker](https://docs.docker.com/engine/installation/) on the machine first.
+
+## Spark standalone mode
+[Spark standalone](http://spark.apache.org/docs/latest/spark-standalone.html) is a simple cluster manager included with Spark that makes it easy to set up a cluster.
+You can simply set up Spark standalone environment with below steps.
+
+> **Note :** Since Apache Zeppelin and Spark use same `8080` port for their web UI, you might need to change `zeppelin.server.port` in `conf/zeppelin-site.xml`.
+
+### 1. Build Docker file
+You can find docker script files under `scripts/docker/spark-cluster-managers`.
+
+```
+cd $ZEPPELIN_HOME/scripts/docker/spark-cluster-managers/spark_standalone
+docker build -t "spark_standalone" .
+```
+
+### 2. Run docker
+
+```
+docker run -it \
+-p 8080:8080 \
+-p 7077:7077 \
+-p 8888:8888 \
+-p 8081:8081 \
+-h sparkmaster \
+--name spark_standalone \
+spark_standalone bash;
+```
+
+### 3. Configure Spark interpreter in Zeppelin
+Set Spark master as `spark://localhost:7077` in Zeppelin **Interpreters** setting page.
+
+<img src="../assets/themes/zeppelin/img/docs-img/standalone_conf.png" />
+
+### 4. Run Zeppelin with Spark interpreter
+After running single paragraph with Spark interpreter in Zeppelin, browse `https://localhost:8080` and check whether Spark cluster is running well or not.
+
+<img src="../assets/themes/zeppelin/img/docs-img/spark_ui.png" />
+
+You can also simply verify that Spark is running well in Docker with below command.
+
+```
+ps -ef | grep spark
+```
+
+

http://git-wip-us.apache.org/repos/asf/zeppelin/blob/b9655032/scripts/docker/spark-cluster-managers/spark_standalone/Dockerfile
----------------------------------------------------------------------
diff --git a/scripts/docker/spark-cluster-managers/spark_standalone/Dockerfile b/scripts/docker/spark-cluster-managers/spark_standalone/Dockerfile
new file mode 100644
index 0000000..a7bae23
--- /dev/null
+++ b/scripts/docker/spark-cluster-managers/spark_standalone/Dockerfile
@@ -0,0 +1,54 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+FROM centos:centos6
+MAINTAINER [email protected]
+
+ENV SPARK_PROFILE 1.6
+ENV SPARK_VERSION 1.6.2
+ENV HADOOP_PROFILE 2.3
+ENV SPARK_HOME /usr/local/spark
+
+# Update the image with the latest packages
+RUN yum update -y; yum clean all
+
+# Get utils
+RUN yum install -y \
+wget \
+tar \
+curl \
+&& \
+yum clean all
+
+# Remove old jdk
+RUN yum remove java; yum remove jdk
+
+# install jdk7
+RUN yum install -y java-1.7.0-openjdk-devel
+ENV JAVA_HOME /usr/lib/jvm/java
+ENV PATH $PATH:$JAVA_HOME/bin
+
+# install spark
+RUN curl -s http://apache.mirror.cdnetworks.com/spark/spark-$SPARK_VERSION/spark-$SPARK_VERSION-bin-hadoop$HADOOP_PROFILE.tgz | tar -xz -C /usr/local/
+RUN cd /usr/local && ln -s spark-$SPARK_VERSION-bin-hadoop$HADOOP_PROFILE spark
+
+# update boot script
+COPY entrypoint.sh /etc/entrypoint.sh
+RUN chown root.root /etc/entrypoint.sh
+RUN chmod 700 /etc/entrypoint.sh
+
+#spark
+EXPOSE 8080 7077 8888 8081
+
+ENTRYPOINT ["/etc/entrypoint.sh"]

http://git-wip-us.apache.org/repos/asf/zeppelin/blob/b9655032/scripts/docker/spark-cluster-managers/spark_standalone/entrypoint.sh
----------------------------------------------------------------------
diff --git a/scripts/docker/spark-cluster-managers/spark_standalone/entrypoint.sh b/scripts/docker/spark-cluster-managers/spark_standalone/entrypoint.sh
new file mode 100755
index 0000000..f4fded0
--- /dev/null
+++ b/scripts/docker/spark-cluster-managers/spark_standalone/entrypoint.sh
@@ -0,0 +1,31 @@
+#!/bin/bash
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+export SPARK_MASTER_PORT=7077
+
+# run spark
+cd /usr/local/spark/sbin
+./start-master.sh
+./start-slave.sh spark://`hostname`:$SPARK_MASTER_PORT
+
+CMD=${1:-"exit 0"}
+if [[ "$CMD" == "-d" ]];
+then
+ service sshd stop
+ /usr/sbin/sshd -D -d
+else
+ /bin/bash -c "$*"
+fi
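
Note on the argument handling at the end of entrypoint.sh: the script starts the Spark master and worker, then either runs sshd in the foreground (first argument `-d`) or hands all arguments to `/bin/bash -c`. The following is a minimal dry-run sketch of that dispatch, not part of the commit. The function name `describe_entrypoint_action` and the printed labels are hypothetical, and the sketch uses the POSIX `[ ... ]` test where the script uses `[[ ... ]]`. It prints the branch that would be taken instead of starting anything.

```shell
#!/bin/bash
# Hypothetical dry-run helper (not in the commit): reports which branch of
# entrypoint.sh a given argument list would take, without running sshd or Spark.
describe_entrypoint_action() {
  # Same default expansion as CMD=${1:-"exit 0"} in entrypoint.sh
  cmd=${1:-"exit 0"}
  if [ "$cmd" = "-d" ]; then
    # entrypoint.sh would stop the sshd service and run /usr/sbin/sshd -D -d
    echo "sshd-foreground"
  else
    # entrypoint.sh would run: /bin/bash -c "$*" (all arguments joined by spaces)
    echo "bash -c: $*"
  fi
}

describe_entrypoint_action -d
describe_entrypoint_action bash
describe_entrypoint_action ps -ef
```

With the `docker run ... spark_standalone bash` invocation from the documentation, the only argument is `bash`, so the container ends in an interactive shell after the master and worker have been started. One detail worth noting: the `"exit 0"` default is assigned to `CMD` only, while the else branch passes `"$*"`, so a run with no arguments executes an empty command string.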
