Repository: spark
Updated Branches:
  refs/heads/branch-1.0 2a49a8d03 -> 9013197f8


[SPARK-2259] Fix highly misleading docs on cluster / client deploy modes

The existing docs are highly misleading. For standalone mode, for example, they
encourage the user to use standalone-cluster mode, which is not officially
supported. Safeguards have now been added in Spark submit itself to prevent bad
documentation from leading users down the wrong path in the future.

This PR is prompted by countless headaches users of Spark have run into on the 
mailing list.
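The new guards this patch adds to SparkSubmit.scala can be sketched as pure logic. The following is a hypothetical Python mock of the Scala checks, useful for seeing the accepted/rejected combinations at a glance; the function and argument names are illustrative and are not part of Spark's API.

```python
# Hypothetical sketch of the deploy-mode guards added to SparkSubmit.scala.
# Names (validate_deploy_mode, cluster_manager, ...) are illustrative only.
def validate_deploy_mode(cluster_manager, deploy_on_cluster,
                         is_shell=False, is_python=False):
    """Return the error message spark-submit would print, or None if accepted."""
    if not deploy_on_cluster:
        return None  # client mode always passes these guards
    if cluster_manager == "mesos":
        return "Cluster deploy mode is currently not supported for Mesos clusters."
    if cluster_manager == "standalone":
        return "Cluster deploy mode is currently not supported for standalone clusters."
    if is_shell:
        return "Cluster deploy mode is not applicable to Spark shells."
    if is_python:
        return "Cluster deploy mode is currently not supported for python."
    return None  # e.g. YARN cluster mode remains allowed
```

With this sketch, only YARN accepts cluster mode for a compiled, non-shell application, which is exactly the behavior the doc changes below describe.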

Author: Andrew Or <[email protected]>

Closes #1200 from andrewor14/submit-docs and squashes the following commits:

5ea2460 [Andrew Or] Rephrase cluster vs client explanation
c827f32 [Andrew Or] Clarify spark submit messages
9f7ed8f [Andrew Or] Clarify client vs cluster deploy mode + add safeguards
(cherry picked from commit f17510e371dfbeaada3c72b884d70c36503ea30a)

Signed-off-by: Patrick Wendell <[email protected]>


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/9013197f
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/9013197f
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/9013197f

Branch: refs/heads/branch-1.0
Commit: 9013197f8bfc887d6c9945b7c7c61f70b55ab0ea
Parents: 2a49a8d
Author: Andrew Or <[email protected]>
Authored: Fri Jun 27 16:11:31 2014 -0700
Committer: Patrick Wendell <[email protected]>
Committed: Fri Jun 27 16:11:53 2014 -0700

----------------------------------------------------------------------
 .../org/apache/spark/deploy/SparkSubmit.scala      | 17 ++++++++++++++---
 .../apache/spark/deploy/SparkSubmitArguments.scala |  5 +++--
 docs/running-on-mesos.md                           |  3 ++-
 docs/spark-standalone.md                           |  9 ++++-----
 docs/submitting-applications.md                    | 14 +++++++++++++-
 5 files changed, 36 insertions(+), 12 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/9013197f/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
----------------------------------------------------------------------
diff --git a/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala b/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
index 7e9a934..b050dcc 100644
--- a/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
@@ -19,7 +19,7 @@ package org.apache.spark.deploy
 
 import java.io.{File, PrintStream}
 import java.lang.reflect.InvocationTargetException
-import java.net.{URI, URL}
+import java.net.URL
 
 import scala.collection.mutable.{ArrayBuffer, HashMap, Map}
 
@@ -117,14 +117,25 @@ object SparkSubmit {
     val isPython = args.isPython
     val isYarnCluster = clusterManager == YARN && deployOnCluster
 
+    // For mesos, only client mode is supported
     if (clusterManager == MESOS && deployOnCluster) {
-      printErrorAndExit("Cannot currently run driver on the cluster in Mesos")
+      printErrorAndExit("Cluster deploy mode is currently not supported for Mesos clusters.")
+    }
+
+    // For standalone, only client mode is supported
+    if (clusterManager == STANDALONE && deployOnCluster) {
+      printErrorAndExit("Cluster deploy mode is currently not supported for standalone clusters.")
+    }
+
+    // For shells, only client mode is applicable
+    if (isShell(args.primaryResource) && deployOnCluster) {
+      printErrorAndExit("Cluster deploy mode is not applicable to Spark shells.")
     }
 
    // If we're running a python app, set the main class to our specific python runner
     if (isPython) {
       if (deployOnCluster) {
-        printErrorAndExit("Cannot currently run Python driver programs on cluster")
+        printErrorAndExit("Cluster deploy mode is currently not supported for python.")
       }
       if (args.primaryResource == PYSPARK_SHELL) {
         args.mainClass = "py4j.GatewayServer"

http://git-wip-us.apache.org/repos/asf/spark/blob/9013197f/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala
----------------------------------------------------------------------
diff --git a/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala b/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala
index f1032ea..57655aa 100644
--- a/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala
@@ -338,8 +338,9 @@ private[spark] class SparkSubmitArguments(args: Seq[String]) {
       """Usage: spark-submit [options] <app jar | python file> [app options]
         |Options:
        |  --master MASTER_URL         spark://host:port, mesos://host:port, yarn, or local.
-        |  --deploy-mode DEPLOY_MODE   Where to run the driver program: either "client" to run
-        |                              on the local machine, or "cluster" to run inside cluster.
+        |  --deploy-mode DEPLOY_MODE   Whether to launch the driver program locally ("client") or
+        |                              on one of the worker machines inside the cluster ("cluster")
+        |                              (Default: client).
        |  --class CLASS_NAME          Your application's main class (for Java / Scala apps).
         |  --name NAME                 A name of your application.
        |  --jars JARS                 Comma-separated list of local jars to include on the driver

http://git-wip-us.apache.org/repos/asf/spark/blob/9013197f/docs/running-on-mesos.md
----------------------------------------------------------------------
diff --git a/docs/running-on-mesos.md b/docs/running-on-mesos.md
index e3c8922..bd046cf 100644
--- a/docs/running-on-mesos.md
+++ b/docs/running-on-mesos.md
@@ -127,7 +127,8 @@ val sc = new SparkContext(conf)
 {% endhighlight %}
 
 (You can also use [`spark-submit`](submitting-applications.html) and configure `spark.executor.uri`
-in the [conf/spark-defaults.conf](configuration.html#loading-default-configurations) file.)
+in the [conf/spark-defaults.conf](configuration.html#loading-default-configurations) file. Note
+that `spark-submit` currently only supports deploying the Spark driver in `client` mode for Mesos.)
 
 When running a shell, the `spark.executor.uri` parameter is inherited from `SPARK_EXECUTOR_URI`, so
 it does not need to be redundantly passed in as a system property.

http://git-wip-us.apache.org/repos/asf/spark/blob/9013197f/docs/spark-standalone.md
----------------------------------------------------------------------
diff --git a/docs/spark-standalone.md b/docs/spark-standalone.md
index 3c1ce06..f5c0f7c 100644
--- a/docs/spark-standalone.md
+++ b/docs/spark-standalone.md
@@ -235,11 +235,10 @@ You can also pass an option `--cores <numCores>` to control the number of cores
 
 # Launching Compiled Spark Applications
 
-Spark supports two deploy modes: applications may run with the driver inside the client process or
-entirely inside the cluster. The
-[`spark-submit` script](submitting-applications.html) provides the
-most straightforward way to submit a compiled Spark application to the cluster in either deploy
-mode.
+The [`spark-submit` script](submitting-applications.html) provides the most straightforward way to
+submit a compiled Spark application to the cluster. For standalone clusters, Spark currently
+only supports deploying the driver inside the client process that is submitting the application
+(`client` deploy mode).
 
 If your application is launched through Spark submit, then the application jar is automatically
 distributed to all worker nodes. For any additional jars that your application depends on, you

http://git-wip-us.apache.org/repos/asf/spark/blob/9013197f/docs/submitting-applications.md
----------------------------------------------------------------------
diff --git a/docs/submitting-applications.md b/docs/submitting-applications.md
index d2864fe..e058830 100644
--- a/docs/submitting-applications.md
+++ b/docs/submitting-applications.md
@@ -42,10 +42,22 @@ Some of the commonly used options are:
 
 * `--class`: The entry point for your application (e.g. `org.apache.spark.examples.SparkPi`)
 * `--master`: The [master URL](#master-urls) for the cluster (e.g. `spark://23.195.26.187:7077`)
-* `--deploy-mode`: Whether to deploy your driver program within the cluster or run it locally as an external client (either `cluster` or `client`)
+* `--deploy-mode`: Whether to deploy your driver on the worker nodes (`cluster`) or locally as an external client (`client`) (default: `client`)*
 * `application-jar`: Path to a bundled jar including your application and all dependencies. The URL must be globally visible inside of your cluster, for instance, an `hdfs://` path or a `file://` path that is present on all nodes.
 * `application-arguments`: Arguments passed to the main method of your main class, if any
 
+*A common deployment strategy is to submit your application from a gateway machine that is
+physically co-located with your worker machines (e.g. Master node in a standalone EC2 cluster).
+In this setup, `client` mode is appropriate. In `client` mode, the driver is launched directly
+within the client `spark-submit` process, with the input and output of the application attached
+to the console. Thus, this mode is especially suitable for applications that involve the REPL
+(e.g. Spark shell).
+
+Alternatively, if your application is submitted from a machine far from the worker machines (e.g.
+locally on your laptop), it is common to use `cluster` mode to minimize network latency between
+the drivers and the executors. Note that `cluster` mode is currently not supported for standalone
+clusters, Mesos clusters, or python applications.
+
 For Python applications, simply pass a `.py` file in the place of `<application-jar>` instead of a JAR,
 and add Python `.zip`, `.egg` or `.py` files to the search path with `--py-files`.
 
