Repository: spark
Updated Branches:
  refs/heads/master 3b5c2a84b -> b8a08f25c


[SPARK-21506][DOC] The description of "spark.executor.cores" may not be correct

## What changes were proposed in this pull request?

The number of cores assigned to each executor is configurable. Even when it is not
explicitly set, multiple executors from the same application may still be launched
on the same worker.

## How was this patch tested?
N/A

Author: liuxian <[email protected]>

Closes #18711 from 10110346/executorcores.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b8a08f25
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b8a08f25
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b8a08f25

Branch: refs/heads/master
Commit: b8a08f25cc64ed3034f3c90790931c30e5b0f236
Parents: 3b5c2a8
Author: liuxian <[email protected]>
Authored: Tue Oct 10 20:44:33 2017 +0800
Committer: Wenchen Fan <[email protected]>
Committed: Tue Oct 10 20:44:33 2017 +0800

----------------------------------------------------------------------
 .../apache/spark/deploy/client/StandaloneAppClient.scala |  2 +-
 .../scala/org/apache/spark/deploy/master/Master.scala    |  8 +++++++-
 .../scheduler/cluster/StandaloneSchedulerBackend.scala   |  2 +-
 docs/configuration.md                                    | 11 ++++-------
 docs/spark-standalone.md                                 |  8 ++++++++
 5 files changed, 21 insertions(+), 10 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/b8a08f25/core/src/main/scala/org/apache/spark/deploy/client/StandaloneAppClient.scala
----------------------------------------------------------------------
diff --git a/core/src/main/scala/org/apache/spark/deploy/client/StandaloneAppClient.scala b/core/src/main/scala/org/apache/spark/deploy/client/StandaloneAppClient.scala
index 757c930..34ade4c 100644
--- a/core/src/main/scala/org/apache/spark/deploy/client/StandaloneAppClient.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/client/StandaloneAppClient.scala
@@ -170,7 +170,7 @@ private[spark] class StandaloneAppClient(
 
       case ExecutorAdded(id: Int, workerId: String, hostPort: String, cores: Int, memory: Int) =>
         val fullId = appId + "/" + id
-        logInfo("Executor added: %s on %s (%s) with %d cores".format(fullId, workerId, hostPort,
+        logInfo("Executor added: %s on %s (%s) with %d core(s)".format(fullId, workerId, hostPort,
           cores))
         listener.executorAdded(fullId, workerId, hostPort, cores, memory)
 

http://git-wip-us.apache.org/repos/asf/spark/blob/b8a08f25/core/src/main/scala/org/apache/spark/deploy/master/Master.scala
----------------------------------------------------------------------
diff --git a/core/src/main/scala/org/apache/spark/deploy/master/Master.scala b/core/src/main/scala/org/apache/spark/deploy/master/Master.scala
index e030cac..2c78c15 100644
--- a/core/src/main/scala/org/apache/spark/deploy/master/Master.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/master/Master.scala
@@ -581,7 +581,13 @@ private[deploy] class Master(
    * The number of cores assigned to each executor is configurable. When this is explicitly set,
    * multiple executors from the same application may be launched on the same worker if the worker
    * has enough cores and memory. Otherwise, each executor grabs all the cores available on the
-   * worker by default, in which case only one executor may be launched on each worker.
+   * worker by default, in which case only one executor per application may be launched on each
+   * worker during one single schedule iteration.
+   * Note that when `spark.executor.cores` is not set, we may still launch multiple executors from
+   * the same application on the same worker. Consider appA and appB both have one executor running
+   * on worker1, and appA.coresLeft > 0, then appB is finished and release all its cores on worker1,
+   * thus for the next schedule iteration, appA launches a new executor that grabs all the free
+   * cores on worker1, therefore we get multiple executors from appA running on worker1.
    *
    * It is important to allocate coresPerExecutor on each worker at a time (instead of 1 core
    * at a time). Consider the following example: cluster has 4 workers with 16 cores each.
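
To make the appA/appB scenario in the comment above concrete, here is a minimal,
self-contained Scala sketch. It is a toy model rather than Spark's actual
Master.scheduleExecutorsOnWorkers logic, and every name in it (SchedulingSketch,
Worker, App, schedule) is made up purely for illustration:

object SchedulingSketch {
  import scala.collection.mutable.ArrayBuffer

  case class Worker(id: String, var freeCores: Int)
  case class App(id: String, var coresLeft: Int, coresPerExecutor: Option[Int])

  // One schedule iteration for a single app on a single worker. With
  // coresPerExecutor == None the app grabs all free cores as one executor.
  // Returns the core count of each executor launched in this iteration.
  def schedule(app: App, worker: Worker): List[Int] = {
    val launched = ArrayBuffer[Int]()
    app.coresPerExecutor match {
      case Some(c) =>
        while (app.coresLeft >= c && worker.freeCores >= c) {
          worker.freeCores -= c; app.coresLeft -= c; launched += c
        }
      case None =>
        if (app.coresLeft > 0 && worker.freeCores > 0) {
          val grab = math.min(app.coresLeft, worker.freeCores)
          worker.freeCores -= grab; app.coresLeft -= grab; launched += grab
        }
    }
    launched.toList
  }

  def main(args: Array[String]): Unit = {
    val worker1 = Worker("worker1", freeCores = 16)
    val appA = App("appA", coresLeft = 20, coresPerExecutor = None)
    val appB = App("appB", coresLeft = 8, coresPerExecutor = None)

    // Iteration 1: appB grabs 8 cores, appA grabs the remaining 8.
    println(s"iteration 1: appB launches ${schedule(appB, worker1)}")  // List(8)
    println(s"iteration 1: appA launches ${schedule(appA, worker1)}")  // List(8)

    // appB finishes and releases its cores on worker1.
    worker1.freeCores += 8

    // Iteration 2: appA still has coresLeft > 0, so it launches a second
    // executor on worker1 that grabs the freed cores.
    println(s"iteration 2: appA launches ${schedule(appA, worker1)}")  // List(8)
  }
}

Even though spark.executor.cores was never set, appA ends the second iteration with two
executors on worker1, which is exactly the situation the new comment documents.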

http://git-wip-us.apache.org/repos/asf/spark/blob/b8a08f25/core/src/main/scala/org/apache/spark/scheduler/cluster/StandaloneSchedulerBackend.scala
----------------------------------------------------------------------
diff --git a/core/src/main/scala/org/apache/spark/scheduler/cluster/StandaloneSchedulerBackend.scala b/core/src/main/scala/org/apache/spark/scheduler/cluster/StandaloneSchedulerBackend.scala
index a4e2a74..505c342 100644
--- a/core/src/main/scala/org/apache/spark/scheduler/cluster/StandaloneSchedulerBackend.scala
+++ b/core/src/main/scala/org/apache/spark/scheduler/cluster/StandaloneSchedulerBackend.scala
@@ -153,7 +153,7 @@ private[spark] class StandaloneSchedulerBackend(
 
   override def executorAdded(fullId: String, workerId: String, hostPort: String, cores: Int,
     memory: Int) {
-    logInfo("Granted executor ID %s on hostPort %s with %d cores, %s RAM".format(
+    logInfo("Granted executor ID %s on hostPort %s with %d core(s), %s RAM".format(
       fullId, hostPort, cores, Utils.megabytesToString(memory)))
   }
 

http://git-wip-us.apache.org/repos/asf/spark/blob/b8a08f25/docs/configuration.md
----------------------------------------------------------------------
diff --git a/docs/configuration.md b/docs/configuration.md
index 6e9fe59..7a777d3 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -1015,7 +1015,7 @@ Apart from these, the following properties are also available, and may be useful
   <td>0.5</td>
   <td>
     Amount of storage memory immune to eviction, expressed as a fraction of the size of the
-    region set aside by <code>s​park.memory.fraction</code>. The higher this is, the less
+    region set aside by <code>spark.memory.fraction</code>. The higher this is, the less
     working memory may be available to execution and tasks may spill to disk more often.
     Leaving this at the default value is recommended. For more detail, see
     <a href="tuning.html#memory-management-overview">this description</a>.
@@ -1041,7 +1041,7 @@ Apart from these, the following properties are also available, and may be useful
   <td><code>spark.memory.useLegacyMode</code></td>
   <td>false</td>
   <td>
-    ​Whether to enable the legacy memory management mode used in Spark 1.5 and before.
+    Whether to enable the legacy memory management mode used in Spark 1.5 and before.
     The legacy mode rigidly partitions the heap space into fixed-size regions,
     potentially leading to excessive spilling if the application was not tuned.
     The following deprecated memory fraction configurations are not read unless this is enabled:
@@ -1115,11 +1115,8 @@ Apart from these, the following properties are also available, and may be useful
   <td>
     The number of cores to use on each executor.
 
-    In standalone and Mesos coarse-grained modes, setting this
-    parameter allows an application to run multiple executors on the
-    same worker, provided that there are enough cores on that
-    worker. Otherwise, only one executor per application will run on
-    each worker.
+    In standalone and Mesos coarse-grained modes, for more detail, see
+    <a href="spark-standalone.html#Executors Scheduling">this description</a>.
   </td>
 </tr>
 <tr>
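
As a hedged aside on the three configuration.md properties touched above, this is how
they are typically set from application code; the values are illustrative placeholders
rather than recommendations from this patch, and the same keys can equally go into
spark-defaults.conf or be passed with --conf on spark-submit:

// For example, paste into spark-shell or any application with Spark on the classpath.
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("configuration-example")
  .set("spark.executor.cores", "4")            // cores per executor; see the standalone docs below
  .set("spark.memory.fraction", "0.6")         // default value, shown only for context
  .set("spark.memory.storageFraction", "0.5")  // default value; the docs recommend leaving it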

http://git-wip-us.apache.org/repos/asf/spark/blob/b8a08f25/docs/spark-standalone.md
----------------------------------------------------------------------
diff --git a/docs/spark-standalone.md b/docs/spark-standalone.md
index 1095386..f51c5cc 100644
--- a/docs/spark-standalone.md
+++ b/docs/spark-standalone.md
@@ -328,6 +328,14 @@ export SPARK_MASTER_OPTS="-Dspark.deploy.defaultCores=<value>"
 This is useful on shared clusters where users might not have configured a maximum number of cores
 individually.
 
+# Executors Scheduling
+
+The number of cores assigned to each executor is configurable. When `spark.executor.cores` is
+explicitly set, multiple executors from the same application may be launched on the same worker
+if the worker has enough cores and memory. Otherwise, each executor grabs all the cores available
+on the worker by default, in which case only one executor per application may be launched on each
+worker during one single schedule iteration.
+
 # Monitoring and Logging
 
 Spark's standalone mode offers a web-based user interface to monitor the cluster. The master and each worker has its own web UI that shows cluster and job statistics. By default you can access the web UI for the master at port 8080. The port can be changed either in the configuration file or via command-line options.
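
As a worked illustration of the new "Executors Scheduling" section: suppose a standalone
cluster has a single worker with 16 cores and 64g of memory (figures chosen purely for
this example, with a placeholder master URL). The sketch below is one possible reading of
the section, not part of the patch itself:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("spark://master-host:7077")      // placeholder standalone master URL
  .appName("executor-scheduling-example")
  .config("spark.executor.cores", "4")     // explicit: up to 16 / 4 = 4 executors of this
  .config("spark.executor.memory", "8g")   //   application may land on the 16-core worker
  .config("spark.cores.max", "16")         // cap the total cores claimed by the application
  .getOrCreate()

With spark.executor.cores left unset, a single executor would instead grab all 16 free
cores on that worker in one schedule iteration, and a second executor for the same
application could appear there only after cores are freed, as described above.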

