[spark] branch master updated: [SPARK-31018][CORE][DOCS] Deprecate support of multiple workers on the same host in Standalone

jiangxb1987 Wed, 15 Apr 2020 11:32:25 -0700

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git



The following commit(s) were added to refs/heads/master by this push:
     new 0d4e4df  [SPARK-31018][CORE][DOCS] Deprecate support of multiple 
workers on the same host in Standalone
0d4e4df is described below

commit 0d4e4df06105cf2985dde17c1af76093b3ae8c13
Author: yi.wu <[email protected]>
AuthorDate: Wed Apr 15 11:29:55 2020 -0700

    [SPARK-31018][CORE][DOCS] Deprecate support of multiple workers on the same 
host in Standalone
    
    ### What changes were proposed in this pull request?
    
    Update the documentation and shell script to warn users that support for 
multiple workers on the same host is deprecated.
    
    ### Why are the changes needed?
    
    This is a sub-task of 
[SPARK-30978](https://issues.apache.org/jira/browse/SPARK-30978), which plans 
to remove support for multiple workers entirely in Spark 3.1. This PR takes the 
first step by deprecating it in Spark 3.0.
    
    ### Does this PR introduce any user-facing change?
    
    Yes, users see a warning when they run the start worker script.
    
    ### How was this patch tested?
    
    Tested manually.
    
    Closes #27768 from Ngone51/deprecate_spark_worker_instances.
    
    Authored-by: yi.wu <[email protected]>
    Signed-off-by: Xingbo Jiang <[email protected]>
---
 docs/core-migration-guide.md  | 2 ++
 docs/hardware-provisioning.md | 8 ++++----
 sbin/start-slave.sh           | 2 +-
 3 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/docs/core-migration-guide.md b/docs/core-migration-guide.md
index 66a489b..cde6e07 100644
--- a/docs/core-migration-guide.md
+++ b/docs/core-migration-guide.md
@@ -38,3 +38,5 @@ license: |
 - Event log file will be written as UTF-8 encoding, and Spark History Server 
will replay event log files as UTF-8 encoding. Previously Spark wrote the event 
log file as default charset of driver JVM process, so Spark History Server of 
Spark 2.x is needed to read the old event log files in case of incompatible 
encoding.
 
 - A new protocol for fetching shuffle blocks is used. It's recommended that 
external shuffle services be upgraded when running Spark 3.0 apps. You can 
still use old external shuffle services by setting the configuration 
`spark.shuffle.useOldFetchProtocol` to `true`. Otherwise, Spark may run into 
errors with messages like `IllegalArgumentException: Unexpected message type: 
<number>`.
+
+- `SPARK_WORKER_INSTANCES` is deprecated in Standalone mode. It's recommended 
to launch multiple executors in one worker and launch one worker per node 
instead of launching multiple workers per node and launching one executor per 
worker.
diff --git a/docs/hardware-provisioning.md b/docs/hardware-provisioning.md
index 4e5d681..fc87995f 100644
--- a/docs/hardware-provisioning.md
+++ b/docs/hardware-provisioning.md
@@ -63,10 +63,10 @@ Note that memory usage is greatly affected by storage level 
and serialization fo
 the [tuning guide](tuning.html) for tips on how to reduce it.
 
 Finally, note that the Java VM does not always behave well with more than 200 
GiB of RAM. If you
-purchase machines with more RAM than this, you can run _multiple worker JVMs 
per node_. In
-Spark's [standalone mode](spark-standalone.html), you can set the number of 
workers per node
-with the `SPARK_WORKER_INSTANCES` variable in `conf/spark-env.sh`, and the 
number of cores
-per worker with `SPARK_WORKER_CORES`.
+purchase machines with more RAM than this, you can launch multiple executors 
in a single node. In
+Spark's [standalone mode](spark-standalone.html), a worker is responsible for 
launching multiple
+executors according to its available memory and cores, and each executor will 
be launched in a
+separate Java VM.
 
 # Network
 
diff --git a/sbin/start-slave.sh b/sbin/start-slave.sh
index 2cb17a0..9b3b26b 100755
--- a/sbin/start-slave.sh
+++ b/sbin/start-slave.sh
@@ -22,7 +22,7 @@
 # Environment Variables
 #
 #   SPARK_WORKER_INSTANCES  The number of worker instances to run on this
-#                           slave.  Default is 1.
+#                           slave.  Default is 1. Note it has been deprecated 
since Spark 3.0.
 #   SPARK_WORKER_PORT       The base port number for the first worker. If set,
 #                           subsequent workers will increment this number.  If
 #                           unset, Spark will find a valid port number, but


