This is an automated email from the ASF dual-hosted git repository.

yao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
     new e3caa473776 [SPARK-44727][CORE][DOCS] Improve docs and error message for dynamic allocation conditions
e3caa473776 is described below

commit e3caa4737768241d91127f019f4a8f4043fa466a
Author: Cheng Pan <cheng...@apache.org>
AuthorDate: Fri Aug 11 15:08:49 2023 +0800

[SPARK-44727][CORE][DOCS] Improve docs and error message for dynamic allocation conditions

### What changes were proposed in this pull request?

Clarify the DRA enabling conditions in the docs and in the error message (the message is rendered as a single line, with no `\n`):

```
Dynamic allocation of executors requires one of the following conditions:
1) enabling external shuffle service through spark.shuffle.service.enabled.
2) enabling shuffle tracking through spark.dynamicAllocation.shuffleTracking.enabled.
3) enabling shuffle blocks decommission through spark.decommission.enabled and spark.storage.decommission.shuffleBlocks.enabled.
4) (Experimental) configuring spark.shuffle.sort.io.plugin.class to use a custom ShuffleDataIO whose ShuffleDriverComponents supports reliable storage.
```

### Why are the changes needed?

Currently, the external shuffle service (ESS) is not the only way to support DRA, but users always see a misleading error message when the DRA conditions are not met:

```
Dynamic allocation of executors requires the external shuffle service. You may enable this through spark.shuffle.service.enabled.
```

This is misleading, especially for users who want to enable DRA in Spark on K8s cases.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Review.

Closes #42404 from pan3793/SPARK-44727.
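As a usage sketch (not part of this commit), any one of the four conditions can be satisfied at submit time. For example, the shuffle-tracking route on Kubernetes might look like the following; the master URL and application JAR path are placeholders:

```shell
# Hypothetical submission sketch: enable DRA via shuffle tracking (condition 2).
# The k8s master URL and example JAR below are placeholders, not from this commit.
spark-submit \
  --master k8s://https://example.com:6443 \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.shuffleTracking.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=1 \
  --conf spark.dynamicAllocation.maxExecutors=10 \
  local:///opt/spark/examples/jars/spark-examples.jar
```

With this patch, omitting all four conditions fails fast with the enumerated error message instead of the old ESS-only hint.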
Authored-by: Cheng Pan <cheng...@apache.org>
Signed-off-by: Kent Yao <y...@apache.org>
---
 .../scala/org/apache/spark/ExecutorAllocationManager.scala | 10 ++++++++--
 docs/configuration.md                                      |  7 +++++--
 docs/job-scheduling.md                                     | 14 +++++++++-----
 3 files changed, 22 insertions(+), 9 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala b/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala
index 187125a66c9..441bf60e489 100644
--- a/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala
+++ b/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala
@@ -211,8 +211,14 @@ private[spark] class ExecutorAllocationManager(
         conf.get(config.STORAGE_DECOMMISSION_SHUFFLE_BLOCKS_ENABLED)) {
       logInfo("Shuffle data decommission is enabled without a shuffle service.")
     } else if (!testing) {
-      throw new SparkException("Dynamic allocation of executors requires the external " +
-        "shuffle service. You may enable this through spark.shuffle.service.enabled.")
+      throw new SparkException("Dynamic allocation of executors requires one of the " +
+        "following conditions: 1) enabling external shuffle service through " +
+        s"${config.SHUFFLE_SERVICE_ENABLED.key}. 2) enabling shuffle tracking through " +
+        s"${DYN_ALLOCATION_SHUFFLE_TRACKING_ENABLED.key}. 3) enabling shuffle blocks " +
+        s"decommission through ${DECOMMISSION_ENABLED.key} and " +
+        s"${STORAGE_DECOMMISSION_SHUFFLE_BLOCKS_ENABLED.key}. 4) (Experimental) " +
+        s"configuring ${SHUFFLE_IO_PLUGIN_CLASS.key} to use a custom ShuffleDataIO whose " +
+        "ShuffleDriverComponents supports reliable storage.")
     }
   }
diff --git a/docs/configuration.md b/docs/configuration.md
index a70c049c87c..dfded480c99 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -3020,8 +3020,11 @@ Apart from these, the following properties are also available, and may be useful
     For more detail, see the description
     <a href="job-scheduling.html#dynamic-resource-allocation">here</a>.
     <br><br>
-    This requires <code>spark.shuffle.service.enabled</code> or
-    <code>spark.dynamicAllocation.shuffleTracking.enabled</code> to be set.
+    This requires one of the following conditions:
+    1) enabling external shuffle service through <code>spark.shuffle.service.enabled</code>, or
+    2) enabling shuffle tracking through <code>spark.dynamicAllocation.shuffleTracking.enabled</code>, or
+    3) enabling shuffle blocks decommission through <code>spark.decommission.enabled</code> and <code>spark.storage.decommission.shuffleBlocks.enabled</code>, or
+    4) (Experimental) configuring <code>spark.shuffle.sort.io.plugin.class</code> to use a custom <code>ShuffleDataIO</code> whose <code>ShuffleDriverComponents</code> supports reliable storage.
     The following configurations are also relevant:
     <code>spark.dynamicAllocation.minExecutors</code>,
     <code>spark.dynamicAllocation.maxExecutors</code>, and
diff --git a/docs/job-scheduling.md b/docs/job-scheduling.md
index 8694ee82e1b..0875bd5558e 100644
--- a/docs/job-scheduling.md
+++ b/docs/job-scheduling.md
@@ -89,11 +89,15 @@ This feature is disabled by default and available on all coarse-grained cluster

 ### Configuration and Setup

-There are two ways for using this feature.
-First, your application must set both `spark.dynamicAllocation.enabled` and `spark.dynamicAllocation.shuffleTracking.enabled` to `true`.
-Second, your application must set both `spark.dynamicAllocation.enabled` and `spark.shuffle.service.enabled` to `true`
-after you set up an *external shuffle service* on each worker node in the same cluster.
-The purpose of the shuffle tracking or the external shuffle service is to allow executors to be removed
+There are several ways to use this feature.
+Regardless of which approach you choose, your application must first set `spark.dynamicAllocation.enabled` to `true`. Additionally,
+
+- your application must set `spark.shuffle.service.enabled` to `true` after you set up an *external shuffle service* on each worker node in the same cluster, or
+- your application must set `spark.dynamicAllocation.shuffleTracking.enabled` to `true`, or
+- your application must set both `spark.decommission.enabled` and `spark.storage.decommission.shuffleBlocks.enabled` to `true`, or
+- your application must configure `spark.shuffle.sort.io.plugin.class` to use a custom `ShuffleDataIO` whose `ShuffleDriverComponents` supports reliable storage.
+
+The purpose of the external shuffle service, shuffle tracking, or a `ShuffleDriverComponents` that supports reliable storage is to allow executors to be removed
 without deleting shuffle files written by them (more detail described
 [below](job-scheduling.html#graceful-decommission-of-executors)).
 While it is simple to enable shuffle tracking, the way to set up the external shuffle service varies across cluster managers:

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org