This is an automated email from the ASF dual-hosted git repository.
yao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new e3caa473776 [SPARK-44727][CORE][DOCS] Improve docs and error message for dynamic allocation conditions
e3caa473776 is described below
commit e3caa4737768241d91127f019f4a8f4043fa466a
Author: Cheng Pan <[email protected]>
AuthorDate: Fri Aug 11 15:08:49 2023 +0800
[SPARK-44727][CORE][DOCS] Improve docs and error message for dynamic allocation conditions
### What changes were proposed in this pull request?
Clarify the DRA enabling conditions in the docs and in the error message (the message below is shown with line breaks for readability; the actual message contains no `\n`):
```
Dynamic allocation of executors requires one of the following conditions:
1) enabling external shuffle service through spark.shuffle.service.enabled.
2) enabling shuffle tracking through spark.dynamicAllocation.shuffleTracking.enabled.
3) enabling shuffle blocks decommission through spark.decommission.enabled and spark.storage.decommission.shuffleBlocks.enabled.
4) (Experimental) configuring spark.shuffle.sort.io.plugin.class to use a custom ShuffleDataIO whose ShuffleDriverComponents supports reliable storage.
```
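For illustration, a minimal Scala sketch of condition 2 (shuffle tracking), the simplest route when no external shuffle service exists; the app name and executor bounds are illustrative, and the master URL would normally come from spark-submit:
```
import org.apache.spark.{SparkConf, SparkContext}

// A minimal sketch of condition 2: DRA via shuffle tracking,
// with no external shuffle service. Bounds are illustrative.
val conf = new SparkConf()
  .setAppName("dra-shuffle-tracking")
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.dynamicAllocation.shuffleTracking.enabled", "true")
  .set("spark.dynamicAllocation.minExecutors", "1")
  .set("spark.dynamicAllocation.maxExecutors", "10")

// Master is expected to be supplied externally (e.g. by spark-submit).
val sc = new SparkContext(conf)
```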
### Why are the changes needed?
Currently, ESS is not the only way to support DRA, but users always see the following misleading error message when the application does not meet the DRA conditions:
```
Dynamic allocation of executors requires the external shuffle service. You may enable this through spark.shuffle.service.enabled.
```
This is misleading, especially for users who want to enable DRA for Spark on K8s, where no external shuffle service is available.
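On K8s, conditions 2 or 3 are the usual routes. A hedged sketch of condition 3 (shuffle blocks decommission); the API-server URL is a placeholder and the values are illustrative:
```
import org.apache.spark.SparkConf

// Sketch of condition 3 for Spark on K8s: shuffle blocks decommission.
// The master URL is a placeholder, not a real endpoint.
val conf = new SparkConf()
  .setMaster("k8s://https://example-apiserver:6443") // placeholder
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.decommission.enabled", "true")
  .set("spark.storage.decommission.enabled", "true")
  .set("spark.storage.decommission.shuffleBlocks.enabled", "true")
```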
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Review.
Closes #42404 from pan3793/SPARK-44727.
Authored-by: Cheng Pan <[email protected]>
Signed-off-by: Kent Yao <[email protected]>
---
.../scala/org/apache/spark/ExecutorAllocationManager.scala | 10 ++++++++--
docs/configuration.md | 7 +++++--
docs/job-scheduling.md | 14 +++++++++-----
3 files changed, 22 insertions(+), 9 deletions(-)
diff --git a/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala b/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala
index 187125a66c9..441bf60e489 100644
--- a/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala
+++ b/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala
@@ -211,8 +211,14 @@ private[spark] class ExecutorAllocationManager(
         conf.get(config.STORAGE_DECOMMISSION_SHUFFLE_BLOCKS_ENABLED)) {
       logInfo("Shuffle data decommission is enabled without a shuffle service.")
     } else if (!testing) {
-      throw new SparkException("Dynamic allocation of executors requires the external " +
-        "shuffle service. You may enable this through spark.shuffle.service.enabled.")
+      throw new SparkException("Dynamic allocation of executors requires one of the " +
+        "following conditions: 1) enabling external shuffle service through " +
+        s"${config.SHUFFLE_SERVICE_ENABLED.key}. 2) enabling shuffle tracking through " +
+        s"${DYN_ALLOCATION_SHUFFLE_TRACKING_ENABLED.key}. 3) enabling shuffle blocks " +
+        s"decommission through ${DECOMMISSION_ENABLED.key} and " +
+        s"${STORAGE_DECOMMISSION_SHUFFLE_BLOCKS_ENABLED.key}. 4) (Experimental) " +
+        s"configuring ${SHUFFLE_IO_PLUGIN_CLASS.key} to use a custom ShuffleDataIO whose " +
+        "ShuffleDriverComponents supports reliable storage.")
     }
   }
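The fourth branch hinges on `ShuffleDriverComponents.supportsReliableStorage()`. A hypothetical sketch of the driver-side half of such a plugin (the class name is invented; the matching `ShuffleDataIO` and executor components are omitted):
```
import java.util.Collections
import org.apache.spark.shuffle.api.ShuffleDriverComponents

// Hypothetical driver components for a shuffle plugin backed by
// reliable (executor-independent) storage; names are illustrative.
class ReliableShuffleDriverComponents extends ShuffleDriverComponents {
  // Extra config to ship to executors; nothing needed in this sketch.
  override def initializeApplication(): java.util.Map[String, String] =
    Collections.emptyMap()

  override def cleanupApplication(): Unit = ()

  // Declares that shuffle data survives executor loss, which is what
  // the new dynamic-allocation check accepts as condition 4.
  override def supportsReliableStorage(): Boolean = true
}
```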
diff --git a/docs/configuration.md b/docs/configuration.md
index a70c049c87c..dfded480c99 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -3020,8 +3020,11 @@ Apart from these, the following properties are also available, and may be useful
     For more detail, see the description
     <a href="job-scheduling.html#dynamic-resource-allocation">here</a>.
     <br><br>
-    This requires <code>spark.shuffle.service.enabled</code> or
-    <code>spark.dynamicAllocation.shuffleTracking.enabled</code> to be set.
+    This requires one of the following conditions:
+    1) enabling external shuffle service through <code>spark.shuffle.service.enabled</code>, or
+    2) enabling shuffle tracking through <code>spark.dynamicAllocation.shuffleTracking.enabled</code>, or
+    3) enabling shuffle blocks decommission through <code>spark.decommission.enabled</code> and <code>spark.storage.decommission.shuffleBlocks.enabled</code>, or
+    4) (Experimental) configuring <code>spark.shuffle.sort.io.plugin.class</code> to use a custom <code>ShuffleDataIO</code> whose <code>ShuffleDriverComponents</code> supports reliable storage.
     The following configurations are also relevant:
     <code>spark.dynamicAllocation.minExecutors</code>,
     <code>spark.dynamicAllocation.maxExecutors</code>, and
diff --git a/docs/job-scheduling.md b/docs/job-scheduling.md
index 8694ee82e1b..0875bd5558e 100644
--- a/docs/job-scheduling.md
+++ b/docs/job-scheduling.md
@@ -89,11 +89,15 @@ This feature is disabled by default and available on all coarse-grained cluster
### Configuration and Setup
-There are two ways for using this feature.
-First, your application must set both `spark.dynamicAllocation.enabled` and `spark.dynamicAllocation.shuffleTracking.enabled` to `true`.
-Second, your application must set both `spark.dynamicAllocation.enabled` and `spark.shuffle.service.enabled` to `true`
-after you set up an *external shuffle service* on each worker node in the same cluster.
-The purpose of the shuffle tracking or the external shuffle service is to allow executors to be removed
+There are several ways to use this feature.
+Regardless of which approach you choose, your application must first set `spark.dynamicAllocation.enabled` to `true`. Additionally,
+
+- your application must set `spark.shuffle.service.enabled` to `true` after you set up an *external shuffle service* on each worker node in the same cluster, or
+- your application must set `spark.dynamicAllocation.shuffleTracking.enabled` to `true`, or
+- your application must set both `spark.decommission.enabled` and `spark.storage.decommission.shuffleBlocks.enabled` to `true`, or
+- your application must configure `spark.shuffle.sort.io.plugin.class` to use a custom `ShuffleDataIO` whose `ShuffleDriverComponents` supports reliable storage.
+
+The purpose of the external shuffle service, shuffle tracking, or a `ShuffleDriverComponents` that supports reliable storage is to allow executors to be removed
 without deleting shuffle files written by them (more detail described
 [below](job-scheduling.html#graceful-decommission-of-executors)). While it is
 simple to enable shuffle tracking, the way to set up the external shuffle
 service varies across cluster managers:
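As a rough sketch of condition 1 (values illustrative): in standalone mode the workers host the shuffle service once the flag is set, while on YARN the shuffle service must first be installed on each NodeManager.
```
import org.apache.spark.SparkConf

// Sketch of condition 1: DRA with the external shuffle service.
// In standalone mode, workers launch the service when the flag is set;
// on YARN the service must be installed on each NodeManager first.
val conf = new SparkConf()
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.shuffle.service.enabled", "true")
```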
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]