This is an automated email from the ASF dual-hosted git repository.

yao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
     new e3caa473776 [SPARK-44727][CORE][DOCS] Improve docs and error message for dynamic allocation conditions
e3caa473776 is described below

commit e3caa4737768241d91127f019f4a8f4043fa466a
Author: Cheng Pan <cheng...@apache.org>
AuthorDate: Fri Aug 11 15:08:49 2023 +0800

[SPARK-44727][CORE][DOCS] Improve docs and error message for dynamic allocation conditions

### What changes were proposed in this pull request?

Clarify the DRA enabling conditions in the docs and in the error message (the message is rendered as a single line, with no `\n`):

```
Dynamic allocation of executors requires one of the following conditions:
1) enabling external shuffle service through spark.shuffle.service.enabled.
2) enabling shuffle tracking through spark.dynamicAllocation.shuffleTracking.enabled.
3) enabling shuffle blocks decommission through spark.decommission.enabled and spark.storage.decommission.shuffleBlocks.enabled.
4) (Experimental) configuring spark.shuffle.sort.io.plugin.class to use a custom ShuffleDataIO whose ShuffleDriverComponents supports reliable storage.
```

### Why are the changes needed?

Currently, the external shuffle service (ESS) is not the only way to support DRA, but users always see a misleading error message when the DRA conditions are not met:

```
Dynamic allocation of executors requires the external shuffle service. You may enable this through spark.shuffle.service.enabled.
```

This is misleading, especially for users who want to enable DRA in Spark on K8s cases.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Review.

Closes #42404 from pan3793/SPARK-44727.
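As a usage sketch (not part of this commit), any one of the four conditions can be satisfied at submit time. For example, the shuffle-tracking route on Kubernetes might look like the following; the master URL and application JAR path are placeholders:

```shell
# Hypothetical submission sketch: enable DRA via shuffle tracking (condition 2).
# The k8s master URL and example JAR below are placeholders, not from this commit.
spark-submit \
  --master k8s://https://example.com:6443 \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.shuffleTracking.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=1 \
  --conf spark.dynamicAllocation.maxExecutors=10 \
  local:///opt/spark/examples/jars/spark-examples.jar
```

With this patch, omitting all four conditions fails fast with the enumerated error message instead of the old ESS-only hint.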
Authored-by: Cheng Pan <cheng...@apache.org>
Signed-off-by: Kent Yao <y...@apache.org>
---
 .../scala/org/apache/spark/ExecutorAllocationManager.scala | 10 ++++++++--
 docs/configuration.md                                      |  7 +++++--
 docs/job-scheduling.md                                     | 14 +++++++++-----
 3 files changed, 22 insertions(+), 9 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala b/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala
index 187125a66c9..441bf60e489 100644
--- a/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala
+++ b/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala
@@ -211,8 +211,14 @@ private[spark] class ExecutorAllocationManager(
         conf.get(config.STORAGE_DECOMMISSION_SHUFFLE_BLOCKS_ENABLED)) {
       logInfo("Shuffle data decommission is enabled without a shuffle service.")
     } else if (!testing) {
-      throw new SparkException("Dynamic allocation of executors requires the external " +
-        "shuffle service. You may enable this through spark.shuffle.service.enabled.")
+      throw new SparkException("Dynamic allocation of executors requires one of the " +
+        "following conditions: 1) enabling external shuffle service through " +
+        s"${config.SHUFFLE_SERVICE_ENABLED.key}. 2) enabling shuffle tracking through " +
+        s"${DYN_ALLOCATION_SHUFFLE_TRACKING_ENABLED.key}. 3) enabling shuffle blocks " +
+        s"decommission through ${DECOMMISSION_ENABLED.key} and " +
+        s"${STORAGE_DECOMMISSION_SHUFFLE_BLOCKS_ENABLED.key}. 4) (Experimental) " +
+        s"configuring ${SHUFFLE_IO_PLUGIN_CLASS.key} to use a custom ShuffleDataIO whose " +
+        "ShuffleDriverComponents supports reliable storage.")
     }
   }
diff --git a/docs/configuration.md b/docs/configuration.md
index a70c049c87c..dfded480c99 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -3020,8 +3020,11 @@ Apart from these, the following properties are also available, and may be useful
     For more detail, see the description
     <a href="job-scheduling.html#dynamic-resource-allocation">here</a>.
     <br><br>
-    This requires <code>spark.shuffle.service.enabled</code> or
-    <code>spark.dynamicAllocation.shuffleTracking.enabled</code> to be set.
+    This requires one of the following conditions:
+    1) enabling external shuffle service through <code>spark.shuffle.service.enabled</code>, or
+    2) enabling shuffle tracking through <code>spark.dynamicAllocation.shuffleTracking.enabled</code>, or
+    3) enabling shuffle blocks decommission through <code>spark.decommission.enabled</code> and <code>spark.storage.decommission.shuffleBlocks.enabled</code>, or
+    4) (Experimental) configuring <code>spark.shuffle.sort.io.plugin.class</code> to use a custom <code>ShuffleDataIO</code> whose <code>ShuffleDriverComponents</code> supports reliable storage.
     The following configurations are also relevant:
     <code>spark.dynamicAllocation.minExecutors</code>,
     <code>spark.dynamicAllocation.maxExecutors</code>, and
diff --git a/docs/job-scheduling.md b/docs/job-scheduling.md
index 8694ee82e1b..0875bd5558e 100644
--- a/docs/job-scheduling.md
+++ b/docs/job-scheduling.md
@@ -89,11 +89,15 @@ This feature is disabled by default and available on all coarse-grained cluster

 ### Configuration and Setup

-There are two ways for using this feature.
-First, your application must set both `spark.dynamicAllocation.enabled` and `spark.dynamicAllocation.shuffleTracking.enabled` to `true`.
-Second, your application must set both `spark.dynamicAllocation.enabled` and `spark.shuffle.service.enabled` to `true`
-after you set up an *external shuffle service* on each worker node in the same cluster.
-The purpose of the shuffle tracking or the external shuffle service is to allow executors to be removed
+There are several ways to use this feature.
+Regardless of which approach you choose, your application must first set `spark.dynamicAllocation.enabled` to `true`. Additionally,
+
+- your application must set `spark.shuffle.service.enabled` to `true` after you set up an *external shuffle service* on each worker node in the same cluster, or
+- your application must set `spark.dynamicAllocation.shuffleTracking.enabled` to `true`, or
+- your application must set both `spark.decommission.enabled` and `spark.storage.decommission.shuffleBlocks.enabled` to `true`, or
+- your application must configure `spark.shuffle.sort.io.plugin.class` to use a custom `ShuffleDataIO` whose `ShuffleDriverComponents` supports reliable storage.
+
+The purpose of the external shuffle service, shuffle tracking, or a `ShuffleDriverComponents` that supports reliable storage is to allow executors to be removed
 without deleting shuffle files written by them (more detail described
 [below](job-scheduling.html#graceful-decommission-of-executors)).
 While it is simple to enable shuffle tracking, the way to set up the external shuffle service varies across cluster managers:

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org