This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new 1b6cdf10406 [SPARK-39846][CORE] Enable `spark.dynamicAllocation.shuffleTracking.enabled` by default
1b6cdf10406 is described below

commit 1b6cdf1040645486ae9b5cbb0247d8869f4f259f
Author: Dongjoon Hyun <[email protected]>
AuthorDate: Sat Jul 23 15:48:01 2022 -0700

    [SPARK-39846][CORE] Enable `spark.dynamicAllocation.shuffleTracking.enabled` by default
    
    ### What changes were proposed in this pull request?
    
    This PR aims to enable `spark.dynamicAllocation.shuffleTracking.enabled` by default in Apache Spark 3.4 when `spark.dynamicAllocation.enabled=true` and `spark.shuffle.service.enabled=false`.
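
    As a minimal sketch (the application class and jar below are placeholders, not part of this PR), this is the flag combination under which the new default takes effect:

```shell
# With Spark 3.4, these two settings alone are enough: shuffle tracking
# is now enabled automatically, so no external shuffle service is needed
# for dynamic allocation (e.g. on K8s).
spark-submit \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.shuffle.service.enabled=false \
  --class org.example.MyApp \
  my-app.jar
```

    To restore the pre-3.4 behavior, additionally pass `--conf spark.dynamicAllocation.shuffleTracking.enabled=false`.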
    
    ### Why are the changes needed?
    
    Here is a brief history around `spark.dynamicAllocation.shuffleTracking.enabled`.
    - Apache Spark 3.0.0 added it via SPARK-27963 for K8s environment.
      > One immediate use case is the ability to use dynamic allocation on Kubernetes, which doesn't yet have that service.
    - Apache Spark 3.1.1 made K8s GA via SPARK-33005 and started to use it widely in K8s.
    - Apache Spark 3.2.0 started to support shuffle data recovery on the reused PVCs via SPARK-35593.
    - Apache Spark 3.3.0 removed `Experimental` tag from it via SPARK-39322.
    - Apache Spark 3.4.0 will enable it by default via SPARK-39846 (this PR) to help Spark K8s users adopt dynamic allocation more easily.
    
    ### Does this PR introduce _any_ user-facing change?
    
    The `Core` migration guide is updated.
    
    ### How was this patch tested?
    
    Pass the CIs including K8s IT GitHub Action job.
    
    Closes #37257 from dongjoon-hyun/SPARK-39846.
    
    Authored-by: Dongjoon Hyun <[email protected]>
    Signed-off-by: Dongjoon Hyun <[email protected]>
---
 core/src/main/scala/org/apache/spark/internal/config/package.scala | 2 +-
 docs/configuration.md                                              | 2 +-
 docs/core-migration-guide.md                                       | 2 ++
 3 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/internal/config/package.scala b/core/src/main/scala/org/apache/spark/internal/config/package.scala
index 02a52e86454..72a03a4d1fb 100644
--- a/core/src/main/scala/org/apache/spark/internal/config/package.scala
+++ b/core/src/main/scala/org/apache/spark/internal/config/package.scala
@@ -646,7 +646,7 @@ package object config {
     ConfigBuilder("spark.dynamicAllocation.shuffleTracking.enabled")
       .version("3.0.0")
       .booleanConf
-      .createWithDefault(false)
+      .createWithDefault(true)
 
   private[spark] val DYN_ALLOCATION_SHUFFLE_TRACKING_TIMEOUT =
     ConfigBuilder("spark.dynamicAllocation.shuffleTracking.timeout")
diff --git a/docs/configuration.md b/docs/configuration.md
index 26addffe88b..957c430c37b 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -2760,7 +2760,7 @@ Apart from these, the following properties are also available, and may be useful
 </tr>
 <tr>
   <td><code>spark.dynamicAllocation.shuffleTracking.enabled</code></td>
-  <td><code>false</code></td>
+  <td><code>true</code></td>
   <td>
    Enables shuffle file tracking for executors, which allows dynamic allocation
    without the need for an external shuffle service. This option will try to keep alive executors
diff --git a/docs/core-migration-guide.md b/docs/core-migration-guide.md
index 1a16b8f112a..a4af47b016a 100644
--- a/docs/core-migration-guide.md
+++ b/docs/core-migration-guide.md
@@ -26,6 +26,8 @@ license: |
 
 - Since Spark 3.4, Spark driver will own `PersistentVolumnClaim`s and try to reuse if they are not assigned to live executors. To restore the behavior before Spark 3.4, you can set `spark.kubernetes.driver.ownPersistentVolumeClaim` to `false` and `spark.kubernetes.driver.reusePersistentVolumeClaim` to `false`.
 
+- Since Spark 3.4, Spark driver will track shuffle data when dynamic allocation is enabled without shuffle service. To restore the behavior before Spark 3.4, you can set `spark.dynamicAllocation.shuffleTracking.enabled` to `false`.
+
 ## Upgrading from Core 3.2 to 3.3
 
 - Since Spark 3.3, Spark migrates its log4j dependency from 1.x to 2.x because log4j 1.x has reached end of life and is no longer supported by the community. Vulnerabilities reported after August 2015 against log4j 1.x were not checked and will not be fixed. Users should rewrite original log4j properties files using log4j2 syntax (XML, JSON, YAML, or properties format). Spark rewrites the `conf/log4j.properties.template` which is included in Spark distribution, to `conf/log4j2.properties [...]

