This is an automated email from the ASF dual-hosted git repository.
dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 338bb31c2fac [SPARK-46997][CORE] Enable `spark.worker.cleanup.enabled` by default
338bb31c2fac is described below
commit 338bb31c2fac79fbc3482c23310b77d5306bd6c8
Author: Dongjoon Hyun <[email protected]>
AuthorDate: Wed Feb 7 22:51:14 2024 -0800
[SPARK-46997][CORE] Enable `spark.worker.cleanup.enabled` by default
### What changes were proposed in this pull request?
This PR aims to enable `spark.worker.cleanup.enabled` by default as part of
Apache Spark 4.0.0.
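For reference, a minimal sketch (not part of this PR) of what the new default means on the worker side, using only the public `SparkConf` string API; the hard-coded fallback below simply mirrors the new `createWithDefault(true)`:

```scala
import org.apache.spark.SparkConf

// When the key is not set anywhere, the worker now behaves as if it were true.
val conf = new SparkConf()
val cleanupEnabled = conf.getBoolean("spark.worker.cleanup.enabled", defaultValue = true)
```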
### Why are the changes needed?
The Apache Spark community has recommended (from Apache Spark 3.0 through 3.5)
enabling `spark.worker.cleanup.enabled` when
`spark.shuffle.service.db.enabled` is true, and
`spark.shuffle.service.db.enabled` has been `true` since SPARK-26288.
https://github.com/apache/spark/blob/dc73a8d7e96ead55053096971c838908b7c90527/docs/spark-standalone.md?plain=1#L443
https://github.com/apache/spark/blob/dc73a8d7e96ead55053096971c838908b7c90527/docs/spark-standalone.md?plain=1#L473
https://github.com/apache/spark/blob/dc73a8d7e96ead55053096971c838908b7c90527/core/src/main/scala/org/apache/spark/internal/config/package.scala#L718-L724
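As an illustration of that pairing (a sketch, not code from this PR; on a real standalone cluster these keys are usually set in `conf/spark-defaults.conf` or via `SPARK_WORKER_OPTS` rather than in application code):

```scala
import org.apache.spark.SparkConf

// The long-recommended combination; both keys are now true by default.
val conf = new SparkConf()
  .set("spark.shuffle.service.db.enabled", "true")  // default true since SPARK-26288
  .set("spark.worker.cleanup.enabled", "true")      // default true with this change
```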
Although `spark.shuffle.service.enabled` is disabled by default,
`spark.worker.cleanup.enabled` is crucial for long-running Spark Standalone
clusters to avoid running out of disk space.
https://github.com/apache/spark/blob/dc73a8d7e96ead55053096971c838908b7c90527/core/src/main/scala/org/apache/spark/internal/config/package.scala#L692-L696
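For intuition, a hedged sketch of the kind of periodic cleanup this setting turns on; it is not the Worker's actual implementation, and `workDir`, `appDataTtlMs`, and `isAppStopped` are illustrative placeholders for what the real code derives from `spark.worker.cleanup.interval` and `spark.worker.cleanup.appDataTtl`:

```scala
import java.io.File

// Recursively delete a directory tree (simplified helper for this sketch).
def deleteRecursively(f: File): Unit = {
  if (f.isDirectory) Option(f.listFiles()).getOrElse(Array.empty[File]).foreach(deleteRecursively)
  f.delete()
}

// Remove directories of stopped applications that are older than the configured TTL.
def cleanupOldAppDirs(workDir: File, appDataTtlMs: Long, isAppStopped: String => Boolean): Unit = {
  val cutoff = System.currentTimeMillis() - appDataTtlMs
  Option(workDir.listFiles()).getOrElse(Array.empty[File])
    .filter(d => d.isDirectory && isAppStopped(d.getName) && d.lastModified() < cutoff)
    .foreach(deleteRecursively)
}
```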
### Does this PR introduce _any_ user-facing change?
Yes, but this has long been the recommended configuration for real
production-level Spark Standalone clusters.
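To restore the pre-4.0 behavior, the `docs/core-migration-guide.md` entry added below points at setting the key back to `false`; expressed in code, that is simply (a sketch; on a real cluster the key is typically set in `conf/spark-defaults.conf` or via `SPARK_WORKER_OPTS`):

```scala
import org.apache.spark.SparkConf

// Opt out of periodic worker/application directory cleanup.
val conf = new SparkConf().set("spark.worker.cleanup.enabled", "false")
```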
### How was this patch tested?
Pass the CIs.
### Was this patch authored or co-authored using generative AI tooling?
No
Closes #45055 from dongjoon-hyun/SPARK-46997.
Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
---
core/src/main/scala/org/apache/spark/internal/config/Worker.scala | 2 +-
docs/core-migration-guide.md | 2 ++
docs/spark-standalone.md | 2 +-
3 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/core/src/main/scala/org/apache/spark/internal/config/Worker.scala b/core/src/main/scala/org/apache/spark/internal/config/Worker.scala
index c53e181df002..5a67f3398a7d 100644
--- a/core/src/main/scala/org/apache/spark/internal/config/Worker.scala
+++ b/core/src/main/scala/org/apache/spark/internal/config/Worker.scala
@@ -62,7 +62,7 @@ private[spark] object Worker {
val WORKER_CLEANUP_ENABLED = ConfigBuilder("spark.worker.cleanup.enabled")
.version("1.0.0")
.booleanConf
- .createWithDefault(false)
+ .createWithDefault(true)
val WORKER_CLEANUP_INTERVAL = ConfigBuilder("spark.worker.cleanup.interval")
.version("1.0.0")
diff --git a/docs/core-migration-guide.md b/docs/core-migration-guide.md
index 7a5b17397bec..26e6b0f1f444 100644
--- a/docs/core-migration-guide.md
+++ b/docs/core-migration-guide.md
@@ -28,6 +28,8 @@ license: |
- Since Spark 4.0, Spark will compress event logs. To restore the behavior before Spark 4.0, you can set `spark.eventLog.compress` to `false`.
+- Since Spark 4.0, Spark workers will clean up worker and stopped application directories periodically. To restore the behavior before Spark 4.0, you can set `spark.worker.cleanup.enabled` to `false`.
+
- Since Spark 4.0, `spark.shuffle.service.db.backend` is set to `ROCKSDB` by default which means Spark will use RocksDB store for shuffle service. To restore the behavior before Spark 4.0, you can set `spark.shuffle.service.db.backend` to `LEVELDB`.
- In Spark 4.0, support for Apache Mesos as a resource manager was removed.
diff --git a/docs/spark-standalone.md b/docs/spark-standalone.md
index fbc83180d6b6..1eab3158e2e5 100644
--- a/docs/spark-standalone.md
+++ b/docs/spark-standalone.md
@@ -436,7 +436,7 @@ SPARK_WORKER_OPTS supports the following system properties:
</tr>
<tr>
<td><code>spark.worker.cleanup.enabled</code></td>
- <td>false</td>
+ <td>true</td>
<td>
Enable periodic cleanup of worker / application directories. Note that this only affects standalone
mode, as YARN works differently. Only the directories of stopped applications are cleaned up.
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]