This is an automated email from the ASF dual-hosted git repository.
zhouky pushed a commit to branch branch-0.3
in repository https://gitbox.apache.org/repos/asf/incubator-celeborn.git
The following commit(s) were added to refs/heads/branch-0.3 by this push:
new c375895ef [CELEBORN-979] Reduce default disk Check Interval
c375895ef is described below
commit c375895ef370db80a8da80d992010c55ab341275
Author: jiaoqingbo <[email protected]>
AuthorDate: Mon Sep 18 14:54:22 2023 +0800
[CELEBORN-979] Reduce default disk Check Interval
Reduce default disk Check Interval
Since https://github.com/apache/incubator-celeborn/pull/1909 added a check of
the DiskInfo status to the PushDataHandler#checkDiskFull method, the default
disk check interval should be reduced.
User-facing change: NO
Testing: PASS GA
Closes #1915 from jiaoqingbo/979.
Authored-by: jiaoqingbo <[email protected]>
Signed-off-by: zky.zhoukeyong <[email protected]>
(cherry picked from commit 107f3df8ba79036ccaec54d0e095d69643570d02)
Signed-off-by: zky.zhoukeyong <[email protected]>
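
Deployments that prefer the previous behavior can pin the old interval explicitly. A minimal sketch, assuming each worker reads its settings from `conf/celeborn-defaults.conf` (the file location is an assumption about the deployment, not part of this commit); the key and the `60s`/`30s` values are taken from the diff in this email:

```properties
# Assumed location: conf/celeborn-defaults.conf on each worker.
# Restore the pre-change disk check interval (the new default is 30s).
celeborn.worker.monitor.disk.check.interval 60s
```
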
---
.../main/scala/org/apache/celeborn/common/CelebornConf.scala | 2 +-
docs/configuration/worker.md | 2 +-
docs/migration.md | 10 +++++++++-
3 files changed, 11 insertions(+), 3 deletions(-)
diff --git a/common/src/main/scala/org/apache/celeborn/common/CelebornConf.scala b/common/src/main/scala/org/apache/celeborn/common/CelebornConf.scala
index d7edccc2f..eb4787d46 100644
--- a/common/src/main/scala/org/apache/celeborn/common/CelebornConf.scala
+++ b/common/src/main/scala/org/apache/celeborn/common/CelebornConf.scala
@@ -2405,7 +2405,7 @@ object CelebornConf extends Logging {
.version("0.3.0")
.doc("Intervals between device monitor to check disk.")
.timeConf(TimeUnit.MILLISECONDS)
- .createWithDefaultString("60s")
+ .createWithDefaultString("30s")
val WORKER_DISK_MONITOR_SYS_BLOCK_DIR: ConfigEntry[String] =
buildConf("celeborn.worker.monitor.disk.sys.block.dir")
diff --git a/docs/configuration/worker.md b/docs/configuration/worker.md
index 32f243034..123a766e4 100644
--- a/docs/configuration/worker.md
+++ b/docs/configuration/worker.md
@@ -60,7 +60,7 @@ license: |
| celeborn.worker.graceful.shutdown.saveCommittedFileInfo.interval | 5s | Interval for a Celeborn worker to flush committed file infos into Level DB. | 0.3.1 |
| celeborn.worker.graceful.shutdown.saveCommittedFileInfo.sync | false | Whether to call sync method to save committed file infos into Level DB to handle OS crash. | 0.3.1 |
| celeborn.worker.graceful.shutdown.timeout | 600s | The worker's graceful shutdown timeout time. | 0.2.0 |
-| celeborn.worker.monitor.disk.check.interval | 60s | Intervals between device monitor to check disk. | 0.3.0 |
+| celeborn.worker.monitor.disk.check.interval | 30s | Intervals between device monitor to check disk. | 0.3.0 |
| celeborn.worker.monitor.disk.check.timeout | 30s | Timeout time for worker check device status. | 0.3.0 |
| celeborn.worker.monitor.disk.checklist | readwrite,diskusage | Monitor type for disk, available items are: iohang, readwrite and diskusage. | 0.2.0 |
| celeborn.worker.monitor.disk.enabled | true | When true, worker will monitor device and report to master. | 0.3.0 |
diff --git a/docs/migration.md b/docs/migration.md
index 2001bef1f..328df89b6 100644
--- a/docs/migration.md
+++ b/docs/migration.md
@@ -19,7 +19,15 @@ license: |
# Migration Guide
-## Upgrading from 0.2 to 0.3
+## Upgrading from 0.3.1 to 0.3.2
+
+- Since 0.3.2, Celeborn changed the default value of `celeborn.worker.monitor.disk.check.interval` from `60` to `30`.
+
+## Upgrading from 0.3.0 to 0.3.1
+
+- Since 0.3.1, Celeborn changed the default value of `celeborn.worker.directMemoryRatioToResume` from `0.5` to `0.7`.
+
+## Upgrading from 0.2 to 0.3.0
 - Celeborn 0.2 Client is compatible with 0.3 Master/Server, it allows to upgrade Master/Worker first then Client.
   Note that: It's strongly recommended to use the same version of Client and Celeborn Master/Worker in production.