This is an automated email from the ASF dual-hosted git repository.
nicholasjiang pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/celeborn.git
The following commit(s) were added to refs/heads/main by this push:
new fe86768ea [CELEBORN-2194] Change default value of celeborn.worker.directMemoryRatioForReadBuffer
fe86768ea is described below
commit fe86768ea9d736a11d696950db074480a09db0a4
Author: SteNicholas
AuthorDate: Tue Nov 4 15:23:06 2025 +0800
[CELEBORN-2194] Change default value of celeborn.worker.directMemoryRatioForReadBuffer
### What changes were proposed in this pull request?
Change the default value of `celeborn.worker.directMemoryRatioForReadBuffer`
from `0.1` to `0.35`.
### Why are the changes needed?
The default value of `celeborn.worker.directMemoryRatioForReadBuffer` is
`0.1`, which is too small and causes a backlog of read buffer requests in
`ReadBufferDispatcher`. Therefore,
`celeborn.worker.directMemoryRatioForReadBuffer` should be changed from `0.1`
to `0.35`, a value taken from production practice, to raise the read buffer
threshold of `ReadBufferDispatcher`.
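To illustrate why the ratio matters, here is a toy model in Scala. It assumes, as the migration note in this commit states, that the read buffer threshold equals max direct memory multiplied by the ratio. The 8 GiB direct memory size, the 64 MiB request size, and the `threshold`/`dispatch` helpers are hypothetical, and the model is a simplification, not Celeborn's actual `ReadBufferDispatcher` logic.

```scala
// Toy model (hypothetical, not Celeborn's ReadBufferDispatcher): requests are
// granted while the total allocated bytes stay within the threshold; the rest wait.
object ReadBufferBacklogSketch {
  /** Threshold = max direct memory * ratio, per the migration note. */
  def threshold(maxDirectMemory: Long, ratio: Double): Long =
    (maxDirectMemory * ratio).toLong

  /** Returns (granted, pending) for `requests` buffers of `bufferSize` bytes each. */
  def dispatch(requests: Int, bufferSize: Long, limit: Long): (Int, Int) = {
    val granted = math.min(requests.toLong, limit / bufferSize).toInt
    (granted, requests - granted)
  }

  def main(args: Array[String]): Unit = {
    val maxDirect = 8L * 1024 * 1024 * 1024 // assumed: 8 GiB of direct memory
    val bufferSize = 64L * 1024 * 1024      // assumed: 64 MiB per read request
    for (ratio <- Seq(0.1, 0.35)) {
      val (granted, pending) = dispatch(100, bufferSize, threshold(maxDirect, ratio))
      println(f"ratio=$ratio%.2f granted=$granted pending=$pending")
    }
  }
}
```

With these assumed numbers, raising the ratio from 0.1 to 0.35 lets 44 instead of 12 of the 100 requests be served at once, so fewer requests wait in the backlog.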
### Does this PR resolve a correctness bug?
No.
### Does this PR introduce _any_ user-facing change?
The default value of `celeborn.worker.directMemoryRatioForReadBuffer` is
changed from `0.1` to `0.35`.
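Because only the default changes, a worker that sets the key explicitly keeps its configured value, while a worker that relies on the default moves from `0.1` to `0.35`. The following Scala sketch models that lookup with a hypothetical `effective` helper over a plain `Map`; it is an illustration of default resolution, not the `CelebornConf` API.

```scala
// Hypothetical helper (not the CelebornConf API): an explicit setting wins,
// otherwise the default applies.
object EffectiveReadBufferRatio {
  val Key = "celeborn.worker.directMemoryRatioForReadBuffer"
  val OldDefault = 0.1  // default before this change
  val NewDefault = 0.35 // default after this change

  def effective(userConf: Map[String, String], default: Double): Double =
    userConf.get(Key).map(_.toDouble).getOrElse(default)

  def main(args: Array[String]): Unit = {
    val relyOnDefault = Map.empty[String, String]
    val pinned = Map(Key -> "0.1") // an operator who set the old value explicitly
    println(s"default-only: ${effective(relyOnDefault, OldDefault)} -> ${effective(relyOnDefault, NewDefault)}")
    println(s"pinned: ${effective(pinned, OldDefault)} -> ${effective(pinned, NewDefault)}")
  }
}
```

Operators who want to keep the previous behavior after upgrading can set `celeborn.worker.directMemoryRatioForReadBuffer` to `0.1` explicitly.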
### How was this patch tested?
CI.
Closes #3527 from SteNicholas/CELEBORN-2194.
Authored-by: SteNicholas
Signed-off-by: 子懿
---
common/src/main/scala/org/apache/celeborn/common/CelebornConf.scala | 2 +-
docs/configuration/worker.md | 2 +-
docs/migration.md | 2 ++
3 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/common/src/main/scala/org/apache/celeborn/common/CelebornConf.scala b/common/src/main/scala/org/apache/celeborn/common/CelebornConf.scala
index 010cd57a4..a39cf5797 100644
--- a/common/src/main/scala/org/apache/celeborn/common/CelebornConf.scala
+++ b/common/src/main/scala/org/apache/celeborn/common/CelebornConf.scala
@@ -4153,7 +4153,7 @@ object CelebornConf extends Logging {
       .doc("Max ratio of direct memory for read buffer")
       .version("0.2.0")
       .doubleConf
-      .createWithDefault(0.1)
+      .createWithDefault(0.35)
 
   val WORKER_DIRECT_MEMORY_RATIO_FOR_MEMORY_FILE_STORAGE: ConfigEntry[Double] =
     buildConf("celeborn.worker.directMemoryRatioForMemoryFileStorage")
diff --git a/docs/configuration/worker.md b/docs/configuration/worker.md
index f4f4744d4..66468c749 100644
--- a/docs/configuration/worker.md
+++ b/docs/configuration/worker.md
@@ -77,7 +77,7 @@ license: |
 | celeborn.worker.decommission.checkInterval | 30s | false | The wait interval of checking whether all the shuffle expired during worker decommission | 0.4.0 | |
 | celeborn.worker.decommission.forceExitTimeout | 6h | false | The wait time of waiting for all the shuffle expire during worker decommission. | 0.4.0 | |
 | celeborn.worker.directMemoryRatioForMemoryFileStorage | 0.0 | false | Max ratio of direct memory to store shuffle data. This feature is experimental and disabled by default. | 0.5.0 | |
-| celeborn.worker.directMemoryRatioForReadBuffer | 0.1 | false | Max ratio of direct memory for read buffer | 0.2.0 | |
+| celeborn.worker.directMemoryRatioForReadBuffer | 0.35 | false | Max ratio of direct memory for read buffer | 0.2.0 | |
 | celeborn.worker.directMemoryRatioToPauseReceive | 0.85 | false | If direct memory usage reaches this limit, the worker will stop to receive data from Celeborn shuffle clients. | 0.2.0 | |
 | celeborn.worker.directMemoryRatioToPauseReplicate | 0.95 | false | If direct memory usage reaches this limit, the worker will stop to receive replication data from other workers. This value should be higher than celeborn.worker.directMemoryRatioToPauseReceive. | 0.2.0 | |
 | celeborn.worker.directMemoryRatioToResume | 0.7 | false | If direct memory usage is less than this limit, worker will resume. | 0.2.0 | |
diff --git a/docs/migration.md b/docs/migration.md
index 2c302a534..7e0385f36 100644
--- a/docs/migration.md
+++ b/docs/migration.md
@@ -31,6 +31,8 @@ license: |
 - Since 0.7.0, Celeborn changed the default value of `celeborn.<module>.io.mode` from `NIO` to `KQUEUE` if kqueue mode is available, falling back to `NIO` otherwise.
+- Since 0.7.0, Celeborn changed the default value of `celeborn.worker.directMemoryRatioForReadBuffer` from `0.1` to `0.35`, which means read buffer threshold of buffer dispatcher is max direct memory * 0.35 at default.
+
 # Upgrading from 0.5 to 0.6
 - Since 0.6.0, Celeborn deprecate `celeborn.client.spark.fetch.throwsFetchFailure`. Please use `celeborn.client.spark.stageRerun.enabled` instead.