This is an automated email from the ASF dual-hosted git repository.

nicholasjiang pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/celeborn.git


The following commit(s) were added to refs/heads/main by this push:
     new fe86768ea [CELEBORN-2194] Change default value of 
celeborn.worker.directMemoryRatioForReadBuffer
fe86768ea is described below

commit fe86768ea9d736a11d696950db074480a09db0a4
Author: SteNicholas <[email protected]>
AuthorDate: Tue Nov 4 15:23:06 2025 +0800

    [CELEBORN-2194] Change default value of 
celeborn.worker.directMemoryRatioForReadBuffer
    
    ### What changes were proposed in this pull request?
    
    Change default value of `celeborn.worker.directMemoryRatioForReadBuffer` 
from 0.1 to 0.35.
    
    ### Why are the changes needed?
    
    The default value of `celeborn.worker.directMemoryRatioForReadBuffer` is 
`0.1`, which is small enough to cause a backlog of read buffer requests in 
`ReadBufferDispatcher`. Therefore, the default of 
`celeborn.worker.directMemoryRatioForReadBuffer` should be changed from `0.1` 
to `0.35`, a value proven in production, to raise the read buffer threshold of 
`ReadBufferDispatcher`.
    
    ### Does this PR resolve a correctness bug?
    
    No.
    
    ### Does this PR introduce _any_ user-facing change?
    
    The default value of `celeborn.worker.directMemoryRatioForReadBuffer` is 
changed to 0.35.
    
    ### How was this patch tested?
    
    CI.
    
    Closes #3527 from SteNicholas/CELEBORN-2194.
    
    Authored-by: SteNicholas <[email protected]>
    Signed-off-by: 子懿 <[email protected]>
---
 common/src/main/scala/org/apache/celeborn/common/CelebornConf.scala | 2 +-
 docs/configuration/worker.md                                        | 2 +-
 docs/migration.md                                                   | 2 ++
 3 files changed, 4 insertions(+), 2 deletions(-)

diff --git 
a/common/src/main/scala/org/apache/celeborn/common/CelebornConf.scala 
b/common/src/main/scala/org/apache/celeborn/common/CelebornConf.scala
index 010cd57a4..a39cf5797 100644
--- a/common/src/main/scala/org/apache/celeborn/common/CelebornConf.scala
+++ b/common/src/main/scala/org/apache/celeborn/common/CelebornConf.scala
@@ -4153,7 +4153,7 @@ object CelebornConf extends Logging {
       .doc("Max ratio of direct memory for read buffer")
       .version("0.2.0")
       .doubleConf
-      .createWithDefault(0.1)
+      .createWithDefault(0.35)
 
   val WORKER_DIRECT_MEMORY_RATIO_FOR_MEMORY_FILE_STORAGE: ConfigEntry[Double] =
     buildConf("celeborn.worker.directMemoryRatioForMemoryFileStorage")
diff --git a/docs/configuration/worker.md b/docs/configuration/worker.md
index f4f4744d4..66468c749 100644
--- a/docs/configuration/worker.md
+++ b/docs/configuration/worker.md
@@ -77,7 +77,7 @@ license: |
 | celeborn.worker.decommission.checkInterval | 30s | false | The wait interval 
of checking whether all the shuffle expired during worker decommission | 0.4.0 
|  | 
 | celeborn.worker.decommission.forceExitTimeout | 6h | false | The wait time 
of waiting for all the shuffle expire during worker decommission. | 0.4.0 |  | 
 | celeborn.worker.directMemoryRatioForMemoryFileStorage | 0.0 | false | Max 
ratio of direct memory to store shuffle data. This feature is experimental and 
disabled by default. | 0.5.0 |  | 
-| celeborn.worker.directMemoryRatioForReadBuffer | 0.1 | false | Max ratio of 
direct memory for read buffer | 0.2.0 |  | 
+| celeborn.worker.directMemoryRatioForReadBuffer | 0.35 | false | Max ratio of 
direct memory for read buffer | 0.2.0 |  | 
 | celeborn.worker.directMemoryRatioToPauseReceive | 0.85 | false | If direct 
memory usage reaches this limit, the worker will stop to receive data from 
Celeborn shuffle clients. | 0.2.0 |  | 
 | celeborn.worker.directMemoryRatioToPauseReplicate | 0.95 | false | If direct 
memory usage reaches this limit, the worker will stop to receive replication 
data from other workers. This value should be higher than 
celeborn.worker.directMemoryRatioToPauseReceive. | 0.2.0 |  | 
 | celeborn.worker.directMemoryRatioToResume | 0.7 | false | If direct memory 
usage is less than this limit, worker will resume. | 0.2.0 |  | 
diff --git a/docs/migration.md b/docs/migration.md
index 2c302a534..7e0385f36 100644
--- a/docs/migration.md
+++ b/docs/migration.md
@@ -31,6 +31,8 @@ license: |
 
 - Since 0.7.0, Celeborn changed the default value of 
`celeborn.<module>.io.mode` from `NIO` to `KQUEUE` if kqueue mode is available, 
falling back to `NIO` otherwise.
 
+- Since 0.7.0, Celeborn changed the default value of 
`celeborn.worker.directMemoryRatioForReadBuffer` from `0.1` to `0.35`, which 
means read buffer threshold of buffer dispatcher is max direct memory * 0.35 at 
default.
+
 # Upgrading from 0.5 to 0.6
 
 - Since 0.6.0, Celeborn deprecate 
`celeborn.client.spark.fetch.throwsFetchFailure`. Please use 
`celeborn.client.spark.stageRerun.enabled` instead.
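
The migration note above describes the read buffer threshold as max direct memory multiplied by the configured ratio. The following is a minimal sketch of that arithmetic only. `ReadBufferThreshold`, `thresholdBytes`, and the 8 GiB figure are hypothetical names and values chosen for illustration; they are not taken from the Celeborn code base, and the worker's actual calculation may differ in detail.

```scala
// Illustrative only: this object is NOT part of Celeborn.
object ReadBufferThreshold {

  // Threshold in bytes = max direct memory * ratio, rounded to the nearest byte.
  def thresholdBytes(maxDirectMemoryBytes: Long, ratio: Double): Long =
    math.round(maxDirectMemoryBytes * ratio)

  def main(args: Array[String]): Unit = {
    // Assumed worker direct memory of 8 GiB (hypothetical value).
    val maxDirectMemoryBytes = 8L * 1024 * 1024 * 1024

    // Old default ratio (0.1) versus new default ratio (0.35).
    val oldThreshold = thresholdBytes(maxDirectMemoryBytes, 0.1)
    val newThreshold = thresholdBytes(maxDirectMemoryBytes, 0.35)

    println(s"threshold with ratio 0.1:  $oldThreshold bytes")
    println(s"threshold with ratio 0.35: $newThreshold bytes")
  }
}
```

Under these assumptions the new default gives read buffers roughly 3.5 times the headroom of the old one. Deployments that prefer the previous behavior could presumably set `celeborn.worker.directMemoryRatioForReadBuffer` back to `0.1` explicitly.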
