This is an automated email from the ASF dual-hosted git repository.
nicholasjiang pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/celeborn.git
The following commit(s) were added to refs/heads/main by this push:
new 16762c659 [CELEBORN-1774][FOLLOWUP] Change celeborn.<module>.io.mode
optional to explain default behavior in description
16762c659 is described below
commit 16762c659c88837a534f0010415ecaef5b5bdc31
Author: SteNicholas <[email protected]>
AuthorDate: Thu Jan 2 21:15:19 2025 +0800
[CELEBORN-1774][FOLLOWUP] Change celeborn.<module>.io.mode optional to
explain default behavior in description
### What changes were proposed in this pull request?
Make `celeborn.<module>.io.mode` an optional config entry and explain its
default behavior in the option description.
### Why are the changes needed?
The default value of `celeborn.<module>.io.mode` shown in the documentation
can vary with whether epoll mode is available on a given OS. Therefore,
`celeborn.<module>.io.mode` should be made optional, with the default
behavior explained in the option description.
Follow-up to
https://github.com/apache/celeborn/pull/3039#discussion_r1899340272.
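The resolution order described above can be sketched as follows. This is a minimal, self-contained illustration and not the actual `CelebornConf` code: the `IoModeSketch` object, the `resolveIoMode` helper, and the plain `Map` standing in for the configuration are all hypothetical, and epoll availability is passed in as a flag rather than queried from Netty.

```scala
// Hypothetical sketch: an optional setting whose default is decided at
// lookup time instead of being baked into the config entry (and the docs).
object IoModeSketch {
  // Key template as it appears in the docs; <module> is substituted per caller.
  val KeyTemplate = "celeborn.<module>.io.mode"
  val Allowed: Set[String] = Set("NIO", "EPOLL")

  // settings: explicitly configured keys only (stand-in for the real conf).
  // epollAvailable: assumed to be supplied by the caller in this sketch.
  def resolveIoMode(
      settings: Map[String, String],
      module: String,
      epollAvailable: Boolean): String = {
    // Look up the module-specific key and normalize case, if it was set.
    val explicit = settings.get(KeyTemplate.replace("<module>", module)).map(_.toUpperCase)
    // Reject values outside the documented options.
    explicit.foreach(v => require(Allowed.contains(v), s"Unsupported IO mode: $v"))
    // No explicit value: prefer EPOLL when available, otherwise NIO.
    explicit.getOrElse(if (epollAvailable) "EPOLL" else "NIO")
  }
}
```

With this shape, an explicit value always wins, and the platform-dependent choice is only made when the key is absent, so the documented default can be a fixed placeholder.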
### Does this PR introduce _any_ user-facing change?
`celeborn.<module>.io.mode` is now optional, and its description explains the
default behavior.
### How was this patch tested?
CI.
Closes #3044 from SteNicholas/CELEBORN-1774.
Authored-by: SteNicholas <[email protected]>
Signed-off-by: SteNicholas <[email protected]>
---
.../src/main/scala/org/apache/celeborn/common/CelebornConf.scala | 9 +++++----
docs/configuration/network.md | 2 +-
docs/migration.md | 2 +-
3 files changed, 7 insertions(+), 6 deletions(-)
diff --git a/common/src/main/scala/org/apache/celeborn/common/CelebornConf.scala b/common/src/main/scala/org/apache/celeborn/common/CelebornConf.scala
index 7280a1cde..80d31d747 100644
--- a/common/src/main/scala/org/apache/celeborn/common/CelebornConf.scala
+++ b/common/src/main/scala/org/apache/celeborn/common/CelebornConf.scala
@@ -539,7 +539,9 @@ class CelebornConf(loadDefaults: Boolean) extends Cloneable
with Logging with Se
def rpcDumpIntervalMs(): Long = get(RPC_SUMMARY_DUMP_INTERVAL)
def networkIoMode(module: String): String = {
- getTransportConf(module, NETWORK_IO_MODE)
+ get(
+ NETWORK_IO_MODE.key.replace("<module>", module),
+ if (Epoll.isAvailable) IOMode.EPOLL.name() else IOMode.NIO.name())
}
def networkIoPreferDirectBufs(module: String): Boolean = {
@@ -1931,15 +1933,14 @@ object CelebornConf extends Logging {
.timeConf(TimeUnit.MILLISECONDS)
.createWithDefaultString("60s")
- val NETWORK_IO_MODE: ConfigEntry[String] =
+ val NETWORK_IO_MODE: OptionalConfigEntry[String] =
buildConf("celeborn.<module>.io.mode")
.categories("network")
.doc("Netty EventLoopGroup backend, available options: NIO, EPOLL. If
epoll mode is available, the default IO mode is EPOLL; otherwise, the default
is NIO.")
.stringConf
.transform(_.toUpperCase)
.checkValues(Set(IOMode.NIO.name(), IOMode.EPOLL.name()))
- .createWithDefaultFunction(() =>
- if (Epoll.isAvailable) IOMode.EPOLL.name() else IOMode.NIO.name())
+ .createOptional
val NETWORK_IO_PREFER_DIRECT_BUFS: ConfigEntry[Boolean] =
buildConf("celeborn.<module>.io.preferDirectBufs")
diff --git a/docs/configuration/network.md b/docs/configuration/network.md
index f690d205e..4a5f8c943 100644
--- a/docs/configuration/network.md
+++ b/docs/configuration/network.md
@@ -29,7 +29,7 @@ license: |
| celeborn.<module>.io.enableVerboseMetrics | false | false | Whether to
track Netty memory detailed metrics. If true, the detailed metrics of Netty
PoolByteBufAllocator will be gotten, otherwise only general memory usage will
be tracked. | | |
| celeborn.<module>.io.lazyFD | true | false | Whether to initialize
FileDescriptor lazily or not. If true, file descriptors are created only when
data is going to be transferred. This can reduce the number of open files. If
setting <module> to `fetch`, it works for worker fetch server. | | |
| celeborn.<module>.io.maxRetries | 3 | false | Max number of times we
will try IO exceptions (such as connection timeouts) per request. If set to 0,
we will not do any retries. If setting <module> to `data`, it works for shuffle
client push and fetch data. If setting <module> to `replicate`, it works for
replicate client of worker replicating data to peer worker. If setting <module>
to `push`, it works for Flink shuffle client push data. | | |
-| celeborn.<module>.io.mode | EPOLL | false | Netty EventLoopGroup
backend, available options: NIO, EPOLL. If epoll mode is available, the default
IO mode is EPOLL; otherwise, the default is NIO. | | |
+| celeborn.<module>.io.mode | <undefined> | false | Netty
EventLoopGroup backend, available options: NIO, EPOLL. If epoll mode is
available, the default IO mode is EPOLL; otherwise, the default is NIO. | | |
| celeborn.<module>.io.numConnectionsPerPeer | 1 | false | Number of
concurrent connections between two nodes. If setting <module> to `rpc_app`,
works for shuffle client. If setting <module> to `rpc_service`, works for
master or worker. If setting <module> to `data`, it works for shuffle client
push and fetch data. If setting <module> to `replicate`, it works for replicate
client of worker replicating data to peer worker. | | |
| celeborn.<module>.io.preferDirectBufs | true | false | If true, we
will prefer allocating off-heap byte buffers within Netty. If setting <module>
to `rpc_app`, works for shuffle client. If setting <module> to `rpc_service`,
works for master or worker. If setting <module> to `data`, it works for shuffle
client push and fetch data. If setting <module> to `push`, it works for worker
receiving push data. If setting <module> to `replicate`, it works for replicate
server or client of w [...]
| celeborn.<module>.io.receiveBuffer | 0b | false | Receive buffer size
(SO_RCVBUF). Note: the optimal size for receive buffer and send buffer should
be latency * network_bandwidth. Assuming latency = 1ms, network_bandwidth =
10Gbps buffer size should be ~ 1.25MB. If setting <module> to `rpc_app`, works
for shuffle client. If setting <module> to `rpc_service`, works for master or
worker. If setting <module> to `data`, it works for shuffle client push and
fetch data. If setting <mod [...]
diff --git a/docs/migration.md b/docs/migration.md
index e9ceb8c03..800655e5e 100644
--- a/docs/migration.md
+++ b/docs/migration.md
@@ -31,7 +31,7 @@ license: |
- Since 0.6.0, Celeborn changed the default value of
`celeborn.client.spark.fetch.throwsFetchFailure` from `false` to `true`, which
means Celeborn will enable spark stage rerun at default.
-- Since 0.6.0, Celeborn changed the default value of
`celeborn.<module>.io.mode` from `NIO` to `EPOLL` if epoll mode is available,
falling back to `NIO` otherwise.
+- Since 0.6.0, Celeborn changed `celeborn.<module>.io.mode` optional, of which
the default value changed from `NIO` to `EPOLL` if epoll mode is available,
falling back to `NIO` otherwise.
- Since 0.6.0, Celeborn has introduced a new RESTful API namespace: /api/v1,
which uses the application/json media type for requests and responses.
The `celeborn-openapi-client` SDK is also available to help users interact
with the new RESTful APIs.