This is an automated email from the ASF dual-hosted git repository.
roryqi pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-uniffle.git
The following commit(s) were added to refs/heads/master by this push:
new ab91f7e7 [#1060] doc: Add the document for Netty. (#1116)
ab91f7e7 is described below
commit ab91f7e7a41a31c49fd0eb9f5830d43cae5d7171
Author: Xianming Lei <[email protected]>
AuthorDate: Tue Aug 8 14:03:28 2023 +0800
[#1060] doc: Add the document for Netty. (#1116)
### What changes were proposed in this pull request?
Add the document for Netty.
### Why are the changes needed?
For: #1060
### Does this PR introduce _any_ user-facing change?
Yes.
Co-authored-by: leixianming <[email protected]>
---
.../uniffle/common/config/RssClientConf.java | 2 +-
docs/client_guide.md | 13 +++++++++++-
docs/server_guide.md | 23 ++++++++++++++++++++++
3 files changed, 36 insertions(+), 2 deletions(-)
diff --git
a/common/src/main/java/org/apache/uniffle/common/config/RssClientConf.java
b/common/src/main/java/org/apache/uniffle/common/config/RssClientConf.java
index 7aab6767..c516f747 100644
--- a/common/src/main/java/org/apache/uniffle/common/config/RssClientConf.java
+++ b/common/src/main/java/org/apache/uniffle/common/config/RssClientConf.java
@@ -90,7 +90,7 @@ public class RssClientConf {
ConfigOptions.key("rss.client.netty.client.connections.per.peer")
.intType()
.defaultValue(2)
- .withDescription("Number of concurrent connections between two
nodes.");
+ .withDescription("Number of concurrent connections between client
and ShuffleServer.");
public static final ConfigOption<Integer> NETTY_CLIENT_RECEIVE_BUFFER =
ConfigOptions.key("rss.client.netty.client.receive.buffer")
diff --git a/docs/client_guide.md b/docs/client_guide.md
index 74415260..2d6c670e 100644
--- a/docs/client_guide.md
+++ b/docs/client_guide.md
@@ -241,4 +241,15 @@ Notice: this feature requires the MEMORY_LOCAL_HADOOP mode.
| Property Name | Default | Description
|
|--------------------------------|---------|-------------------------------------------------------------------------|
| tez.rss.avoid.recompute.succeeded.task | false | Whether to avoid
recompute succeeded task when node is unhealthy or black-listed |
-
\ No newline at end of file
+
+### Netty Setting
+| Property Name | Default | Description
|
+|-----------------------------------------------------|---------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| spark.rss.client.type | GRPC | The default
is GRPC, we can set it to GRPC_NETTY to enable the netty on the client
|
+| spark.rss.client.netty.io.mode | NIO | Netty
EventLoopGroup backend, available options: NIO, EPOLL.
|
+| spark.rss.client.netty.client.connection.timeout.ms | 600000 | Connection
active timeout.
|
+| spark.rss.client.netty.client.threads | 0 | Number of
threads used in the client thread pool. Default is 0, netty will use the number
of (available logical cores * 2) as the number of threads.
|
+| spark.rss.client.netty.client.prefer.direct.bufs | true | If true, we
will prefer allocating off-heap byte buffers within Netty.
|
+| spark.rss.client.netty.client.connections.per.peer | 2 | Suppose
there are 100 executors, spark.rss.client.netty.client.connections.per.peer =
2, then each ShuffleServer will establish a total of (100 * 2) connections with
multiple clients.
|
+| spark.rss.client.netty.client.receive.buffer | 0 | Receive
buffer size (SO_RCVBUF). Note: the optimal size for receive buffer and send
buffer should be latency * network_bandwidth. Assuming latency = 1ms,
network_bandwidth = 10Gbps, buffer size should be ~ 1.25MB. Default is 0, the
operating system automatically estimates the receive buffer size based on
default settings. |
+| spark.rss.client.netty.client.send.buffer | 0 | Send buffer
size (SO_SNDBUF).
|
diff --git a/docs/server_guide.md b/docs/server_guide.md
index 8cfac516..d1865c6e 100644
--- a/docs/server_guide.md
+++ b/docs/server_guide.md
@@ -70,6 +70,13 @@ This document will introduce how to deploy Uniffle shuffle
servers.
| rss.rpc.server.port | - | RPC port
for Shuffle server, if set zero, grpc server start on random port.
|
| rss.jetty.http.port | - | Http
port for Shuffle server
|
| rss.server.netty.port | -1 | Netty
port for Shuffle server, if set zero, netty server start on random port.
|
+| rss.server.netty.epoll.enable | false | If
enable epoll model with netty server.
|
+| rss.server.netty.accept.thread | 10 | Accept
thread count in netty.
|
+| rss.server.netty.worker.thread | 100 | Worker
thread count in netty.
|
+| rss.server.netty.connect.backlog | 0 | For
netty server, requested maximum length of the queue of incoming connections.
|
+| rss.server.netty.connect.timeout | 5000 | Timeout
for connection in netty.
|
+| rss.server.netty.receive.buf | 0 | Receive
buffer size (SO_RCVBUF). Note: the optimal size for receive buffer and send
buffer should be latency * network_bandwidth. Assuming latency = 1ms,
network_bandwidth = 10Gbps, buffer size should be ~ 1.25MB. Default is 0, the
operating system automatically estimates the receive buffer size based on
default settings. |
+| rss.server.netty.send.buf | 0 | Send
buffer size (SO_SNDBUF).
|
| rss.server.buffer.capacity | -1 | Max
memory of buffer manager for shuffle server. If negative, JVM heap size *
buffer.ratio is used
|
| rss.server.buffer.capacity.ratio | 0.8 | when
`rss.server.buffer.capacity`=-1, then the buffer capacity is JVM heap size *
ratio
|
| rss.server.memory.shuffle.highWaterMark.percentage | 75.0 |
Threshold of spill data to storage, percentage of rss.server.buffer.capacity
|
@@ -122,6 +129,22 @@ For HADOOP FS, the conf value of
`rss.server.single.buffer.flush.threshold` shou
Finally, to improve the speed of writing to HDFS for a single partition, the
value of `rss.server.max.concurrency.of.per-partition.write` and
`rss.server.flush.hdfs.threadPool.size` could be increased to 50 or 100.
+### Netty
+In version 0.0.8, we introduced Netty. Enabling netty on ShuffleServer can
significantly reduce GC time in high-throughput scenarios. We can enable netty
through the parameter `rss.server.netty.port`. Note: After enabling netty, the
ShuffleServer The node will be automatically tagged with `grpc_netty`, that is,
the node can only be assigned to clients of `spark.rss.client.type=GRPC_NETTY`.
+
+When enabling Netty, we should also consider memory related configuration, the
following is an example.
+
+#### rss-env.sh
+```
+XMX_SIZE=80g
+MAX_DIRECT_MEMORY_SIZE=60g
+```
+#### server.conf
+```
+rss.server.buffer.capacity 40g
+rss.server.read.buffer.capacity 20g
+```
+
#### Example of server conf
```
rss.rpc.server.port 19999