This is an automated email from the ASF dual-hosted git repository.
zuston pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-uniffle.git
The following commit(s) were added to refs/heads/master by this push:
new f6dde0a0 [#889] improvement: set default value of
`rss.server.max.concurrency.of.per-partition.write` to `30`. (#1037)
f6dde0a0 is described below
commit f6dde0a08773107a9e7ecf45487c2387ee1e509d
Author: Xianming Lei <[email protected]>
AuthorDate: Wed Jul 26 16:42:20 2023 +0800
[#889] improvement: set default value of
`rss.server.max.concurrency.of.per-partition.write` to `30`. (#1037)
### What changes were proposed in this pull request?
1. Change the default value of
`rss.server.max.concurrency.of.per-partition.write` to `30`.
2. Change the default value of `rss.server.flush.hadoop.threadPool.size` to `60`.
### Why are the changes needed?
For 0.8.0 Release.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Existing UTs.
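Deployments that prefer the previous behavior can still override the new defaults in `conf/server.conf`; a minimal sketch restoring the pre-change values (the old defaults of `10` and `1` shown in the diff below, illustrative only):

```
# revert to the previous defaults if the new ones don't suit your cluster
rss.server.flush.hadoop.threadPool.size 10
rss.server.max.concurrency.of.per-partition.write 1
```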
---------
Co-authored-by: leixianming <[email protected]>
---
README.md | 2 +-
conf/server.conf | 2 +-
deploy/kubernetes/operator/examples/configuration.yaml | 2 +-
docs/server_guide.md | 12 ++++++------
.../java/org/apache/uniffle/server/ShuffleServerConf.java | 4 ++--
.../org/apache/uniffle/server/ShuffleFlushManagerTest.java | 1 +
6 files changed, 12 insertions(+), 11 deletions(-)
diff --git a/README.md b/README.md
index 50126466..da7b7b53 100644
--- a/README.md
+++ b/README.md
@@ -198,7 +198,7 @@ rss-xxx.tgz will be generated for deployment
# it's better to config thread num according to local disk num
rss.server.flush.thread.alive 5
rss.server.flush.localfile.threadPool.size 10
- rss.server.flush.hadoop.threadPool.size 10
+ rss.server.flush.hadoop.threadPool.size 60
rss.server.buffer.capacity 40g
rss.server.read.buffer.capacity 20g
rss.server.heartbeat.interval 10000
diff --git a/conf/server.conf b/conf/server.conf
index c30cacc3..e118c403 100644
--- a/conf/server.conf
+++ b/conf/server.conf
@@ -24,7 +24,7 @@ rss.server.buffer.capacity 40gb
rss.server.read.buffer.capacity 20gb
rss.server.flush.thread.alive 5
rss.server.flush.localfile.threadPool.size 10
-rss.server.flush.hadoop.threadPool.size 10
+rss.server.flush.hadoop.threadPool.size 60
rss.server.disk.capacity 1t
rss.server.single.buffer.flush.enabled true
rss.server.single.buffer.flush.threshold 128m
diff --git a/deploy/kubernetes/operator/examples/configuration.yaml
b/deploy/kubernetes/operator/examples/configuration.yaml
index 972e5019..7c8ba39c 100644
--- a/deploy/kubernetes/operator/examples/configuration.yaml
+++ b/deploy/kubernetes/operator/examples/configuration.yaml
@@ -59,7 +59,7 @@ data:
rss.server.flush.cold.storage.threshold.size 128m
rss.server.flush.thread.alive 6
rss.server.flush.localfile.threadPool.size 12
- rss.server.flush.hadoop.threadPool.size 12
+ rss.server.flush.hadoop.threadPool.size 60
rss.server.hadoop.dfs.client.socket-timeout 15000
rss.server.hadoop.dfs.replication 2
rss.server.hdfs.base.path hdfs://${your-hdfs-path}
diff --git a/docs/server_guide.md b/docs/server_guide.md
index e796b3b2..8cfac516 100644
--- a/docs/server_guide.md
+++ b/docs/server_guide.md
@@ -45,7 +45,7 @@ This document will introduce how to deploy Uniffle shuffle
servers.
# it's better to config thread num according to local disk num
rss.server.flush.thread.alive 5
rss.server.flush.localfile.threadPool.size 10
- rss.server.flush.hadoop.threadPool.size 10
+ rss.server.flush.hadoop.threadPool.size 60
rss.server.buffer.capacity 40g
rss.server.read.buffer.capacity 20g
rss.server.heartbeat.interval 10000
@@ -78,7 +78,7 @@ This document will introduce how to deploy Uniffle shuffle
servers.
| rss.server.read.buffer.capacity.ratio | 0.4 | when
`rss.server.read.buffer.capacity`=-1, then read buffer capacity is JVM heap
size * ratio
|
| rss.server.heartbeat.interval | 10000 |
Heartbeat interval to Coordinator (ms)
|
| rss.server.flush.localfile.threadPool.size | 10 | Thread
pool for flush data to local file
|
-| rss.server.flush.hadoop.threadPool.size | 10 | Thread
pool for flush data to hadoop storage
|
+| rss.server.flush.hadoop.threadPool.size | 60 | Thread
pool for flush data to hadoop storage
|
| rss.server.commit.timeout | 600000 | Timeout
when commit shuffle data (ms)
|
| rss.storage.type | - | Supports
MEMORY_LOCALFILE, MEMORY_HDFS, MEMORY_LOCALFILE_HDFS
|
| rss.server.flush.cold.storage.threshold.size | 64M | The
threshold of data size for LOACALFILE and HADOOP if MEMORY_LOCALFILE_HDFS is
used
|
@@ -89,7 +89,7 @@ This document will introduce how to deploy Uniffle shuffle
servers.
| rss.server.disk.capacity.ratio | 0.9 | When
`rss.server.disk.capacity` is negative, disk whole space * ratio is used
|
| rss.server.multistorage.fallback.strategy.class | - | The
fallback strategy for `MEMORY_LOCALFILE_HDFS`. Support
`org.apache.uniffle.server.storage.RotateStorageManagerFallbackStrategy`,`org.apache.uniffle.server.storage.LocalStorageManagerFallbackStrategy`
and `org.apache.uniffle.server.storage.HadoopStorageManagerFallbackStrategy`.
If not set,
`org.apache.uniffle.server.storage.HadoopStorageManagerFallbackStrategy` will
be used. |
| rss.server.leak.shuffledata.check.interval | 3600000 | The
interval of leak shuffle data check (ms)
|
-| rss.server.max.concurrency.of.per-partition.write | 1 | The max
concurrency of single partition writer, the data partition file number is equal
to this value. Default value is 1. This config could improve the writing speed,
especially for huge partition.
|
+| rss.server.max.concurrency.of.per-partition.write | 30 | The max
concurrency of single partition writer, the data partition file number is equal
to this value. Default value is 1. This config could improve the writing speed,
especially for huge partition.
|
| rss.server.max.concurrency.limit.of.per-partition.write | - | The limit for
max concurrency per-partition write specified by client, this won't be enabled
by default.
|
| rss.metrics.reporter.class | - | The
class of metrics reporter.
|
| rss.server.multistorage.manager.selector.class |
org.apache.uniffle.server.storage.multi.DefaultStorageManagerSelector | The
manager selector strategy for `MEMORY_LOCALFILE_HDFS`. Default value is
`DefaultStorageManagerSelector`, and another
`HugePartitionSensitiveStorageManagerSelector` will flush only huge partition's
data to cold storage.
[...]
@@ -120,7 +120,7 @@ If you don't use HADOOP FS, the huge partition may be
flushed to local disk, whi
For HADOOP FS, the conf value of `rss.server.single.buffer.flush.threshold`
should be greater than the value of
`rss.server.flush.cold.storage.threshold.size`, which will flush data directly
to Hadoop FS.
-Finally, to improve the speed of writing to HDFS for a single partition, the
value of `rss.server.max.concurrency.of.per-partition.write` and
`rss.server.flush.hdfs.threadPool.size` could be increased to 10 or 20.
+Finally, to improve the speed of writing to HDFS for a single partition, the
value of `rss.server.max.concurrency.of.per-partition.write` and
`rss.server.flush.hdfs.threadPool.size` could be increased to 50 or 100.
#### Example of server conf
```
@@ -141,10 +141,10 @@ rss.server.app.expired.withoutHeartbeat 120000
# For huge partitions
rss.server.flush.localfile.threadPool.size 20
-rss.server.flush.hadoop.threadPool.size 20
+rss.server.flush.hadoop.threadPool.size 60
rss.server.flush.cold.storage.threshold.size 128m
rss.server.single.buffer.flush.threshold 129m
-rss.server.max.concurrency.of.per-partition.write 20
+rss.server.max.concurrency.of.per-partition.write 30
rss.server.huge-partition.size.threshold 20g
rss.server.huge-partition.memory.limit.ratio 0.2
```
diff --git
a/server/src/main/java/org/apache/uniffle/server/ShuffleServerConf.java
b/server/src/main/java/org/apache/uniffle/server/ShuffleServerConf.java
index 4e816183..095946b8 100644
--- a/server/src/main/java/org/apache/uniffle/server/ShuffleServerConf.java
+++ b/server/src/main/java/org/apache/uniffle/server/ShuffleServerConf.java
@@ -86,7 +86,7 @@ public class ShuffleServerConf extends RssBaseConf {
public static final ConfigOption<Integer>
SERVER_FLUSH_HADOOP_THREAD_POOL_SIZE =
ConfigOptions.key("rss.server.flush.hadoop.threadPool.size")
.intType()
- .defaultValue(10)
+ .defaultValue(60)
.withDescription("thread pool for flush data to hadoop storage");
public static final ConfigOption<Integer>
SERVER_FLUSH_THREAD_POOL_QUEUE_SIZE =
@@ -364,7 +364,7 @@ public class ShuffleServerConf extends RssBaseConf {
public static final ConfigOption<Integer>
SERVER_MAX_CONCURRENCY_OF_ONE_PARTITION =
ConfigOptions.key("rss.server.max.concurrency.of.per-partition.write")
.intType()
- .defaultValue(1)
+ .defaultValue(30)
.withDescription(
"The max concurrency of single partition writer, the data
partition file number is "
+ "equal to this value. Default value is 1.")
diff --git
a/server/src/test/java/org/apache/uniffle/server/ShuffleFlushManagerTest.java
b/server/src/test/java/org/apache/uniffle/server/ShuffleFlushManagerTest.java
index 2e8b3cff..0ebb0d62 100644
---
a/server/src/test/java/org/apache/uniffle/server/ShuffleFlushManagerTest.java
+++
b/server/src/test/java/org/apache/uniffle/server/ShuffleFlushManagerTest.java
@@ -102,6 +102,7 @@ public class ShuffleFlushManagerTest extends HadoopTestBase
{
ShuffleServerMetrics.register();
shuffleServerConf.set(ShuffleServerConf.RSS_STORAGE_BASE_PATH,
Collections.emptyList());
shuffleServerConf.setString(ShuffleServerConf.RSS_STORAGE_TYPE,
StorageType.HDFS.name());
+
shuffleServerConf.setInteger(ShuffleServerConf.SERVER_MAX_CONCURRENCY_OF_ONE_PARTITION,
1);
LogManager.getRootLogger().setLevel(Level.INFO);
}
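The hunks in `ShuffleServerConf.java` use a typed-option builder (key, type, default value); the sketch below illustrates that pattern with simplified stand-in classes (`ConfigOption` and `Conf` here are hypothetical, not Uniffle's actual API), using the two defaults this commit changes:

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for a typed config option: a string key plus a default value.
final class ConfigOption<T> {
    final String key;
    final T defaultValue;
    ConfigOption(String key, T defaultValue) {
        this.key = key;
        this.defaultValue = defaultValue;
    }
}

// Stand-in for a config store: explicit values win, otherwise the default applies.
final class Conf {
    private final Map<String, Object> values = new HashMap<>();
    <T> void set(ConfigOption<T> option, T value) {
        values.put(option.key, value);
    }
    @SuppressWarnings("unchecked")
    <T> T get(ConfigOption<T> option) {
        return (T) values.getOrDefault(option.key, option.defaultValue);
    }
}

public class ConfigDefaultsDemo {
    // Mirrors the new defaults introduced by this commit.
    static final ConfigOption<Integer> FLUSH_HADOOP_POOL =
        new ConfigOption<>("rss.server.flush.hadoop.threadPool.size", 60);
    static final ConfigOption<Integer> PER_PARTITION_WRITE =
        new ConfigOption<>("rss.server.max.concurrency.of.per-partition.write", 30);

    public static void main(String[] args) {
        Conf conf = new Conf();
        // Unset option falls back to its default.
        System.out.println(conf.get(FLUSH_HADOOP_POOL));   // 60
        // Explicit value overrides the default, as the test hunk does with 1.
        conf.set(PER_PARTITION_WRITE, 1);
        System.out.println(conf.get(PER_PARTITION_WRITE)); // 1
    }
}
```

This is also why the `ShuffleFlushManagerTest` hunk pins `SERVER_MAX_CONCURRENCY_OF_ONE_PARTITION` to `1`: the test predates the new default and sets the old value explicitly rather than relying on it.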