This is an automated email from the ASF dual-hosted git repository.
zuston pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-uniffle.git
The following commit(s) were added to refs/heads/master by this push:
new f6dde0a0 [#889] improvement: set default value of
`rss.server.max.concurrency.of.per-partition.write` to `30`. (#1037)
f6dde0a0 is described below
commit f6dde0a08773107a9e7ecf45487c2387ee1e509d
Author: Xianming Lei <[email protected]>
AuthorDate: Wed Jul 26 16:42:20 2023 +0800
[#889] improvement: set default value of
`rss.server.max.concurrency.of.per-partition.write` to `30`. (#1037)
### What changes were proposed in this pull request?
1. Change the default value of
`rss.server.max.concurrency.of.per-partition.write` to `30`.
2. Change the default value of `rss.server.flush.hadoop.threadPool.size` to `60`.
### Why are the changes needed?
For 0.8.0 Release.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Existing UTs.
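Deployments that prefer the previous behavior can still override the new defaults in `conf/server.conf`; a minimal sketch restoring the pre-change values (the old defaults of `10` and `1` shown in the diff below, illustrative only):

```
# revert to the previous defaults if the new ones don't suit your cluster
rss.server.flush.hadoop.threadPool.size 10
rss.server.max.concurrency.of.per-partition.write 1
```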
---------
Co-authored-by: leixianming <[email protected]>
---
README.md | 2 +-
conf/server.conf | 2 +-
deploy/kubernetes/operator/examples/configuration.yaml | 2 +-
docs/server_guide.md | 12 ++++++------
.../java/org/apache/uniffle/server/ShuffleServerConf.java | 4 ++--
.../org/apache/uniffle/server/ShuffleFlushManagerTest.java | 1 +
6 files changed, 12 insertions(+), 11 deletions(-)
diff --git a/README.md b/README.md
index 50126466..da7b7b53 100644
--- a/README.md
+++ b/README.md
@@ -198,7 +198,7 @@ rss-xxx.tgz will be generated for deployment
# it's better to config thread num according to local disk num
rss.server.flush.thread.alive 5
rss.server.flush.localfile.threadPool.size 10
- rss.server.flush.hadoop.threadPool.size 10
+ rss.server.flush.hadoop.threadPool.size 60
rss.server.buffer.capacity 40g
rss.server.read.buffer.capacity 20g
rss.server.heartbeat.interval 10000
diff --git a/conf/server.conf b/conf/server.conf
index c30cacc3..e118c403 100644
--- a/conf/server.conf
+++ b/conf/server.conf
@@ -24,7 +24,7 @@ rss.server.buffer.capacity 40gb
rss.server.read.buffer.capacity 20gb
rss.server.flush.thread.alive 5
rss.server.flush.localfile.threadPool.size 10
-rss.server.flush.hadoop.threadPool.size 10
+rss.server.flush.hadoop.threadPool.size 60
rss.server.disk.capacity 1t
rss.server.single.buffer.flush.enabled true
rss.server.single.buffer.flush.threshold 128m
diff --git a/deploy/kubernetes/operator/examples/configuration.yaml
b/deploy/kubernetes/operator/examples/configuration.yaml
index 972e5019..7c8ba39c 100644
--- a/deploy/kubernetes/operator/examples/configuration.yaml
+++ b/deploy/kubernetes/operator/examples/configuration.yaml
@@ -59,7 +59,7 @@ data:
rss.server.flush.cold.storage.threshold.size 128m
rss.server.flush.thread.alive 6
rss.server.flush.localfile.threadPool.size 12
- rss.server.flush.hadoop.threadPool.size 12
+ rss.server.flush.hadoop.threadPool.size 60
rss.server.hadoop.dfs.client.socket-timeout 15000
rss.server.hadoop.dfs.replication 2
rss.server.hdfs.base.path hdfs://${your-hdfs-path}
diff --git a/docs/server_guide.md b/docs/server_guide.md
index e796b3b2..8cfac516 100644
--- a/docs/server_guide.md
+++ b/docs/server_guide.md
@@ -45,7 +45,7 @@ This document will introduce how to deploy Uniffle shuffle
servers.
# it's better to config thread num according to local disk num
rss.server.flush.thread.alive 5
rss.server.flush.localfile.threadPool.size 10
- rss.server.flush.hadoop.threadPool.size 10
+ rss.server.flush.hadoop.threadPool.size 60
rss.server.buffer.capacity 40g
rss.server.read.buffer.capacity 20g
rss.server.heartbeat.interval 10000
@@ -78,7 +78,7 @@ This document will introduce how to deploy Uniffle shuffle
servers.
| rss.server.read.buffer.capacity.ratio | 0.4 | when
`rss.server.read.buffer.capacity`=-1, then read buffer capacity is JVM heap
size * ratio
|
| rss.server.heartbeat.interval | 10000 |
Heartbeat interval to Coordinator (ms)
|
| rss.server.flush.localfile.threadPool.size | 10 | Thread
pool for flush data to local file
|
-| rss.server.flush.hadoop.threadPool.size | 10 | Thread
pool for flush data to hadoop storage
|
+| rss.server.flush.hadoop.threadPool.size | 60 | Thread
pool for flush data to hadoop storage
|
| rss.server.commit.timeout | 600000 | Timeout
when commit shuffle data (ms)
|
| rss.storage.type | - | Supports
MEMORY_LOCALFILE, MEMORY_HDFS, MEMORY_LOCALFILE_HDFS
|
| rss.server.flush.cold.storage.threshold.size | 64M | The
threshold of data size for LOACALFILE and HADOOP if MEMORY_LOCALFILE_HDFS is
used
|
@@ -89,7 +89,7 @@ This document will introduce how to deploy Uniffle shuffle
servers.
| rss.server.disk.capacity.ratio | 0.9 | When
`rss.server.disk.capacity` is negative, disk whole space * ratio is used
|
| rss.server.multistorage.fallback.strategy.class | - | The
fallback strategy for `MEMORY_LOCALFILE_HDFS`. Support
`org.apache.uniffle.server.storage.RotateStorageManagerFallbackStrategy`,`org.apache.uniffle.server.storage.LocalStorageManagerFallbackStrategy`
and `org.apache.uniffle.server.storage.HadoopStorageManagerFallbackStrategy`.
If not set,
`org.apache.uniffle.server.storage.HadoopStorageManagerFallbackStrategy` will
be used. |
| rss.server.leak.shuffledata.check.interval | 3600000 | The
interval of leak shuffle data check (ms)
|
-| rss.server.max.concurrency.of.per-partition.write | 1 | The max
concurrency of single partition writer, the data partition file number is equal
to this value. Default value is 1. This config could improve the writing speed,
especially for huge partition.
|
+| rss.server.max.concurrency.of.per-partition.write | 30 | The max
concurrency of single partition writer, the data partition file number is equal
to this value. Default value is 1. This config could improve the writing speed,
especially for huge partition.
|
| rss.server.max.concurrency.limit.of.per-partition.write | - | The limit for
max concurrency per-partition write specified by client, this won't be enabled
by default.
|
| rss.metrics.reporter.class | - | The
class of metrics reporter.
|
| rss.server.multistorage.manager.selector.class |
org.apache.uniffle.server.storage.multi.DefaultStorageManagerSelector | The
manager selector strategy for `MEMORY_LOCALFILE_HDFS`. Default value is
`DefaultStorageManagerSelector`, and another
`HugePartitionSensitiveStorageManagerSelector` will flush only huge partition's
data to cold storage.
[...]
@@ -120,7 +120,7 @@ If you don't use HADOOP FS, the huge partition may be
flushed to local disk, whi
For HADOOP FS, the conf value of `rss.server.single.buffer.flush.threshold`
should be greater than the value of
`rss.server.flush.cold.storage.threshold.size`, which will flush data directly
to Hadoop FS.
-Finally, to improve the speed of writing to HDFS for a single partition, the
value of `rss.server.max.concurrency.of.per-partition.write` and
`rss.server.flush.hdfs.threadPool.size` could be increased to 10 or 20.
+Finally, to improve the speed of writing to HDFS for a single partition, the
value of `rss.server.max.concurrency.of.per-partition.write` and
`rss.server.flush.hdfs.threadPool.size` could be increased to 50 or 100.
#### Example of server conf
```
@@ -141,10 +141,10 @@ rss.server.app.expired.withoutHeartbeat 120000
# For huge partitions
rss.server.flush.localfile.threadPool.size 20
-rss.server.flush.hadoop.threadPool.size 20
+rss.server.flush.hadoop.threadPool.size 60
rss.server.flush.cold.storage.threshold.size 128m
rss.server.single.buffer.flush.threshold 129m
-rss.server.max.concurrency.of.per-partition.write 20
+rss.server.max.concurrency.of.per-partition.write 30
rss.server.huge-partition.size.threshold 20g
rss.server.huge-partition.memory.limit.ratio 0.2
```
diff --git
a/server/src/main/java/org/apache/uniffle/server/ShuffleServerConf.java
b/server/src/main/java/org/apache/uniffle/server/ShuffleServerConf.java
index 4e816183..095946b8 100644
--- a/server/src/main/java/org/apache/uniffle/server/ShuffleServerConf.java
+++ b/server/src/main/java/org/apache/uniffle/server/ShuffleServerConf.java
@@ -86,7 +86,7 @@ public class ShuffleServerConf extends RssBaseConf {
public static final ConfigOption<Integer>
SERVER_FLUSH_HADOOP_THREAD_POOL_SIZE =
ConfigOptions.key("rss.server.flush.hadoop.threadPool.size")
.intType()
- .defaultValue(10)
+ .defaultValue(60)
.withDescription("thread pool for flush data to hadoop storage");
public static final ConfigOption<Integer>
SERVER_FLUSH_THREAD_POOL_QUEUE_SIZE =
@@ -364,7 +364,7 @@ public class ShuffleServerConf extends RssBaseConf {
public static final ConfigOption<Integer>
SERVER_MAX_CONCURRENCY_OF_ONE_PARTITION =
ConfigOptions.key("rss.server.max.concurrency.of.per-partition.write")
.intType()
- .defaultValue(1)
+ .defaultValue(30)
.withDescription(
"The max concurrency of single partition writer, the data
partition file number is "
+ "equal to this value. Default value is 1.")
diff --git
a/server/src/test/java/org/apache/uniffle/server/ShuffleFlushManagerTest.java
b/server/src/test/java/org/apache/uniffle/server/ShuffleFlushManagerTest.java
index 2e8b3cff..0ebb0d62 100644
---
a/server/src/test/java/org/apache/uniffle/server/ShuffleFlushManagerTest.java
+++
b/server/src/test/java/org/apache/uniffle/server/ShuffleFlushManagerTest.java
@@ -102,6 +102,7 @@ public class ShuffleFlushManagerTest extends HadoopTestBase
{
ShuffleServerMetrics.register();
shuffleServerConf.set(ShuffleServerConf.RSS_STORAGE_BASE_PATH,
Collections.emptyList());
shuffleServerConf.setString(ShuffleServerConf.RSS_STORAGE_TYPE,
StorageType.HDFS.name());
+
shuffleServerConf.setInteger(ShuffleServerConf.SERVER_MAX_CONCURRENCY_OF_ONE_PARTITION,
1);
LogManager.getRootLogger().setLevel(Level.INFO);
}
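The hunks in `ShuffleServerConf.java` use a typed-option builder (key, type, default value); the sketch below illustrates that pattern with simplified stand-in classes (`ConfigOption` and `Conf` here are hypothetical, not Uniffle's actual API), using the two defaults this commit changes:

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for a typed config option: a string key plus a default value.
final class ConfigOption<T> {
    final String key;
    final T defaultValue;
    ConfigOption(String key, T defaultValue) {
        this.key = key;
        this.defaultValue = defaultValue;
    }
}

// Stand-in for a config store: explicit values win, otherwise the default applies.
final class Conf {
    private final Map<String, Object> values = new HashMap<>();
    <T> void set(ConfigOption<T> option, T value) {
        values.put(option.key, value);
    }
    @SuppressWarnings("unchecked")
    <T> T get(ConfigOption<T> option) {
        return (T) values.getOrDefault(option.key, option.defaultValue);
    }
}

public class ConfigDefaultsDemo {
    // Mirrors the new defaults introduced by this commit.
    static final ConfigOption<Integer> FLUSH_HADOOP_POOL =
        new ConfigOption<>("rss.server.flush.hadoop.threadPool.size", 60);
    static final ConfigOption<Integer> PER_PARTITION_WRITE =
        new ConfigOption<>("rss.server.max.concurrency.of.per-partition.write", 30);

    public static void main(String[] args) {
        Conf conf = new Conf();
        // Unset option falls back to its default.
        System.out.println(conf.get(FLUSH_HADOOP_POOL));   // 60
        // Explicit value overrides the default, as the test hunk does with 1.
        conf.set(PER_PARTITION_WRITE, 1);
        System.out.println(conf.get(PER_PARTITION_WRITE)); // 1
    }
}
```

This is also why the `ShuffleFlushManagerTest` hunk pins `SERVER_MAX_CONCURRENCY_OF_ONE_PARTITION` to `1`: the test predates the new default and sets the old value explicitly rather than relying on it.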