This is an automated email from the ASF dual-hosted git repository.
roryqi pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-uniffle.git
The following commit(s) were added to refs/heads/master by this push:
new 9f860546 [DOC] Migrate the coordinator doc from README to docs page (#153)
9f860546 is described below
commit 9f860546b4aecb6fa175d72e85464e29c722cb13
Author: Junfan Zhang <[email protected]>
AuthorDate: Thu Aug 11 11:48:43 2022 +0800
[DOC] Migrate the coordinator doc from README to docs page (#153)
### What changes were proposed in this pull request?
[DOC] Migrate the coordinator doc from README to docs page
### Why are the changes needed?
A dedicated doc page makes it easier for users to find configs
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
No need
---
README.md | 12 +---
.../uniffle/coordinator/CoordinatorConf.java | 2 +-
docs/coordinator_guide.md | 79 ++++++++++++++++++++++
3 files changed, 81 insertions(+), 12 deletions(-)
diff --git a/README.md b/README.md
index 24d8675a..1731a16a 100644
--- a/README.md
+++ b/README.md
@@ -226,17 +226,7 @@ The important configuration is listed as following.
### Coordinator
-|Property Name|Default| Description|
-|---|---|---|
-|rss.coordinator.server.heartbeat.timeout|30000|Timeout if can't get heartbeat from shuffle server|
-|rss.coordinator.assignment.strategy|PARTITION_BALANCE|Strategy for assigning shuffle server, PARTITION_BALANCE should be used for workload balance|
-|rss.coordinator.app.expired|60000|Application expired time (ms), the heartbeat interval should be less than it|
-|rss.coordinator.shuffle.nodes.max|9|The max number of shuffle server when do the assignment|
-|rss.coordinator.dynamicClientConf.path|-|The path of configuration file which have default conf for rss client|
-|rss.coordinator.exclude.nodes.file.path|-|The path of configuration file which have exclude nodes|
-|rss.coordinator.exclude.nodes.check.interval.ms|60000|Update interval (ms) for exclude nodes|
-|rss.rpc.server.port|-|RPC port for coordinator|
-|rss.jetty.http.port|-|Http port for coordinator|
+For more details of advanced configuration, please see [Uniffle Coordinator Guide](https://github.com/apache/incubator-uniffle/blob/master/docs/coordinator_guide.md).
### Shuffle Server
diff --git a/coordinator/src/main/java/org/apache/uniffle/coordinator/CoordinatorConf.java b/coordinator/src/main/java/org/apache/uniffle/coordinator/CoordinatorConf.java
index 18bd6614..765f9702 100644
--- a/coordinator/src/main/java/org/apache/uniffle/coordinator/CoordinatorConf.java
+++ b/coordinator/src/main/java/org/apache/uniffle/coordinator/CoordinatorConf.java
@@ -119,7 +119,7 @@ public class CoordinatorConf extends RssBaseConf {
.intType()
.checkValue(ConfigUtils.POSITIVE_INTEGER_VALIDATOR_2, "dynamic client conf update interval in seconds")
.defaultValue(120)
- .withDescription("Accessed candidates update interval in seconds");
+ .withDescription("The dynamic client conf update interval in seconds");
public static final ConfigOption<String> COORDINATOR_REMOTE_STORAGE_CLUSTER_CONF = ConfigOptions
.key("rss.coordinator.remote.storage.cluster.conf")
.stringType()
diff --git a/docs/coordinator_guide.md b/docs/coordinator_guide.md
index 274c875e..6764b529 100644
--- a/docs/coordinator_guide.md
+++ b/docs/coordinator_guide.md
@@ -21,3 +21,82 @@ license: |
---
# Uniffle Coordinator Guide
+
+Uniffle is a unified remote shuffle service for compute engines. The coordinator is responsible for
+collecting the status of shuffle servers and doing the assignment for the job.
+
+## Deploy
+This document will introduce how to deploy Uniffle coordinators.
+
+### Steps
+1. unzip package to RSS_HOME
+2. update RSS_HOME/bin/rss-env.sh, eg,
+ ```
+ JAVA_HOME=<java_home>
+ HADOOP_HOME=<hadoop home>
+ XMX_SIZE="16g"
+ ```
+3. update RSS_HOME/conf/coordinator.conf, eg,
+ ```
+ rss.rpc.server.port 19999
+ rss.jetty.http.port 19998
+ rss.coordinator.server.heartbeat.timeout 30000
+ rss.coordinator.app.expired 60000
+ rss.coordinator.shuffle.nodes.max 5
+ # enable dynamicClientConf, and coordinator will be responsible for most of client conf
+ rss.coordinator.dynamicClientConf.enabled true
+ # config the path of client conf
+ rss.coordinator.dynamicClientConf.path <RSS_HOME>/conf/dynamic_client.conf
+ # config the path of excluded shuffle server
+ rss.coordinator.exclude.nodes.file.path <RSS_HOME>/conf/exclude_nodes
+ ```
+4. update <RSS_HOME>/conf/dynamic_client.conf, rss client will get default conf from coordinator, eg,
+ ```
+ # MEMORY_LOCALFILE_HDFS is recommended for production environment
+ rss.storage.type MEMORY_LOCALFILE_HDFS
+ # multiple remote storages are supported, and client will get assignment from coordinator
+ rss.coordinator.remote.storage.path hdfs://cluster1/path,hdfs://cluster2/path
+ rss.writer.require.memory.retryMax 1200
+ rss.client.retry.max 100
+ rss.writer.send.check.timeout 600000
+ rss.client.read.buffer.size 14m
+ ```
+5. start Coordinator
+ ```
+ bash RSS_HOME/bin/start-coordinator.sh
+ ```
+
+## Configuration
+
+### Common settings
+|Property Name|Default| Description|
+|---|---|---|
+|rss.coordinator.server.heartbeat.timeout|30000|Timeout if can't get heartbeat from shuffle server|
+|rss.coordinator.server.periodic.output.interval.times|30|The periodic interval times of output alive nodes. The interval sec can be calculated by (rss.coordinator.server.heartbeat.timeout/3 * rss.coordinator.server.periodic.output.interval.times). Default output interval is 5min.|
+|rss.coordinator.assignment.strategy|PARTITION_BALANCE|Strategy for assigning shuffle server, PARTITION_BALANCE should be used for workload balance|
+|rss.coordinator.app.expired|60000|Application expired time (ms), the heartbeat interval should be less than it|
+|rss.coordinator.shuffle.nodes.max|9|The max number of shuffle server when do the assignment|
+|rss.coordinator.dynamicClientConf.path|-|The path of configuration file which have default conf for rss client|
+|rss.coordinator.exclude.nodes.file.path|-|The path of configuration file which have exclude nodes|
+|rss.coordinator.exclude.nodes.check.interval.ms|60000|Update interval (ms) for exclude nodes|
+|rss.coordinator.access.checkers|org.apache.uniffle.coordinator.AccessClusterLoadChecker|The access checkers will be used when the spark client use the DelegationShuffleManager, which will decide whether to use rss according to the result of the specified access checkers|
+|rss.coordinator.access.loadChecker.memory.percentage|15.0|The minimal percentage of available memory percentage of a server|
+|rss.coordinator.dynamicClientConf.enabled|false|whether to enable dynamic client conf, which will be fetched by spark client|
+|rss.coordinator.dynamicClientConf.path|-|The dynamic client conf of this cluster and can be stored in HDFS or local|
+|rss.coordinator.dynamicClientConf.updateIntervalSec|120|The dynamic client conf update interval in seconds|
+|rss.coordinator.remote.storage.cluster.conf|-|Remote Storage Cluster related conf with format $clusterId,$key=$value, separated by ';'|
+|rss.rpc.server.port|-|RPC port for coordinator|
+|rss.jetty.http.port|-|Http port for coordinator|
+
+### AccessClusterLoadChecker settings
+|Property Name|Default| Description|
+|---|---|---|
+|rss.coordinator.access.loadChecker.serverNum.threshold|-|The minimal required number of healthy shuffle servers when being accessed by client|
+
+### AccessCandidatesChecker settings
+AccessCandidatesChecker is one of the built-in access checkers, which will allow user to define the candidates list to use rss.
+
+|Property Name|Default| Description|
+|---|---|---|
+|rss.coordinator.access.candidates.updateIntervalSec|120|Accessed candidates update interval in seconds, which is only valid when AccessCandidatesChecker is enabled.|
+|rss.coordinator.access.candidates.path|-|Accessed candidates file path, the file can be stored on HDFS|
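
Note on the `rss.coordinator.server.periodic.output.interval.times` row in the Common settings table: the formula it quotes can be checked with a few lines of Java. This is only an illustrative sketch and is not part of the commit. The class and method names are hypothetical, and it assumes the heartbeat timeout is expressed in milliseconds, which the default of 30000 and the stated 5min result suggest.

```java
import java.time.Duration;

// Hypothetical helper, not taken from the Uniffle code base.
class PeriodicOutputInterval {

    // Applies the formula quoted in the guide:
    // rss.coordinator.server.heartbeat.timeout / 3
    //   * rss.coordinator.server.periodic.output.interval.times
    // Assumption: the heartbeat timeout is given in milliseconds.
    static long intervalMillis(long heartbeatTimeoutMs, int intervalTimes) {
        return heartbeatTimeoutMs / 3 * intervalTimes;
    }

    public static void main(String[] args) {
        // Defaults from the table: timeout 30000, interval times 30.
        long ms = intervalMillis(30000L, 30);
        // prints "5 min"
        System.out.println(Duration.ofMillis(ms).toMinutes() + " min");
    }
}
```

With the defaults this gives 300000 ms, which matches the "Default output interval is 5min" statement in the table.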