This is an automated email from the ASF dual-hosted git repository.
chengpan pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/incubator-celeborn.git
The following commit(s) were added to refs/heads/main by this push:
new 218bfc78a [CELEBORN-629][DOC] Add doc about enable rac-awareness
218bfc78a is described below
commit 218bfc78a561f83f95ab6198a34d10104ba51002
Author: Angerszhuuuu <[email protected]>
AuthorDate: Mon Jun 5 10:28:26 2023 +0800
[CELEBORN-629][DOC] Add doc about enable rac-awareness
### What changes were proposed in this pull request?
Add doc about enabling rack-awareness
### Why are the changes needed?
Document new features.
### Does this PR introduce _any_ user-facing change?
Yes, the docs are updated.
### How was this patch tested?
<img width="1085" alt="Screenshot 2023-06-02 3:19:10 PM"
src="https://github.com/apache/incubator-celeborn/assets/46485123/c8c51a4c-40be-40ea-befd-3c369b9f7600">
Closes #1536 from AngersZhuuuu/CELEBORN-629.
Authored-by: Angerszhuuuu <[email protected]>
Signed-off-by: Cheng Pan <[email protected]>
---
docs/configuration/index.md | 29 +++++++++++++++++++++++++++++
1 file changed, 29 insertions(+)
diff --git a/docs/configuration/index.md b/docs/configuration/index.md
index 40d43a070..175532c16 100644
--- a/docs/configuration/index.md
+++ b/docs/configuration/index.md
@@ -151,6 +151,35 @@ If you have 8192 mapper tasks, you could set
`spark.celeborn.push.maxReqsInFligh
If `celeborn.worker.flush.buffer.size` is 256 KB, we can have total slots up
to 327680 slots.
+## Rack Awareness
+
+Celeborn can be made rack-aware by setting
`celeborn.client.reserveSlots.rackware.enabled` to `true` on the client side.
+With rack awareness enabled, shuffle partition replica placement provides fault
tolerance by placing one replica of each shuffle partition
+on a different rack. This keeps data available in the event of a network
switch failure or a partition within the cluster.
+
+Celeborn master daemons obtain the rack id of the cluster workers by invoking
either an external script or a Java class, as specified in the configuration files.
+Whether a Java class or an external script is used for topology, the output must
adhere to the Java `org.apache.hadoop.net.DNSToSwitchMapping` interface.
+The interface expects a one-to-one correspondence to be maintained and the
topology information to be in the format of `/myrack/myhost`,
+where `/` is the topology delimiter, `myrack` is the rack identifier, and
`myhost` is the individual host.
+Assuming a single `/24` subnet per rack, one could use the format of
`/192.168.100.0/192.168.100.5` as a unique rack-host topology mapping.
+
+To use the Java class for topology mapping, the class name is specified by the
`celeborn.hadoop.net.topology.node.switch.mapping.impl` parameter in the master
configuration file.
+An example, `NetworkTopology.java`, is included with the Celeborn distribution
and can be customized by the Celeborn administrator.
+Using a Java class instead of an external script has a performance benefit in
that Celeborn doesn't need to fork an external process when a new worker node
registers itself.
+
+To use an external script instead, specify it with the
`celeborn.hadoop.net.topology.script.file.name` parameter in the master-side
configuration files.
+Unlike the Java class, the external topology script is not included with the
Celeborn distribution and must be provided by the administrator.
+Celeborn passes multiple IP addresses as arguments (ARGV) when forking the
topology script. The number of IP addresses sent to the topology script
+is controlled with `celeborn.hadoop.net.topology.script.number.args` and
defaults to 100.
+If `celeborn.hadoop.net.topology.script.number.args` is set to 1, a
topology script is forked for each IP address submitted by workers.
+
+If neither `celeborn.hadoop.net.topology.script.file.name` nor
`celeborn.hadoop.net.topology.node.switch.mapping.impl` is set, the rack id
`/default-rack` is returned for any passed IP address.
+While this behavior appears desirable, it can cause issues with shuffle
partition block replication: the default behavior
+is to write one replicated block off rack, which is impossible when there is
only a single rack named `/default-rack`.
+
+For examples, refer to [Hadoop Rack
Awareness](https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/RackAwareness.html),
since Celeborn reuses Hadoop's rack-awareness code.
+
+
## Worker Recover Status After Restart
`ShuffleClient` records the shuffle partition location's host, service port,
and filename,