This is an automated email from the ASF dual-hosted git repository.

chengpan pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/incubator-celeborn.git


The following commit(s) were added to refs/heads/main by this push:
     new 218bfc78a [CELEBORN-629][DOC] Add doc about enabling rack-awareness
218bfc78a is described below

commit 218bfc78a561f83f95ab6198a34d10104ba51002
Author: Angerszhuuuu <[email protected]>
AuthorDate: Mon Jun 5 10:28:26 2023 +0800

    [CELEBORN-629][DOC] Add doc about enabling rack-awareness
    
    ### What changes were proposed in this pull request?
    
    Add doc about enabling rack-awareness
    
    ### Why are the changes needed?
    
    Document new features.
    
    ### Does this PR introduce _any_ user-facing change?
    
    Yes, the docs are updated.
    
    ### How was this patch tested?
    
    <img width="1085" alt="Screenshot 2023-06-02 at 3.19.10 PM" src="https://github.com/apache/incubator-celeborn/assets/46485123/c8c51a4c-40be-40ea-befd-3c369b9f7600">
    
    Closes #1536 from AngersZhuuuu/CELEBORN-629.
    
    Authored-by: Angerszhuuuu <[email protected]>
    Signed-off-by: Cheng Pan <[email protected]>
---
 docs/configuration/index.md | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/docs/configuration/index.md b/docs/configuration/index.md
index 40d43a070..175532c16 100644
--- a/docs/configuration/index.md
+++ b/docs/configuration/index.md
@@ -151,6 +151,35 @@ If you have 8192 mapper tasks, you could set `spark.celeborn.push.maxReqsInFligh
 
 If `celeborn.worker.flush.buffer.size` is 256 KB, we can have total slots up to 327680 slots.
 
+## Rack Awareness
+
+Celeborn can be made rack-aware by setting `celeborn.client.reserveSlots.rackaware.enabled` to `true` on the client side.
+Shuffle partition block replica placement will use rack awareness for fault tolerance by placing one shuffle partition replica
+on a different rack. This provides data availability in the event of a network switch failure or partition within the cluster.
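+
+For example, with Spark as the client engine, this maps to a Spark conf entry (a sketch; the `spark.` prefix for client-side Celeborn configs follows the pattern used earlier on this page):
+
+```properties
+spark.celeborn.client.reserveSlots.rackaware.enabled  true
+```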
+
+Celeborn master daemons obtain the rack id of the cluster workers by invoking either an external script or a Java class, as specified by configuration files.
+Whether you use a Java class or an external script for topology, the output must adhere to the Java `org.apache.hadoop.net.DNSToSwitchMapping` interface.
+The interface expects a one-to-one correspondence to be maintained, with topology information in the format of `/myrack/myhost`,
+where `/` is the topology delimiter, `myrack` is the rack identifier, and `myhost` is the individual host.
+Assuming a single `/24` subnet per rack, one could use the format `/192.168.100.0/192.168.100.5` as a unique rack-host topology mapping.
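+
+For instance, under the single-`/24`-per-rack assumption, a resolver would produce mappings like the following (illustrative addresses):
+
+```
+192.168.100.5  ->  /192.168.100.0/192.168.100.5
+192.168.101.9  ->  /192.168.101.0/192.168.101.9
+```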
+
+To use a Java class for topology mapping, the class name is specified by the `celeborn.hadoop.net.topology.node.switch.mapping.impl` parameter in the master configuration file.
+An example, `NetworkTopology.java`, is included with the Celeborn distribution and can be customized by the Celeborn administrator.
+Using a Java class instead of an external script has a performance benefit: Celeborn doesn't need to fork an external process when a new worker node registers itself.
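+
+For example, the master configuration might name a custom implementation (the class below is hypothetical):
+
+```properties
+celeborn.hadoop.net.topology.node.switch.mapping.impl  com.example.RackMapping
+```
+
+The named class must implement `org.apache.hadoop.net.DNSToSwitchMapping` and return one rack id per input address.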
+
+If implementing an external script, it is specified with the `celeborn.hadoop.net.topology.script.file.name` parameter in the master-side configuration files.
+Unlike the Java class, the external topology script is not included with the Celeborn distribution and must be provided by the administrator.
+Celeborn sends multiple IP addresses in ARGV when forking the topology script. The number of IP addresses sent to the topology script
+is controlled with `celeborn.hadoop.net.topology.script.number.args` and defaults to 100.
+If `celeborn.hadoop.net.topology.script.number.args` were changed to 1, the topology script would be forked once for each IP submitted by workers.
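+
+A hypothetical topology script for the single-`/24`-per-rack layout described above (not shipped with Celeborn; adapt the mapping rule to your network):
+
+```bash
+#!/usr/bin/env bash
+# Print one rack id per address argument; unrecognized input falls
+# back to /default-rack, mirroring the default behavior.
+for ip in "$@"; do
+  if [[ "$ip" =~ ^([0-9]+\.[0-9]+\.[0-9]+)\.[0-9]+$ ]]; then
+    echo "/${BASH_REMATCH[1]}.0"
+  else
+    echo "/default-rack"
+  fi
+done
+```
+
+Make the script executable and point `celeborn.hadoop.net.topology.script.file.name` at its absolute path.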
+
+If `celeborn.hadoop.net.topology.script.file.name` or `celeborn.hadoop.net.topology.node.switch.mapping.impl` is not set, the rack id `/default-rack` is returned for any passed IP address.
+While this behavior appears desirable, it can cause issues with shuffle partition block replication: the default behavior
+is to write one replicated block off rack, which is impossible when there is only a single rack named `/default-rack`.
+
+For an example, refer to [Hadoop Rack Awareness](https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/RackAwareness.html), since Celeborn reuses Hadoop's rack-awareness code.
+
+
 ## Worker Recover Status After Restart
 
 `ShuffleClient` records the shuffle partition location's host, service port, 
and filename,
