This is an automated email from the ASF dual-hosted git repository.

zhouky pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/incubator-celeborn.git


The following commit(s) were added to refs/heads/main by this push:
     new b8cdf36b4 [CELEBORN-831][DOC] Add traffic control document
b8cdf36b4 is described below

commit b8cdf36b40f77f1cd697e60024e6fdb8b432b9e5
Author: zky.zhoukeyong <[email protected]>
AuthorDate: Mon Jul 24 19:51:02 2023 +0800

    [CELEBORN-831][DOC] Add traffic control document
    
    ### What changes were proposed in this pull request?
    As title.
    
    ### Why are the changes needed?
    As title.
    
    ### Does this PR introduce _any_ user-facing change?
    No.
    
    ### How was this patch tested?
    Manual test.
    
    Closes #1754 from waitinfuture/831.
    
    Authored-by: zky.zhoukeyong <[email protected]>
    Signed-off-by: zky.zhoukeyong <[email protected]>
---
 docs/assets/img/backpressure.svg  |  4 ++
 docs/developers/trafficcontrol.md | 80 +++++++++++++++++++++++++++++++++++++++
 docs/developers/worker.md         |  3 +-
 mkdocs.yml                        |  1 +
 4 files changed, 87 insertions(+), 1 deletion(-)

diff --git a/docs/assets/img/backpressure.svg b/docs/assets/img/backpressure.svg
new file mode 100644
index 000000000..67596a081
--- /dev/null
+++ b/docs/assets/img/backpressure.svg
@@ -0,0 +1,4 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!-- Do not edit this file with editors other than draw.io -->
+<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
+<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" version="1.1" width="448px" height="556px" viewBox="-0.5 -0.5 448 556" content="&lt;mxfile host=&quot;app.diagrams.net&quot; modified=&quot;2023-07-24T09:54:28.825Z&quot; agent=&quot;Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36&quot; etag=&quot;muQwWoZ1AV76Lv06bOp7&quot; version=&quot;21.6.5&quot; type=&quot;device&quot;&gt;&#10;  &lt [...]
\ No newline at end of file
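The `Back Pressure` state machine drawn in backpressure.svg and described in trafficcontrol.md can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not Celeborn's implementation: the class, enum, and method names are hypothetical, the watermark values are the defaults quoted in the document, and the rule that a `Worker` in `Pause Replicate` drops back to `Pause Receive` once the ratio falls below the `Pause Replicate` watermark is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

public class BackPressureSketch {
    // Hypothetical states mirroring the three watermarks in the document.
    enum State { RESUME, PAUSE_RECEIVE, PAUSE_REPLICATE }

    // Default values from the document; in Celeborn they are tuned through
    // the celeborn.worker.directMemoryRatio* settings.
    static final double PAUSE_RECEIVE_RATIO = 0.85;
    static final double PAUSE_REPLICATE_RATIO = 0.95;
    static final double RESUME_RATIO = 0.5;

    /** Next state given the current state and the used direct memory ratio. */
    static State next(State current, double usedRatio) {
        if (usedRatio > PAUSE_REPLICATE_RATIO) {
            // Stop both client pushes and replica data, force flush.
            return State.PAUSE_REPLICATE;
        }
        if (usedRatio > PAUSE_RECEIVE_RATIO) {
            // Stop client pushes, force flush (assumed: replication resumes here).
            return State.PAUSE_RECEIVE;
        }
        if (usedRatio < RESUME_RATIO) {
            // Memory is low enough to accept client pushes again.
            return State.RESUME;
        }
        // Hysteresis band [RESUME_RATIO, PAUSE_RECEIVE_RATIO]: a paused Worker
        // stays paused for client pushes, a running Worker keeps running.
        return current == State.RESUME ? State.RESUME : State.PAUSE_RECEIVE;
    }

    public static void main(String[] args) {
        double[] samples = {0.3, 0.9, 0.7, 0.96, 0.7, 0.4, 0.7};
        State state = State.RESUME;
        List<State> trace = new ArrayList<>();
        for (double ratio : samples) {
            state = next(state, ratio);
            trace.add(state);
        }
        // Prints [RESUME, PAUSE_RECEIVE, PAUSE_RECEIVE, PAUSE_REPLICATE, PAUSE_RECEIVE, RESUME, RESUME]
        System.out.println(trace);
    }
}
```

The band between the `Resume` and `Pause Receive` watermarks acts as hysteresis: a paused `Worker` does not flap between states while memory usage hovers around a single threshold.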
diff --git a/docs/developers/trafficcontrol.md b/docs/developers/trafficcontrol.md
new file mode 100644
index 000000000..d824fee5a
--- /dev/null
+++ b/docs/developers/trafficcontrol.md
@@ -0,0 +1,80 @@
+---
+license: |
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+  http://www.apache.org/licenses/LICENSE-2.0
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+---
+
+# Traffic Control
+This article describes the detailed design of Celeborn `Worker`'s traffic control.
+
+## Design Goal
+The design goal of Traffic Control is to prevent `Worker` OOM without harming performance. At the
+same time, Celeborn tries to achieve fairness among users, also without harming performance.
+
+Celeborn achieves these goals through `Back Pressure` and `Congestion Control`.
+
+## Data Flow
+From the `Worker`'s perspective, incoming data comes from two sources:
+
+- `ShuffleClient` that pushes primary data to the primary `Worker`
+- Primary `Worker` that replicates data to the replica `Worker`
+
+The buffered memory can be released when the following conditions are satisfied:
+
+- Data is flushed to file
+- If replication is on, the data has also been written to the wire (i.e. sent to the replica `Worker`)
+
+The basic idea is that when the `Worker` is under high memory pressure, it slows down or stops incoming data and, at the same
+time, forces a flush to release memory.
+
+## Back Pressure
+`Back Pressure` defines three watermarks:
+
+- `Pause Receive` watermark (defaults to 0.85). If the used direct memory ratio exceeds this, the `Worker` pauses
+  receiving data from `ShuffleClient` and force-flushes buffered data to file.
+- `Pause Replicate` watermark (defaults to 0.95). If the used direct memory ratio exceeds this, the `Worker` pauses
+  receiving both data from `ShuffleClient` and replica data from the primary `Worker`, and force-flushes buffered
+  data to file.
+- `Resume` watermark (defaults to 0.5). Once `Pause Receive` or `Pause Replicate` has been triggered, the `Worker`
+  resumes receiving data from `ShuffleClient` only after the used direct memory ratio drops below this watermark.
+
+The `Worker` checks the used direct memory ratio at a high frequency and triggers `Pause Receive`, `Pause Replicate`, and `Resume`
+accordingly. The state machine is as follows:
+
+![backpressure](../../assets/img/backpressure.svg)
+
+`Back Pressure` is the basic traffic control mechanism and cannot be disabled. Users can tune the three watermarks through the
+following configurations:
+
+- `celeborn.worker.directMemoryRatio*`
+
+## Congestion Control
+`Congestion Control` is an optional traffic control mechanism. Its purpose is to slow down the push rate
+from `ShuffleClient` when memory is under pressure, and to throttle the users who occupied the most resources in the
+last time window. It defines two watermarks:
+
+- `Low Watermark`: below it, traffic proceeds normally
+- `High Watermark`: when it is exceeded, the top users are congestion controlled
+
+Celeborn uses `UserIdentifier` to identify users. The `Worker` tracks the bytes pushed by each user in the last time
+window. When used direct memory exceeds the `High Watermark`, users who occupied more resources than the average
+receive a `Congestion Control` message.
+
+`ShuffleClient` controls the push rate in a way much like `TCP Congestion Control`. Initially, it is in the
+`Slow Start` phase, where the push rate starts low but increases quickly. When a threshold is reached, it transitions to the
+`Congestion Avoidance` phase, where the push rate increases slowly. Upon receiving a `Congestion Control` message, it goes back
+to the `Slow Start` phase.
+
+`Congestion Control` can be enabled and tuned by the following configurations:
+
+- `celeborn.worker.congestionControl.*`
\ No newline at end of file
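The two halves of `Congestion Control` described in trafficcontrol.md, the `Worker` selecting users above the average and the `ShuffleClient` adjusting its push rate, can be sketched as below. This is a hypothetical illustration rather than Celeborn's code: plain user-name strings stand in for `UserIdentifier`, rates are unitless, and doubling in `Slow Start`, adding the initial rate per interval in `Congestion Avoidance`, and halving the threshold on congestion are borrowed from classic TCP and are assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class CongestionControlSketch {

    /** Worker side (hypothetical): users whose bytes in the last window exceed the average. */
    static List<String> usersToThrottle(Map<String, Long> bytesInWindow) {
        List<String> result = new ArrayList<>();
        if (bytesInWindow.isEmpty()) {
            return result;
        }
        long total = 0;
        for (long bytes : bytesInWindow.values()) {
            total += bytes;
        }
        double average = (double) total / bytesInWindow.size();
        for (Map.Entry<String, Long> e : bytesInWindow.entrySet()) {
            if (e.getValue() > average) {
                result.add(e.getKey());
            }
        }
        Collections.sort(result); // deterministic order
        return result;
    }

    /** Client side (hypothetical): TCP-like push-rate controller. */
    static class PushRateController {
        private final long initialRate;
        private long rate;
        private long slowStartThreshold;
        private boolean slowStart = true;

        PushRateController(long initialRate, long slowStartThreshold) {
            this.initialRate = initialRate;
            this.rate = initialRate;
            this.slowStartThreshold = slowStartThreshold;
        }

        /** Called once per interval in which no Congestion Control message arrived. */
        void onInterval() {
            if (slowStart) {
                rate *= 2; // Slow Start: exponential growth
                if (rate >= slowStartThreshold) {
                    rate = slowStartThreshold;
                    slowStart = false; // switch to Congestion Avoidance
                }
            } else {
                rate += initialRate; // Congestion Avoidance: linear growth
            }
        }

        /** Called when the Worker sends a Congestion Control message. */
        void onCongestion() {
            slowStartThreshold = Math.max(initialRate, rate / 2); // TCP convention (assumed)
            rate = initialRate;
            slowStart = true; // back to Slow Start
        }

        long rate() { return rate; }

        boolean inSlowStart() { return slowStart; }
    }

    public static void main(String[] args) {
        // Average is 400, so only alice is above it; prints [alice]
        System.out.println(usersToThrottle(Map.of("alice", 900L, "bob", 100L, "carol", 200L)));

        PushRateController c = new PushRateController(1, 8);
        StringBuilder trace = new StringBuilder();
        for (int i = 0; i < 6; i++) {
            c.onInterval();
            trace.append(c.rate()).append(' ');
        }
        c.onCongestion();
        trace.append("| ").append(c.rate());
        // Prints 2 4 8 9 10 11 | 1
        System.out.println(trace);
    }
}
```

With an initial rate of 1 and a threshold of 8, the rate grows 2, 4, 8 in `Slow Start`, then 9, 10, 11 in `Congestion Avoidance`, and falls back to 1 after a `Congestion Control` message. `Map.of` requires Java 9 or later.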
diff --git a/docs/developers/worker.md b/docs/developers/worker.md
index 05028069d..154e09205 100644
--- a/docs/developers/worker.md
+++ b/docs/developers/worker.md
@@ -18,7 +18,8 @@ license: |
 The main functions of Celeborn `Worker` are:
 
 - Store, serve, and manage `PartitionLocation` data. See [Storage](../../developers/storage)
-- Traffic control through `Back Pressure` and `Congestion Control`
+- Traffic control through `Back Pressure` and `Congestion Control`. See
+  [Traffic Control](../../developers/trafficcontrol)
 - Support rolling upgrade through `Graceful Shutdown`
 - Support elasticity through `Decommission Shutdown`
 - Self health check
diff --git a/mkdocs.yml b/mkdocs.yml
index f684d708a..da132deef 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -87,6 +87,7 @@ nav:
       - Worker:
         - Overview: developers/worker.md
         - Storage: developers/storage.md
+        - Traffic Control: developers/trafficcontrol.md
 #      - Client: developers/client.md
       - PushData: developers/pushdata.md
 #      - ReadData: developers/readdata.md
