This is an automated email from the ASF dual-hosted git repository.
zhouky pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/incubator-celeborn.git
The following commit(s) were added to refs/heads/main by this push:
new b8cdf36b4 [CELEBORN-831][DOC] Add traffic control document
b8cdf36b4 is described below
commit b8cdf36b40f77f1cd697e60024e6fdb8b432b9e5
Author: zky.zhoukeyong <[email protected]>
AuthorDate: Mon Jul 24 19:51:02 2023 +0800
[CELEBORN-831][DOC] Add traffic control document
### What changes were proposed in this pull request?
As title.
### Why are the changes needed?
As title.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Manual test.
Closes #1754 from waitinfuture/831.
Authored-by: zky.zhoukeyong <[email protected]>
Signed-off-by: zky.zhoukeyong <[email protected]>
---
docs/assets/img/backpressure.svg | 4 ++
docs/developers/trafficcontrol.md | 80 +++++++++++++++++++++++++++++++++++++++
docs/developers/worker.md | 3 +-
mkdocs.yml | 1 +
4 files changed, 87 insertions(+), 1 deletion(-)
diff --git a/docs/assets/img/backpressure.svg b/docs/assets/img/backpressure.svg
new file mode 100644
index 000000000..67596a081
--- /dev/null
+++ b/docs/assets/img/backpressure.svg
@@ -0,0 +1,4 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!-- Do not edit this file with editors other than draw.io -->
+<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN"
"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
+<svg xmlns="http://www.w3.org/2000/svg"
xmlns:xlink="http://www.w3.org/1999/xlink" version="1.1" width="448px"
height="556px" viewBox="-0.5 -0.5 448 556" content="<mxfile
host="app.diagrams.net" modified="2023-07-24T09:54:28.825Z"
agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36
(KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36"
etag="muQwWoZ1AV76Lv06bOp7" version="21.6.5"
type="device"> < [...]
\ No newline at end of file
diff --git a/docs/developers/trafficcontrol.md b/docs/developers/trafficcontrol.md
new file mode 100644
index 000000000..d824fee5a
--- /dev/null
+++ b/docs/developers/trafficcontrol.md
@@ -0,0 +1,80 @@
+---
+license: |
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+ http://www.apache.org/licenses/LICENSE-2.0
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+---
+
+# Traffic Control
+This article describes the detailed design of Celeborn `Worker`'s traffic control.
+
+## Design Goal
+The design goal of Traffic Control is to prevent `Worker` out-of-memory (OOM) errors without harming performance. At the
+same time, Celeborn tries to achieve fairness among users, again without harming performance.
+
+Celeborn reaches the goal through `Back Pressure` and `Congestion Control`.
+
+## Data Flow
+From the `Worker`'s perspective, incoming data comes from two sources:
+
+- `ShuffleClient`, which pushes primary data to the primary `Worker`
+- The primary `Worker`, which sends replicated data to the replica `Worker`
+
+Buffered memory can be released once the following conditions are satisfied:
+
+- The data has been flushed to file
+- If replication is on, the primary data has been written to the wire
+
+The basic idea is that when a `Worker` is under high memory pressure, it slows down or stops incoming data and, at the same
+time, forces a flush to release memory.
+
+## Back Pressure
+`Back Pressure` defines three watermarks:
+
+- `Pause Receive` watermark (defaults to 0.85). If the used direct memory ratio exceeds this, `Worker` will pause
+  receiving data from `ShuffleClient`, and force flush buffered data into file.
+- `Pause Replicate` watermark (defaults to 0.95). If the used direct memory ratio exceeds this, `Worker` will pause
+  receiving both data from `ShuffleClient` and replica data from the primary `Worker`, and force flush buffered
+  data into file.
+- `Resume` watermark (defaults to 0.5). Once either `Pause Receive` or `Pause Replicate` has been triggered, the used
+  direct memory ratio must drop below this watermark before `Worker` resumes receiving data from `ShuffleClient`.
+
+`Worker` checks the used direct memory ratio at high frequency, and triggers `Pause Receive`, `Pause Replicate` and `Resume`
+accordingly. The state machine is as follows:
+
+
+
+`Back Pressure` is the basic traffic control mechanism and cannot be disabled. Users can tune the three watermarks through the
+following configurations:
+
+- `celeborn.worker.directMemoryRatio*`
+
+## Congestion Control
+`Congestion Control` is an optional traffic control mechanism. Its purpose is to slow down the push rate
+from `ShuffleClient` when memory is under pressure, and to suppress the users who occupied the most resources in the
+last time window. It defines two watermarks:
+
+- `Low Watermark`, below which the `Worker` operates normally
+- `High Watermark`, above which the top users become subject to `Congestion Control`
+
+Celeborn uses `UserIdentifier` to identify users. `Worker` collects the bytes pushed by each user in the last time
+window. When the used direct memory exceeds the `High Watermark`, users who occupied more resources than the average
+occupation will receive a `Congestion Control` message.
+
+`ShuffleClient` controls the push rate in a fashion very similar to `TCP Congestion Control`. Initially, it is in the
+`Slow Start` phase, with a low push rate that increases very fast. When a threshold is reached, it transitions to the
+`Congestion Avoidance` phase, in which the push rate increases slowly. Upon receiving a `Congestion Control` message, it goes back
+to the `Slow Start` phase.
+
+`Congestion Control` can be enabled and tuned by the following configurations:
+
+- `celeborn.worker.congestionControl.*`
\ No newline at end of file
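Editor's sketch for the `Back Pressure` section of `trafficcontrol.md` above: the three watermarks can be read as a small state machine with hysteresis. The following Python is only an illustration, not Celeborn's implementation. The state names, the `next_state` and `run` helpers, and the constant names are hypothetical; the default values are the ones quoted in the document. The text does not say when replica traffic resumes, so this sketch assumes it resumes as soon as the ratio no longer exceeds the `Pause Replicate` watermark, while pushes from `ShuffleClient` stay paused until the ratio drops below `Resume`.

```python
from enum import Enum


class State(Enum):
    NORMAL = "normal"                    # accept client pushes and replica data
    PAUSE_RECEIVE = "pause_receive"      # stop accepting pushes from ShuffleClient
    PAUSE_REPLICATE = "pause_replicate"  # also stop accepting replica data


# Default watermarks quoted in the document (hypothetical constant names).
PAUSE_RECEIVE_WATERMARK = 0.85
PAUSE_REPLICATE_WATERMARK = 0.95
RESUME_WATERMARK = 0.5


def next_state(state: State, ratio: float) -> State:
    """Return the next state for a used direct memory ratio in [0.0, 1.0]."""
    if ratio > PAUSE_REPLICATE_WATERMARK:
        return State.PAUSE_REPLICATE
    if ratio > PAUSE_RECEIVE_WATERMARK:
        return State.PAUSE_RECEIVE
    if state is not State.NORMAL and ratio >= RESUME_WATERMARK:
        # Hysteresis: once paused, client pushes stay paused until the ratio
        # drops below the Resume watermark. Assumption: replica traffic has
        # already resumed at this point.
        return State.PAUSE_RECEIVE
    return State.NORMAL


def run(ratios):
    """Feed a sequence of sampled ratios through the state machine."""
    state = State.NORMAL
    trace = []
    for ratio in ratios:
        state = next_state(state, ratio)
        trace.append(state)
    return trace
```

Calling `run([0.6, 0.9, 0.6, 0.4])` shows the hysteresis: the second sample pauses receiving, the third sample (0.6) keeps it paused even though it is below 0.85, and only the fourth sample (0.4) returns to normal.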
diff --git a/docs/developers/worker.md b/docs/developers/worker.md
index 05028069d..154e09205 100644
--- a/docs/developers/worker.md
+++ b/docs/developers/worker.md
@@ -18,7 +18,8 @@ license: |
The main functions of Celeborn `Worker` are:
- Store, serve, and manage `PartitionLocation` data. See [Storage](../../developers/storage)
-- Traffic control through `Back Pressure` and `Congestion Control`
+- Traffic control through `Back Pressure` and `Congestion Control`. See
+ [Traffic Control](../../developers/trafficcontrol)
- Support rolling upgrade through `Graceful Shutdown`
- Support elasticity through `Decommission Shutdown`
- Self health check
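Editor's sketch for the `Congestion Control` section of `trafficcontrol.md`, which the `worker.md` hunk above links to. It illustrates the two ideas in that section: the `Worker` selecting users whose pushed bytes exceed the average once the `High Watermark` is exceeded, and a `ShuffleClient`-side push rate that follows a TCP-like `Slow Start` / `Congestion Avoidance` pattern. Everything here is hypothetical: the function and class names, the doubling and plus-one growth rules, and the fixed threshold are assumptions chosen for illustration, not Celeborn's actual behaviour or API.

```python
from typing import Dict, List


def users_to_throttle(bytes_by_user: Dict[str, int],
                      used_memory_ratio: float,
                      high_watermark: float) -> List[str]:
    """Pick users who pushed more than the average in the last time window."""
    if used_memory_ratio <= high_watermark or not bytes_by_user:
        return []
    average = sum(bytes_by_user.values()) / len(bytes_by_user)
    return sorted(user for user, pushed in bytes_by_user.items() if pushed > average)


class PushRateController:
    """TCP-like push rate control (growth rules are assumptions)."""

    def __init__(self, initial_rate: int = 1, threshold: int = 16):
        self.initial_rate = initial_rate
        self.threshold = threshold  # kept fixed here for simplicity
        self.rate = initial_rate

    def phase(self) -> str:
        if self.rate < self.threshold:
            return "slow_start"
        return "congestion_avoidance"

    def on_success(self) -> None:
        if self.rate < self.threshold:
            # Slow Start: low rate that grows fast (doubling, capped at threshold).
            self.rate = min(self.rate * 2, self.threshold)
        else:
            # Congestion Avoidance: slow, additive growth.
            self.rate += 1

    def on_congestion_control(self) -> None:
        # A Congestion Control message sends the client back to Slow Start.
        self.rate = self.initial_rate
```

With `bytes_by_user = {"a": 100, "b": 10, "c": 10}` the average is 40, so only user `a` is selected when the memory ratio is above the high watermark.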
diff --git a/mkdocs.yml b/mkdocs.yml
index f684d708a..da132deef 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -87,6 +87,7 @@ nav:
- Worker:
- Overview: developers/worker.md
- Storage: developers/storage.md
+ - Traffic Control: developers/trafficcontrol.md
# - Client: developers/client.md
- PushData: developers/pushdata.md
# - ReadData: developers/readdata.md