This is an automated email from the ASF dual-hosted git repository.
dataroaring pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push:
new 192dfb06b55 [doc](cloud rebalance) Add a user doc for cloud rebalance type (#3212)
192dfb06b55 is described below
commit 192dfb06b554dd901c4ff834a45b6ccea0590353
Author: deardeng <[email protected]>
AuthorDate: Wed Dec 31 01:45:47 2025 +0800
[doc](cloud rebalance) Add a user doc for cloud rebalance type (#3212)
## Versions
- [x] dev
- [ ] 4.x
- [ ] 3.x
- [ ] 2.1
## Languages
- [ ] Chinese
- [ ] English
## Docs Checklist
- [ ] Checked by AI
- [ ] Test Cases Built
---
.../managing-compute-cluster.md | 72 ++++++++++++++++++++++
.../managing-compute-cluster.md | 70 +++++++++++++++++++++
2 files changed, 142 insertions(+)
diff --git a/docs/compute-storage-decoupled/managing-compute-cluster.md b/docs/compute-storage-decoupled/managing-compute-cluster.md
index 94db1fa0158..39f6e17ddd0 100644
--- a/docs/compute-storage-decoupled/managing-compute-cluster.md
+++ b/docs/compute-storage-decoupled/managing-compute-cluster.md
@@ -144,6 +144,78 @@ If the database or compute group name contains reserved keywords, the correspond
You can scale compute groups by adding or removing BEs using `ALTER SYSTEM ADD BACKEND` and `ALTER SYSTEM DECOMMISSION BACKEND`.
+### Load Rebalancing After Scaling
+
+Cloud rebalance is the load-balancing operation in Doris's compute-storage decoupled architecture that redistributes read and write traffic across a Compute Group after its backend (BE) nodes are scaled out or in. A node that has been offline for an extended period is treated as removed.
+
+#### Balance Strategy Types
+
+:::caution
+
+The `balance_type` feature is supported starting from Doris 3.1.3 and Doris 4.0.2.
+Prior to these versions, the only control was the global FE configuration `enable_cloud_warm_up_for_rebalance`, which determines whether warm-up tasks are executed during rebalance.
+
+:::
+
+The following table compares the three strategy types, using the addition of nodes to a Compute Group as the example:
+
+| Type | Time Until New Nodes Serve | Performance Fluctuation | Technical Principle | Use Cases |
+| :--- | :---: | :---: | :-- | :-- |
+| `without_warmup` | Fastest | Largest | FE directly modifies the shard mapping; the first read or write on a new node finds no file cache and must fetch data from S3 | Scenarios that need new nodes online quickly and tolerate performance jitter |
+| `async_warmup` | Faster | Possible cache misses | FE issues warm-up tasks and modifies the mapping once they succeed or time out; it tries to pull the file cache to the new BE during the mapping switch, but in some scenarios the first read may still miss | General scenarios where the resulting performance is acceptable |
+| `sync_warmup` | Slower | Minimal cache misses | FE issues warm-up tasks and modifies the mapping only after they complete, ensuring the cache has been migrated | Scenarios with very strict post-scaling performance requirements that need the file cache to exist on new nodes |
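The "Technical Principle" column comes down to one decision: when the FE may switch a tablet's shard mapping to the new BE. The following Python sketch models only that decision as the table describes it. It is a simplified illustration, not Doris FE code. `BalanceType` and `may_switch_mapping` are hypothetical names. The sketch also assumes that under `sync_warmup` a timeout alone does not trigger the switch, because the table only says the mapping changes after the warm-up task completes.

```python
from enum import Enum


class BalanceType(Enum):
    """The three balance_type values described in the table above."""
    WITHOUT_WARMUP = "without_warmup"
    ASYNC_WARMUP = "async_warmup"
    SYNC_WARMUP = "sync_warmup"


def may_switch_mapping(balance_type: BalanceType,
                       warmup_finished: bool,
                       warmup_timed_out: bool) -> bool:
    """Hypothetical model: may the FE point the tablet at the new BE yet?"""
    if balance_type is BalanceType.WITHOUT_WARMUP:
        # No warm-up task is issued: the mapping changes immediately and the
        # first read/write on the new BE has to fetch data from S3.
        return True
    if balance_type is BalanceType.ASYNC_WARMUP:
        # Switch once the warm-up task succeeds OR times out, so some
        # first reads may still miss the file cache.
        return warmup_finished or warmup_timed_out
    # SYNC_WARMUP: switch only after the warm-up task has completed
    # (assumption: a timeout alone does not trigger the switch).
    return warmup_finished


if __name__ == "__main__":
    for bt in BalanceType:
        print(bt.value, may_switch_mapping(bt, warmup_finished=False, warmup_timed_out=True))
```

Under this model, a warm-up task that times out lets `async_warmup` switch the mapping (accepting possible cache misses), while `sync_warmup` keeps waiting. That difference is the trade-off between the two rows.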
+
+#### User Interface
+
+##### Global Default Balance Type
+
+Set the global default in the FE configuration file (`fe.conf`):
+
+```
+cloud_default_rebalance_type = "async_warmup"
+```
+
+##### Compute Group-Level Configuration
+
+You can also configure the balance type for each Compute Group individually:
+
+```sql
+ALTER COMPUTE GROUP cg1 PROPERTIES("balance_type"="async_warmup");
+```
+
+##### Configuration Rules
+
+1. If a Compute Group does not have `balance_type` configured, it uses the global default (`async_warmup` unless the FE configuration has been changed).
+2. If a Compute Group has `balance_type` configured, that setting takes precedence over the global default during rebalance.
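The two rules amount to a fallback lookup: the per-group setting wins, and otherwise the global default applies. The following is a minimal Python sketch. It assumes the Compute Group's properties have already been collected into a key/value mapping; parsing them from `SHOW COMPUTE GROUPS` is out of scope here. The function name is hypothetical, and the global default is passed in as a parameter rather than read from the FE.

```python
from typing import Mapping


def effective_balance_type(group_properties: Mapping[str, str],
                           global_default: str = "async_warmup") -> str:
    """Return the balance type a rebalance would use for one Compute Group.

    group_properties: the group's properties as key/value pairs (assumed shape).
    global_default:   the FE-wide default, "async_warmup" unless it was changed.
    """
    # Rule 2: a non-empty per-group balance_type takes priority.
    group_value = group_properties.get("balance_type")
    if group_value:
        return group_value
    # Rule 1: otherwise fall back to the global default.
    return global_default


if __name__ == "__main__":
    print(effective_balance_type({}))                                   # async_warmup
    print(effective_balance_type({"balance_type": "sync_warmup"}))      # sync_warmup
    print(effective_balance_type({}, global_default="without_warmup"))  # without_warmup
```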
+
+#### FAQ
+
+##### How to View and Modify Global Rebalance Type?
+
+- **View**: Execute `ADMIN SHOW FRONTEND CONFIG LIKE "cloud_default_rebalance_type";`
+- **Modify**: Execute `ADMIN SET FRONTEND CONFIG ("cloud_warm_up_for_rebalance_type" = "without_warmup");` (takes effect without restarting FE)
+
+##### How to Query Compute Group Balance Type?
+
+Execute `SHOW COMPUTE GROUPS;`. The `properties` column in the result lists the Compute Group's properties, including its `balance_type` setting.
+
+##### How to Determine Whether the Cluster Is in a Stable Tablet State?
+
+1. **Check via `SHOW BACKENDS`**: Check whether the tablet counts of the BEs in the Compute Group are close to one another. As a reference, each BE's tablet count should fall within:
+   ```
+   (Total tablets in cluster / Compute Group BE count) * 0.95
+   ~
+   (Total tablets in cluster / Compute Group BE count) * 1.05
+   ```
+   The 0.05 margin is the default value of the FE configuration `cloud_rebalance_percent_threshold`. To distribute tablets more evenly across the BEs in the Compute Group, lower this value.
+
+2. **Observe via FE Metrics**: Check the `doris_fe_cloud_.*_balance_num` series in the FE metrics. If these metrics stay unchanged for an extended period, the Compute Group has reached a balanced state. We recommend adding these metrics to a monitoring dashboard for continuous observation.
+   ```bash
+   curl "http://feip:fe_http_port/metrics" | grep '_balance_num'
+   ```
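The reference range in step 1 is plain arithmetic. For example, take 3,000 tablets and 3 BEs in the Compute Group. The average is 1,000 tablets per BE, so with the default threshold of 0.05 each BE should hold roughly 950 to 1,050 tablets. The following is a minimal Python sketch of the same check. It assumes you have already collected each BE's tablet count from `SHOW BACKENDS` and the total tablet count. The function names are hypothetical.

```python
from typing import Sequence, Tuple


def balanced_range(total_tablets: int, be_count: int,
                   threshold: float = 0.05) -> Tuple[float, float]:
    """Reference range per BE: average * (1 - threshold) .. average * (1 + threshold).

    threshold mirrors the default of cloud_rebalance_percent_threshold (0.05).
    """
    if be_count <= 0:
        raise ValueError("be_count must be positive")
    average = total_tablets / be_count
    return average * (1 - threshold), average * (1 + threshold)


def is_tablet_balanced(tablet_counts: Sequence[int], total_tablets: int,
                       threshold: float = 0.05) -> bool:
    """True if every BE's tablet count falls inside the reference range."""
    low, high = balanced_range(total_tablets, len(tablet_counts), threshold)
    return all(low <= count <= high for count in tablet_counts)


if __name__ == "__main__":
    # Hypothetical numbers: 3 BEs in the Compute Group, 3,000 tablets in total.
    print(balanced_range(3000, 3))
    print(is_tablet_balanced([980, 1010, 1010], 3000))  # True: all within range
    print(is_tablet_balanced([900, 1050, 1050], 3000))  # False: 900 is below the lower bound
```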
+
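For step 2, you can check "no change for an extended period" by comparing two snapshots of the metrics endpoint taken some time apart. The following Python sketch filters the text returned by the metrics URL down to the `doris_fe_cloud_.*_balance_num` series and compares snapshots. It assumes the endpoint returns the Prometheus text exposition format. The helper names and the 300-second default interval are hypothetical choices, and `feip:fe_http_port` is the same placeholder as in the `curl` command.

```python
import re
import time
import urllib.request
from typing import Dict

# Matches lines such as: doris_fe_cloud_xxx_balance_num{label="v"} 12
BALANCE_LINE = re.compile(
    r"^(doris_fe_cloud_[A-Za-z0-9_]*_balance_num(?:\{[^}]*\})?)\s+(\S+)(?:\s+\S+)?$"
)


def parse_balance_metrics(text: str) -> Dict[str, float]:
    """Extract the *_balance_num series (name plus labels) from metrics text."""
    values: Dict[str, float] = {}
    for line in text.splitlines():
        match = BALANCE_LINE.match(line.strip())  # '# HELP' / '# TYPE' lines never match
        if match:
            values[match.group(1)] = float(match.group(2))
    return values


def fetch_balance_metrics(url: str) -> Dict[str, float]:
    """Fetch the FE metrics page, e.g. http://feip:fe_http_port/metrics."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return parse_balance_metrics(response.read().decode("utf-8"))


def snapshots_unchanged(before: Dict[str, float], after: Dict[str, float]) -> bool:
    """True if balance metrics were found and none of them changed."""
    return bool(before) and before == after


def looks_stable(url: str, interval_seconds: float = 300) -> bool:
    """Compare two snapshots taken interval_seconds apart."""
    before = fetch_balance_metrics(url)
    time.sleep(interval_seconds)
    after = fetch_balance_metrics(url)
    return snapshots_unchanged(before, after)
```

For example, call `looks_stable("http://feip:fe_http_port/metrics", interval_seconds=600)` with the real FE address substituted. A single unchanged interval is only a hint. A monitoring dashboard gives the longer view that the judgment really needs.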
+
+
## Renaming Compute Group
You can use the `ALTER SYSTEM RENAME COMPUTE GROUP <old_name> <new_name>`
command to rename an existing compute group. Please refer to the SQL Manual on
[Renaming Compute
Groups](../sql-manual/sql-statements/cluster-management/instance-management/ALTER-SYSTEM-RENAME-COMPUTE-GROUP).
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/compute-storage-decoupled/managing-compute-cluster.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/compute-storage-decoupled/managing-compute-cluster.md
index 9bee2822a91..f8322adb352 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/compute-storage-decoupled/managing-compute-cluster.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/compute-storage-decoupled/managing-compute-cluster.md
@@ -143,6 +143,76 @@ USE { [catalog_name.]database_name[@compute_group_name] | @compute_group_name }
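Scale compute groups out or in by adding or removing BEs with `ALTER SYSTEM ADD BACKEND` and `ALTER SYSTEM DECOMMISSION BACKEND`.
+### Load Rebalancing After Compute Group Scaling
+
+Cloud rebalance is the load-balancing operation that Doris performs in the compute-storage decoupled architecture to rebalance the cluster's read and write traffic after backend nodes (Backend) in a Compute Group are scaled out or in (a node that stays offline for an extended period is treated as scaled in).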
+
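+#### Balance Strategy Types
+
+:::caution
+
+The `balance_type` feature is supported starting from Doris 3.1.3 and Doris 4.0.2.
+Before these versions, only the global FE configuration `enable_cloud_warm_up_for_rebalance` could be used to control whether warm-up tasks are executed during rebalance.
+
+:::
+
+The following describes the three strategy types, using the addition of nodes to a Compute Group as the example:
+
+| Type | Time Until New Nodes Can Serve | Performance Fluctuation | Technical Principle | Use Cases |
+| :--- | :---: | :---: | :-- | :-- |
+| `without_warmup` | Fastest | Largest | FE directly modifies the shard mapping; the first read or write has no file cache and must pull data from S3 | Scenarios that need new nodes online quickly and are insensitive to performance jitter |
+| `async_warmup` | Faster | Cache misses possible | FE issues warm-up tasks and modifies the mapping after they succeed or time out; it makes a best effort to pull the file cache to the new BE during the mapping switch, but in some scenarios the first read may still miss | General scenarios where the resulting performance is acceptable |
+| `sync_warmup` | Slower | Almost no cache misses | FE issues warm-up tasks and modifies the mapping only after confirming the tasks are complete, ensuring the cache migration has finished | Scenarios with very high post-scaling performance requirements, where the file cache must be present on new nodes |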
+
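+#### User Interface
+
+##### Global Default Balance Type
+
+Set the global default in the FE configuration file (`fe.conf`):
+
+```
+cloud_default_rebalance_type = "async_warmup"
+```
+
+##### Compute Group-Level Configuration
+
+Each Compute Group can be configured with its own balance type:
+
+```sql
+ALTER COMPUTE GROUP cg1 PROPERTIES("balance_type"="async_warmup");
+```
+
+##### Configuration Rules
+
+1. If a Compute Group does not have `balance_type` configured, the global default is used (`async_warmup` unless the FE configuration has been changed).
+2. If a Compute Group has `balance_type` configured, rebalance uses that Compute Group's setting in preference to the global default.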
+
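+#### FAQ
+
+##### How to View and Modify the Global Rebalance Type?
+
+- **View**: Execute `ADMIN SHOW FRONTEND CONFIG LIKE "cloud_default_rebalance_type";`
+- **Modify**: Execute `ADMIN SET FRONTEND CONFIG ("cloud_warm_up_for_rebalance_type" = "without_warmup");` (takes effect without restarting the FE)
+
+##### How to Query a Compute Group's Balance Type?
+
+Execute `SHOW COMPUTE GROUPS;`. The `properties` column in the result contains the Compute Group's properties, including the `balance_type` setting.
+
+##### How to Determine Whether the Cluster Is in a Stable Tablet State?
+
+1. **Check via `SHOW BACKENDS`**: Check whether the tablet counts of the BEs are close. Reference range:
+   ```
+   (Total tablets in the cluster / Number of BEs in the Compute Group) * 0.95
+   ~
+   (Total tablets in the cluster / Number of BEs in the Compute Group) * 1.05
+   ```
+   Here, 0.05 is the default value of the FE configuration `cloud_rebalance_percent_threshold`. To spread tablets more evenly across the BEs in the Compute Group, lower this value.
+
+2. **Observe via FE metrics**: Check the `doris_fe_cloud_.*_balance_num` series in the FE metrics. If they show no change for a long time, the Compute Group has converged to a balanced state. We recommend configuring these metrics on a monitoring dashboard for continuous observation and judgment.
+   ```bash
+   curl "http://feip:fe_http_port/metrics" | grep '_balance_num'
+   ```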
+
## Renaming Compute Group