(doris-website) branch master updated: [doc](cloud rebalance) Add a user doc for cloud reblance type (#3212)

dataroaring Tue, 30 Dec 2025 09:46:25 -0800

This is an automated email from the ASF dual-hosted git repository.

dataroaring pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git



The following commit(s) were added to refs/heads/master by this push:
     new 192dfb06b55 [doc](cloud rebalance) Add a user doc for cloud reblance 
type (#3212)
192dfb06b55 is described below

commit 192dfb06b554dd901c4ff834a45b6ccea0590353
Author: deardeng <[email protected]>
AuthorDate: Wed Dec 31 01:45:47 2025 +0800

    [doc](cloud rebalance) Add a user doc for cloud reblance type (#3212)
    
    ## Versions
    
    - [x] dev
    - [ ] 4.x
    - [ ] 3.x
    - [ ] 2.1
    
    ## Languages
    
    - [ ] Chinese
    - [ ] English
    
    ## Docs Checklist
    
    - [ ] Checked by AI
    - [ ] Test Cases Built
---
 .../managing-compute-cluster.md                    | 72 ++++++++++++++++++++++
 .../managing-compute-cluster.md                    | 70 +++++++++++++++++++++
 2 files changed, 142 insertions(+)

diff --git a/docs/compute-storage-decoupled/managing-compute-cluster.md 
b/docs/compute-storage-decoupled/managing-compute-cluster.md
index 94db1fa0158..39f6e17ddd0 100644
--- a/docs/compute-storage-decoupled/managing-compute-cluster.md
+++ b/docs/compute-storage-decoupled/managing-compute-cluster.md
@@ -144,6 +144,78 @@ If the database or compute group name contains reserved 
keywords, the correspond
 
 You can scale compute groups by adding or removing BEs using `ALTER SYSTEM ADD BACKEND` and `ALTER SYSTEM DECOMMISSION BACKEND`.
 
+### Load Rebalancing After Scaling
+
+Cloud rebalance is the load balancing operation in Doris's compute-storage decoupled architecture. After backend (BE) nodes are added to or removed from a Compute Group, it redistributes read and write traffic across the BEs in that group. A node that has been offline for an extended period is treated as removed.
+
+#### Balance Strategy Types
+
+:::caution
+
+The `balance_type` feature is supported starting from Doris 3.1.3 and Doris 4.0.2.  
+Prior to these versions, only the FE global configuration `enable_cloud_warm_up_for_rebalance` was available to control whether warm up tasks are executed during rebalance.
+
+:::
+
+The following table describes three strategy types, using the example of 
adding nodes to a Compute Group:
+
+| Type | Time to Service | Performance Fluctuation | Technical Principle | Use Cases |
+| :--- | :---: | :---: | :-- | :-- |
+| `without_warmup` | Fastest | Highest fluctuation | FE directly modifies shard mapping; first read/write has no file cache and needs to fetch from S3 | Scenarios requiring quick node deployment with low sensitivity to performance jitter |
+| `async_warmup` | Faster | Possible cache miss | Issues warm up tasks, modifies mapping after success or timeout; attempts to pull file cache to new BE during mapping switch, some scenarios may still miss on first read | General scenarios with acceptable performance |
+| `sync_warmup` | Slower | Minimal cache miss | Issues warm up tasks, FE modifies mapping only after task completion, ensuring cache migration | Scenarios with extremely high performance requirements after scaling, requiring file cache to exist on new nodes |
+
+#### User Interface
+
+##### Global Default Balance Type
+
+Set the global default value through FE configuration (fe.conf):
+
+```
+cloud_default_rebalance_type = "async_warmup"
+```
+
+##### Compute Group-Level Configuration
+
+You can configure balance type for each Compute Group separately:
+
+```sql
+ALTER COMPUTE GROUP cg1 PROPERTIES("balance_type"="async_warmup");
+```
+
+##### Configuration Rules
+
+1. If a Compute Group does not have `balance_type` configured, it uses the global default value (`async_warmup` unless the FE configuration has been changed).
+2. If a Compute Group has `balance_type` configured, that configuration takes priority during rebalance.
+
+#### FAQ
+
+##### How to View and Modify Global Rebalance Type?
+
+- **View**: Execute `ADMIN SHOW FRONTEND CONFIG LIKE "cloud_default_rebalance_type";`
+- **Modify**: Execute `ADMIN SET FRONTEND CONFIG ("cloud_warm_up_for_rebalance_type" = "without_warmup");` (takes effect without restarting FE)
+
+##### How to Query Compute Group Balance Type?
+
+Execute `SHOW COMPUTE GROUPS;`. The `properties` column in the result contains 
Compute Group attribute information, including the `balance_type` configuration.
+
+##### How to Determine if the Cluster is in a Stable Tablet State?
+
+1. **Check via `SHOW BACKENDS`**: Check whether the tablet counts across BEs are close. As a reference, each BE's tablet count should fall within the following range:  
+   ```
+   (Total tablets in cluster / Compute Group BE count) * 0.95 
+   ~ 
+   (Total tablets in cluster / Compute Group BE count) * 1.05
+   ```  
+   The value 0.05 is the default value of the FE configuration 
`cloud_rebalance_percent_threshold`. To make tablet distribution more uniform 
across BEs in the Compute Group, you can reduce this configuration value.
+
+2. **Observe via FE Metrics**: Check the `doris_fe_cloud_.*_balance_num` series of metrics in FE metrics. If these values stay unchanged for an extended period, the Compute Group has reached a balanced state. Consider adding these metrics to a monitoring dashboard for continuous observation.  
+   ```bash
+   curl "http://feip:fe_http_port/metrics" | grep '_balance_num'
+   ```
+
+
+
 ## Renaming Compute Group
 
 You can use the `ALTER SYSTEM RENAME COMPUTE GROUP <old_name> <new_name>` 
command to rename an existing compute group. Please refer to the SQL Manual on 
[Renaming Compute 
Groups](../sql-manual/sql-statements/cluster-management/instance-management/ALTER-SYSTEM-RENAME-COMPUTE-GROUP).
diff --git 
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/compute-storage-decoupled/managing-compute-cluster.md
 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/compute-storage-decoupled/managing-compute-cluster.md
index 9bee2822a91..f8322adb352 100644
--- 
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/compute-storage-decoupled/managing-compute-cluster.md
+++ 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/compute-storage-decoupled/managing-compute-cluster.md
@@ -143,6 +143,76 @@ USE { [catalog_name.]database_name[@compute_group_name] | 
@compute_group_name }
 
 通过 `ALTER SYSTEM ADD BACKEND` 以及 `ALTER SYSTEM DECOMMISSION BACKEND` 添加或者删除 BE 实现计算组的扩缩容。
 
+### 计算组扩缩容后负载重均衡
+
+Cloud rebalance 是 Doris 在存算分离架构下，当不同 Compute Group 
中的后端节点（Backend）发生扩缩容（长时间节点下线视为缩容）后，用于重新均衡集群读写流量的负载均衡操作。
+
+#### Balance 策略类型
+
+:::caution
+
+`balance_type` 功能自 Doris 3.1.3 和 Doris 4.0.2 版本起支持。  
+在此之前，仅支持通过 FE 全局配置 `enable_cloud_warm_up_for_rebalance` 来控制 rebalance 时是否执行 
warm up 任务。
+
+:::
+
+以下以向 Compute Group 扩容节点为例，说明三种策略类型：
+
+| 类型 | 新节点可服务时间 | 性能波动 | 技术原理 | 适用场景 |
+| :--- | :---: | :---: | :-- | :-- |
+| `without_warmup` | 最快 | 性能波动最大 | FE 直接修改分片映射；首次读写无 file cache，需从 S3 拉取数据 | 需要新节点快速上线，对性能抖动不敏感的场景 |
+| `async_warmup` | 较快 | 可能出现 cache miss | 下发 warm up 任务，成功或超时后再修改映射；在映射切换时尽力拉取 file cache 到新 BE，部分场景首次读仍可能 miss | 通用场景，性能可接受 |
+| `sync_warmup` | 较慢 | 基本无 cache miss | 下发 warm up 任务，FE 确认任务完成后才修改映射，确保 cache 迁移完成 | 对扩容后性能要求极高，希望新节点上一定存在 file cache 的场景 |
+
+#### 用户接口
+
+##### 全局默认 balance type
+
+通过 FE 配置项(fe.conf)设置全局默认值：
+
+```
+cloud_default_rebalance_type = "async_warmup"
+```
+
+##### Compute Group 维度配置
+
+支持为每个 Compute Group 单独配置 balance type：
+
+```sql
+ALTER COMPUTE GROUP cg1 PROPERTIES("balance_type"="async_warmup");
+```
+
+##### 配置规则
+
+1. 若 Compute Group 未配置 `balance_type`，则使用全局默认值 `async_warmup`。
+2. 若 Compute Group 已配置 `balance_type`，则执行 rebalance 时优先使用该 Compute Group 的配置。
+
+#### FAQ
+
+##### 如何查看与修改全局 rebalance type？
+
+- **查看**：执行 `ADMIN SHOW FRONTEND CONFIG LIKE "cloud_default_rebalance_type";`
+- **修改**：执行 `ADMIN SET FRONTEND CONFIG ("cloud_warm_up_for_rebalance_type" = 
"without_warmup");`（修改后无需重启 FE 即可生效）
+
+##### 如何查询 Compute Group 的 balance type？
+
+执行 `SHOW COMPUTE GROUPS;`，结果中的 `properties` 列包含 Compute Group 的属性信息，其中可查看 
`balance_type` 配置。
+
+##### 如何判断集群是否处于 tablet 稳定态？
+
+1. **通过 `SHOW BACKENDS` 查看**：检查各 BE 的 tablet 数是否接近。计算方法参考范围：  
+   ```
+   (集群所有 tablet 数 / Compute Group BE 数) * 0.95 
+   ~ 
+   (集群所有 tablet 数 / Compute Group BE 数) * 1.05
+   ```  
+   其中 0.05 为 FE 配置项 `cloud_rebalance_percent_threshold` 的默认值。如需让 Compute Group 
中各 BE 承载的 tablet 更加均匀，可调小该配置值。
+
+2. **通过 FE metrics 观察**：查看 FE metrics 中的 `doris_fe_cloud_.*_balance_num` 
系列指标，若长时间无变化，说明 Compute Group 已趋于均衡状态。建议在监控面板上配置这些 metrics，便于持续观察和判断。  
+   ```bash
+   curl "http://feip:fe_http_port/metrics" | grep '_balance_num'
+   ```
+
 
 ## 重命名计算组
 

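The stable-state check from the FAQ in the diff above (each BE's tablet count lying within the per-BE average multiplied by 0.95 to 1.05) can be sketched in a few lines of Python. This is an illustration only, not Doris code: the function names `balanced_range` and `is_balanced` are hypothetical, the tablet counts are assumed to have been read by hand from the `SHOW BACKENDS` output for one Compute Group, and the default `threshold` of 0.05 mirrors the documented default of `cloud_rebalance_percent_threshold`.

```python
from typing import Sequence, Tuple


def balanced_range(total_tablets: int, be_count: int,
                   threshold: float = 0.05) -> Tuple[float, float]:
    """Return the (low, high) tablet-count range for one BE.

    threshold is assumed to correspond to the FE configuration
    cloud_rebalance_percent_threshold (documented default: 0.05).
    """
    if be_count <= 0:
        raise ValueError("be_count must be positive")
    average = total_tablets / be_count
    return average * (1 - threshold), average * (1 + threshold)


def is_balanced(tablet_counts: Sequence[int], threshold: float = 0.05) -> bool:
    """Check whether every BE's tablet count lies inside the range.

    tablet_counts holds one value per BE in the Compute Group; the total
    is taken as the sum of these values (an assumption of this sketch).
    """
    low, high = balanced_range(sum(tablet_counts), len(tablet_counts), threshold)
    return all(low <= count <= high for count in tablet_counts)


if __name__ == "__main__":
    # Hypothetical tablet counts for a three-BE Compute Group.
    print(is_balanced([1000, 1010, 990]))
    print(is_balanced([1000, 1200, 800]))
```

With the hypothetical counts above, the first group is within range and the second is not; lowering `threshold` tightens the range in the same way that reducing the FE configuration value is described to do.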

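The two configuration rules in the diff above describe a simple fallback: a Compute Group's own `balance_type` wins, and the global default applies otherwise. A minimal Python sketch of that resolution follows. The function name `effective_balance_type` and the dictionary shape are hypothetical and only model the documented behavior; they are not how FE implements it.

```python
from typing import Mapping, Optional

# The three strategy types listed in the documentation table.
VALID_BALANCE_TYPES = {"without_warmup", "async_warmup", "sync_warmup"}


def effective_balance_type(group_properties: Mapping[str, Optional[str]],
                           global_default: str = "async_warmup") -> str:
    """Resolve the balance type used for one Compute Group.

    group_properties models the group's properties (hypothetical shape);
    global_default models the FE-level default value.
    """
    configured = group_properties.get("balance_type")
    # Rule 2: a group-level setting takes priority.
    # Rule 1: otherwise fall back to the global default.
    chosen = configured if configured else global_default
    if chosen not in VALID_BALANCE_TYPES:
        raise ValueError(f"unknown balance type: {chosen!r}")
    return chosen


if __name__ == "__main__":
    print(effective_balance_type({}))
    print(effective_balance_type({"balance_type": "sync_warmup"}))
```

The first call falls back to the global default and the second returns the group-level value, matching rules 1 and 2 respectively.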