(doris-website) branch master updated: [doc](cloud) Update deployment and file cache (#642)

luzhijing Tue, 14 May 2024 08:11:04 -0700

This is an automated email from the ASF dual-hosted git repository.

luzhijing pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git



The following commit(s) were added to refs/heads/master by this push:
     new b0d6caf0a00 [doc](cloud) Update deployment and file cache (#642)
b0d6caf0a00 is described below

commit b0d6caf0a0007856ca3386c17210099cc7183799
Author: Gavin Chou <[email protected]>
AuthorDate: Tue May 14 23:00:20 2024 +0800

    [doc](cloud) Update deployment and file cache (#642)
---
 .../deployment.md                                  | 345 ++++++++++++++++-----
 .../file-cache.md                                  | 101 ++++++
 .../install-fdb.md                                 |  21 +-
 .../storage-vault.md                               |   2 +-
 4 files changed, 387 insertions(+), 82 deletions(-)
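
The deployment.md changes in the diff below register FE and BE nodes by POSTing a JSON body to the meta-service `add_cluster` endpoint. As a rough illustration only, the following Python sketch assembles such a body without sending it. The helper names `node_entry` and `add_cluster_body` are hypothetical and not part of Doris; the field names and sample values are copied from the examples in the diff, and the `1:<instance_id>:<name>` shape of `cloud_unique_id` is only the pattern those examples happen to follow, not a documented rule.

```python
import json


def node_entry(instance_id, node_name, ip, **ports):
    """Build one element of the "nodes" array (hypothetical helper).

    Extra keyword arguments carry the port fields seen in the examples,
    e.g. heartbeat_port=9455 for a BE or edit_log_port=12103 for an FE.
    """
    entry = {"cloud_unique_id": f"1:{instance_id}:{node_name}", "ip": ip}
    entry.update(ports)
    return entry


def add_cluster_body(instance_id, cluster_type, cluster_name, cluster_id, nodes):
    """Serialize the request body in the shape the diff's curl examples use."""
    return json.dumps(
        {
            "instance_id": instance_id,
            "cluster": {
                "type": cluster_type,
                "cluster_name": cluster_name,
                "cluster_id": cluster_id,
                "nodes": nodes,
            },
        },
        indent=4,
    )


body = add_cluster_body(
    "sample_instance_id",
    "COMPUTE",
    "cluster_name0",
    "cluster_id0",
    [node_entry("sample_instance_id", "cloud_unique_id_compute_node0",
                "172.21.16.21", heartbeat_port=9455)],
)
print(body)
```

The printed string could then be passed to `curl ... -d` in place of the hand-written JSON, which avoids quoting mistakes when the node list grows.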

diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/separation-of-storage-and-compute/deployment.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/separation-of-storage-and-compute/deployment.md
index 80bbeec87be..f93e678308b 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/separation-of-storage-and-compute/deployment.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/separation-of-storage-and-compute/deployment.md
@@ -28,21 +28,26 @@ under the License.
 
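 Before deploying, read the [Doris compute-storage decoupled architecture overview](../separation-of-storage-and-compute/overview.md).
 
-A Doris compute-storage decoupled deployment needs 3 modules in total: FE, BE, and MS (the program is named doris_cloud, a module newly introduced by compute-storage decoupling)
+A compute-storage decoupled Doris deployment needs 3 modules in total: FE, BE, and MS (the program is named doris_cloud, a module added for compute-storage decoupled mode)
 
-The ms module starts in one of two roles, selected by a startup argument:
+Compute-storage decoupled Doris depends on two additional open-source projects. Install both before starting the deployment
+1. foundationdb (fdb); see the [FDB installation guide](../separation-of-storage-and-compute/install-fdb.md)
+2. openjdk17, which must be installed on every node; download it from <https://download.java.net/java/GA/jdk17.0.1/2a2082e5a09d4267845be086888add4f/12/GPL/openjdk-17.0.1_linux-x64_bin.tar.gz>
 
-1. meta-service: metadata management
 
-2. Recycler: data recyc
+The ms module starts in one of two roles, selected by a startup argument:
+1. Meta-service: metadata management
+2. Recycler: data recycling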
 
 ## Build
 
+`--cloud` is the build flag for the compute-storage decoupled ms module
+
 ```bash
 sh build.sh --fe --be --cloud 
 ```
 
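-Compared with the compute-storage integrated build, the `output` directory contains an additional `ms` directory
+Compared with the compute-storage integrated build, the `output` directory contains an additional `ms` directory.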
 
 ```bash
 output
@@ -54,6 +59,17 @@ output
     └── lib
 ```
 
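+The ms output directory is used by both meta-service and recycler.
+Note that although Recycler and Meta-service are the same program, the binaries currently need to be copied twice.
+The Recycler and Meta-service directories are identical; only the startup arguments differ.
+
+Copy ms with the following command to get a Recycler working directory `re`, then adjust the ports and other settings under conf
+in the ms and re directories as needed.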
+
+```shell
+cp -r ms re
+```
+
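 ## Version information
 
 There are two ways to check the doris_cloud version. One is `bin/start.sh --version`
@@ -71,7 +87,7 @@ Recycler and Meta-service are different processes of the same program, which use startup arguments to
 
 Both processes depend on FDB. For FDB deployment, see the [FDB installation guide](../separation-of-storage-and-compute/install-fdb.md)
 
-### Configuration
+### meta-service configuration
 
 The `./conf` directory contains doris_cloud.conf, a configuration file with every parameter at its default (only this one configuration file is needed)
 
@@ -81,8 +97,31 @@ Recycler and Meta-service are different processes of the same program, which use startup arguments to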
 brpc_listen_port = 5000
 fdb_cluster = xxx:[email protected]:4500
 ```
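+The port above, `brpc_listen_port` 5000, is the default port of meta-service
+The value of `fdb_cluster` is the connection string of the FDB cluster. Get it from whoever deployed FDB; it can usually be found in the file /etc/foundationdb/fdb.cluster on the machine where fdb is deployed.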
+
+```shell
+cat /etc/foundationdb/fdb.cluster
+
+## DO NOT EDIT!
+## This file is auto-generated, it is not to be edited by hand
+cloud_ssb:[email protected]:4500
+```
 
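-The value of fdb_cluster is the connection string of the FDB cluster. Get it from whoever deployed FDB; it can usually be found in the file /etc/foundationdb/FDB.cluster. (Only the line highlighted in red is needed.) If the development machine has no FDB, ask someone for one.
+### Recycler configuration
+
+**Apart from the port, Recycler's default configuration is the same as meta-service's. Its brpc port only needs to differ from meta-service's; 5100 is typically used.**
+
+The `./conf` directory contains doris_cloud.conf, a configuration file with every parameter at its default (only this one configuration file is needed)
+
+The two parameters that usually need changing are `brpc_listen_port` and `fdb_cluster`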
+
+```shell
+brpc_listen_port = 5100
+fdb_cluster = xxx:[email protected]:4500
+```
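+The port above, `brpc_listen_port` 5000, is the default port of meta-service
+The value of `fdb_cluster` is the connection string of the FDB cluster. Get it from whoever deployed FDB; it can usually be found in the file /etc/foundationdb/fdb.cluster on the machine where fdb is deployed.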
 
 ```shell
 cat /etc/foundationdb/fdb.cluster
@@ -94,26 +133,30 @@ cloud_ssb:[email protected]:4500
 
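 ### Starting and stopping the modules
 
-doris_cloud also has start and stop scripts in the bin directory of the deployment
+Meta-Service and Recycler depend on a Java runtime; jdk-17 is recommended.
+Export the JAVA_HOME environment variable before starting.
+
+doris_cloud has start and stop scripts in the bin directory of the deployment; run the corresponding script to start or stop a module.
 
 ### Starting and stopping meta_service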
 
 ```shell
+export JAVA_HOME=${path_to_jdk_17}
 bin/start.sh --meta-service --daemonized
 
 bin/stop.sh
 ```
 
+
 ### Starting and stopping Recycler
 
 ```shell
+export JAVA_HOME=${path_to_jdk_17}
 bin/start.sh --recycler --daemonized
 
 bin/stop.sh
 ```
 
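-Note that although Recycler and Meta-service are the same program, the binaries currently need to be copied twice. The Recycler and Meta-service directories are identical; only the startup arguments differ.
-
 ## Creating a compute-storage decoupled cluster
 
 In the compute-storage decoupled architecture, the node composition of the whole data warehouse is maintained through Meta-service (registration + changes). FE and BE interact with Meta-service for service discovery and authentication.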
@@ -124,6 +167,11 @@ bin/stop.sh
 
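 There are two main steps: 1. Register a warehouse (FE) 2. Register one or more compute clusters (BE)
 
+
+Note:
+1. **In the examples that follow, 127.0.0.1:5000 stands for the meta-service address. Replace it with the real meta-service IP and brpc listen port when running them**
+2. Do not copy and paste the commands as-is
+
 ### Creating the FE of a compute-storage decoupled cluster
 
 #### The compute-storage decoupled cluster and its storage backend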
@@ -132,7 +180,7 @@ bin/stop.sh
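 This step mainly describes which storage backend ([Storage Vault](../separation-of-storage-and-compute/storage-vault.md)) a warehouse uses; either S3 or HDFS can be chosen. 
 
 Call the create_instance interface of meta-service. Main parameters 
-1. instance_id: the id of the compute-storage decoupled Doris data warehouse; it must be a historically unique uuid, for example 6ADDF03D-4C71-4F43-9D84-5FC89B3514F8. **This document uses plain strings for simplicity**.
+1. instance_id: the id of the compute-storage decoupled Doris data warehouse. Use a new one for every new warehouse; it must be historically unique, and a uuid is typically used, for example 6ADDF03D-4C71-4F43-9D84-5FC89B3514F8. **This document uses plain strings for simplicity**.
 2. name: the data warehouse name; fill in as needed
 3. user_id: the user id, a string; fill in as needed
 4. vault: information about the HDFS or S3 storage backend, such as HDFS properties or s3 bucket information.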
@@ -141,50 +189,62 @@ bin/stop.sh
 
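 ##### Creating a compute-storage decoupled Doris on HDFS
 
-Example
+To create a compute-storage decoupled Doris on HDFS, all of the information must be described correctly, and every node (FE BE MS RE)
+must have permission to access the declared HDFS, for example by setting up Kerberos authorization on the machines in advance. Also check
+connectivity beforehand (this can be tested with the hadoop client on each node).
+
+Fill in the prefix field as needed; it is usually named after the business the data warehouse serves.
 
+Example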
 ```Shell
-curl -s "${META_SERVICE_ENDPOINT}/MetaService/http/create_instance?token=greedisgood9999" \
-                        -d '{
-  "instance_id": "doris_master_asan_hdfs_multi_cluster_autoscale",
-  "name": "doris_master_asan_hdfs_multi_cluster_autoscale",
-  "user_id": "sample-user-id",
+curl -s "127.0.0.1:5000/MetaService/http/create_instance?token=greedisgood9999" -d \
+'{
+  "instance_id": "sample_instance_id",
+  "name": "sample_instance_name",
+  "user_id": "sample_user_id",
   "vault": {
     "hdfs_info" : {
       "build_conf": {
         "fs_name": "hdfs://172.21.0.44:4007",
         "user": "hadoop",
         "hdfs_kerberos_keytab": "/etc/emr.keytab",
-        "hdfs_kerberos_principal": "hadoop/172.30.0.178@EMR-D46OBYMH",
+        "hdfs_kerberos_principal": "hadoop/172.30.0.178@EMR-XXXYYY",
         "hdfs_confs" : [
-                  {
-                    "key": "hadoop.security.authentication",
-                    "value": "kerberos"
-                  }
-                ]
+          {
+            "key": "hadoop.security.authentication",
+            "value": "kerberos"
+          }
+        ]
       },
-      "prefix": "doris_master_asan_hdfs_multi_cluster_autoscale-0404"
+      "prefix": "sample_prefix"
     }
   }
 }'
 ```
 
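-##### Creating a compute-storage decoupled Doris on Se
+##### Creating a compute-storage decoupled Doris on S3
 
-Example (Tencent Cloud COS)
+All properties of an object-storage-based vault are required. In particular
+* When using an S3-compatible object store such as minio, test connectivity and the correctness of the ak/sk yourself.
+       For the procedure, follow the [Verify that minio works using the aws cli](https://min.io/docs/minio/linux/integrations/aws-cli-with-minio.html)
+       tutorial to run the check
+* The value of the bucket field is just the bucket name, without a scheme (such as s3://).
+* Keep external_endpoint the same as endpoint.
+* If the object store is not provided by a cloud vendor, region and provider can be set to any value
 
+Example (Tencent Cloud COS)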
 ```Shell
-curl -s "${META_SERVICE_ENDPOINT}/MetaService/http/create_instance?token=greedisgood9999" \
-                        -d '{
-  "instance_id": "doris_master_asan_hdfs_multi_cluster_autoscale",
-  "name": "doris_master_asan_hdfs_multi_cluster_autoscale",
-  "user_id": "sample-user-id",
+curl -s "127.0.0.1:5000/MetaService/http/create_instance?token=greedisgood9999" -d \
+'{
+  "instance_id": "sample_instance_id",
+  "name": "sample_instance_name",
+  "user_id": "sample_user_id",
   "vault": {
     "obj_info": {
-      "ak": "${ak}",
-      "sk": "${sk}",
-      "bucket": "doris-build-1308700295",
-      "prefix": "${your_prefix}",
+      "ak": "ak_xxxxxxxxxxx",
+      "sk": "sk_xxxxxxxxxxx",
+      "bucket": "sample_bucket_name",
+      "prefix": "sample_prefix",
       "endpoint": "cos.ap-beijing.myqcloud.com",
       "external_endpoint": "cos.ap-beijing.myqcloud.com",
       "region": "ap-beijing",
@@ -194,24 +254,23 @@ curl -s "${META_SERVICE_ENDPOINT}/MetaService/http/create_instance?token=greedis
 }'
 ```
 
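-After startup, run show storage vault on the FE to see built_in_storage_vault, whose properties are the same as the ones just passed in.
+##### Viewing the storage backend
+
+After completing the remaining steps and starting the FE successfully, run the SQL show storage vault on the FE
+to see built_in_storage_vault, whose properties match the values given above.
+
+The following is an hdfs example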
 
 ```Shell
 mysql> show storage vault;
 +------------------------+----------------+-------------------------------------------------------------------------------------------------+-----------+
 | StorageVaultName       | StorageVaultId | Propeties                                                                                       | IsDefault |
 +------------------------+----------------+-------------------------------------------------------------------------------------------------+-----------+
-| built_in_storage_vault | 1              | build_conf { fs_name: "hdfs://127.0.0.1:8020" } prefix: "_1CF80628-16CF-0A46-54EE-2C4A54AB1519" | false     |
+| built_in_storage_vault | 1              | build_conf { fs_name: "hdfs://127.0.0.1:8020" } prefix: "sample_prefix_1CF80628-16CF-0A46-5EE2" | false     |
 +------------------------+----------------+-------------------------------------------------------------------------------------------------+-----------+
 2 rows in set (0.00 sec)
 ```
 
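-**Note:**
-
-Storage Vault mode and non-Vault mode cannot be created at the same time. If both obj_info and vault are specified, only a non-vault-mode cluster is created. Vault mode requires passing the vault information when the instance is created; otherwise non-vault mode is the default.
-
-Only Vault mode supports the corresponding vault stmt.
-
 #### Adding the FE
 
 FEs in compute-storage decoupled mode are managed much like BEs: they are grouped, so they are also operated on through interfaces such as add_cluster.
@@ -227,14 +286,14 @@ fill in ip and edit_log_port according to the actual values in fe.conf.
 ```Shell
 # Add the FE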
 curl '127.0.0.1:5000/MetaService/http/add_cluster?token=greedisgood9999' -d '{
-    "instance_id":"cloud_instance0",
+    "instance_id":"sample_instance_id",
     "cluster":{
         "type":"SQL",
         "cluster_name":"RESERVED_CLUSTER_NAME_FOR_SQL_SERVER",
         "cluster_id":"RESERVED_CLUSTER_ID_FOR_SQL_SERVER",
         "nodes":[
             {
-                "cloud_unique_id":"1:cloud_instance0:cloud_unique_id_sql_server00",
+                "cloud_unique_id":"1:sample_instance_id:cloud_unique_id_sql_server00",
                 "ip":"172.21.16.21",
                 "edit_log_port":12103,
                 "node_type":"FE_MASTER"
@@ -245,14 +304,14 @@ curl '127.0.0.1:5000/MetaService/http/add_cluster?token=greedisgood9999' -d '{
 
 # After creation succeeds, get the cluster to confirm
 curl '127.0.0.1:5000/MetaService/http/get_cluster?token=greedisgood9999' -d '{
-    "instance_id":"cloud_instance0",
-    "cloud_unique_id":"1:cloud_instance0:regression-cloud-unique-id-fe-1",
+    "instance_id":"sample_instance_id",
+    "cloud_unique_id":"1:sample_instance_id:cloud_unique_id_sql_server00",
     "cluster_name":"RESERVED_CLUSTER_NAME_FOR_SQL_SERVER",
     "cluster_id":"RESERVED_CLUSTER_ID_FOR_SQL_SERVER"
 }'
 ```
 
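-### Creating a compute cluster (BE)
+### [Creating a compute cluster (Compute Cluster -- BE)](id:create_compute_cluster)
 
 Users can create one or more compute clusters; a compute cluster consists of any number of compute nodes.
 
@@ -268,14 +327,14 @@ The number of BE clusters and the number of nodes can be adjusted to your needs; they are not fixed, not
 # 172.19.0.11
 # Add the BE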
 curl '127.0.0.1:5000/MetaService/http/add_cluster?token=greedisgood9999' -d '{
-    "instance_id":"cloud_instance0",
+    "instance_id":"sample_instance_id",
     "cluster":{
         "type":"COMPUTE",
         "cluster_name":"cluster_name0",
         "cluster_id":"cluster_id0",
         "nodes":[
             {
-                "cloud_unique_id":"1:cloud_instance0:cloud_unique_id_compute_node0",
+                "cloud_unique_id":"1:sample_instance_id:cloud_unique_id_compute_node0",
                 "ip":"172.21.16.21",
                 "heartbeat_port":9455
             }
@@ -285,43 +344,44 @@ curl '127.0.0.1:5000/MetaService/http/add_cluster?token=greedisgood9999' -d '{
 
 # After creation succeeds, get the cluster to confirm
 curl '127.0.0.1:5000/MetaService/http/get_cluster?token=greedisgood9999' -d '{
-    "instance_id":"cloud_instance0",
-    "cloud_unique_id":"1:cloud_instance0:regression-cloud-unique-id0",
-    "cluster_name":"regression_test_cluster_name0",
-    "cluster_id":"regression_test_cluster_id0"
+    "instance_id":"sample_instance_id",
+    "cloud_unique_id":"1:sample_instance_id:cloud_unique_id_compute_node0",
+    "cluster_name":"cluster_name0",
+    "cluster_id":"cluster_id0"
 }'
 ```
 
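-### Compute cluster operations
-
-TBD
-
-Adding and removing nodes: FE BE 
-
-Drop cluster 
-
 ### FE/BE configuration
 
-Compared with regular Doris, the FE and BE configuration has a few extra settings: one is the meta service address and the other is cloud_unique_id (fill in the actual value used earlier when creating the compute-storage decoupled cluster)
+Compared with regular Doris, the FE and BE configuration has a few extra settings. The key ones are
+* meta_service_endpoint: the address of the meta service; required for both FE and BE
+* cloud_unique_id: fill in the actual value sent to meta-service in the earlier requests that created the compute-storage decoupled cluster. Doris decides whether to run in compute-storage decoupled mode based on whether this setting has a value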
 
-fe.conf
+
+#### fe.conf
 
 ```Shell
-# cloud HTTP data api port
-cloud_http_port = 8904
 meta_service_endpoint = 127.0.0.1:5000
-cloud_unique_id = 1:cloud_instance0:cloud_unique_id_sql_server00
+cloud_unique_id = 1:sample_instance_id:cloud_unique_id_sql_server00
 ```
 
-be.conf
+#### be.conf
+
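+In the example below, meta_service_use_load_balancer and enable_file_cache can be copied as-is; 
+fill in the other settings with actual values.
+
+file_cache_path is a json array (configured according to the actual number of cache disks). Its fields are
+* path: the directory where cached data is stored, similar to storage_root_path in compute-storage integrated mode
+* total_size: the upper limit of cache space to use
+* query_limit: the maximum amount of cached data a single query can evict on a cache miss (to keep a large query from flushing the whole cache)
+The cache holds data, so a high-performance disk such as an SSD is the best choice for cache storage.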
 
 ```Shell
 meta_service_endpoint = 127.0.0.1:5000
-cloud_unique_id = 1:cloud_instance0:cloud_unique_id_compute_node0
+cloud_unique_id = 1:sample_instance_id:cloud_unique_id_compute_node0
 meta_service_use_load_balancer = false
 enable_file_cache = true
-file_cache_path = [{"path":"/mnt/disk3/doris_cloud/file_cache","total_size":104857600,"query_limit":104857600}]
-tmp_file_dirs = [{"path":"/mnt/disk3/doris_cloud/tmp","max_cache_bytes":104857600,"max_upload_bytes":104857600}]
+file_cache_path = [{"path":"/mnt/disk1/doris_cloud/file_cache","total_size":104857600000,"query_limit":10485760000}, {"path":"/mnt/disk2/doris_cloud/file_cache","total_size":104857600000,"query_limit":10485760000}]
 ```
 
 ### Starting and stopping FE/BE
@@ -339,5 +399,146 @@ bin/stop_fe.sh
 
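 **In Doris cloud mode, the FE automatically discovers its BEs; there is no need to operate on nodes with alter system add or drop backend.**
 
-After startup, watch the logs.
+After startup, watch the logs. If all of the steps above are configured correctly, the cluster enters normal operation and can be accessed by connecting to the FE with a MySQL client.
+
+### Compute cluster operations
+
+#### Adding and removing FE/BE nodes
+Adding and removing nodes is similar to the steps above for creating a compute cluster when setting up a new compute-storage decoupled Doris: declare to
+meta-service which nodes to add, then start those nodes (the new nodes must be configured correctly).
+**No extra alter system add/drop statements are needed**.
+
+Compute-storage decoupled mode can add or remove several nodes at once, but in practice operating on one node at a time is recommended.
+
+Example: add two BE nodes to compute cluster cluster_name0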
+```
+curl '127.0.0.1:5000/MetaService/http/add_node?token=greedisgood9999' -d '{
+    "instance_id":"sample_instance_id",
+    "cluster":{
+        "type":"COMPUTE",
+        "cluster_name":"cluster_name0",
+        "cluster_id":"cluster_id0",
+        "nodes":[
+            {
+                "cloud_unique_id":"1:sample_instance_id:cloud_unique_id_compute_node1",
+                "ip":"172.21.16.22",
+                "heartbeat_port":9455
+            },
+            {
+                "cloud_unique_id":"1:sample_instance_id:cloud_unique_id_compute_node2",
+                "ip":"172.21.16.23",
+                "heartbeat_port":9455
+            }
+        ]
+    }
+}'
+```
+
+Example: remove two BE nodes from compute cluster cluster_name0
+```
+curl '127.0.0.1:5000/MetaService/http/drop_node?token=greedisgood9999' -d '{
+    "instance_id":"sample_instance_id",
+    "cluster":{
+        "type":"COMPUTE",
+        "cluster_name":"cluster_name0",
+        "cluster_id":"cluster_id0",
+        "nodes":[
+            {
+                "cloud_unique_id":"1:sample_instance_id:cloud_unique_id_compute_node1",
+                "ip":"172.21.16.22",
+                "heartbeat_port":9455
+            },
+            {
+                "cloud_unique_id":"1:sample_instance_id:cloud_unique_id_compute_node2",
+                "ip":"172.21.16.23",
+                "heartbeat_port":9455
+            }
+        ]
+    }
+}'
+```
+
+Example: add an FE follower. Below, a node_type of FE_MASTER means the node is eligible to be elected master.
+**To add an OBSERVER instead, set node_type to OBSERVER.**
+```
+curl '127.0.0.1:5000/MetaService/http/add_node?token=greedisgood9999' -d '{
+    "instance_id":"sample_instance_id",
+    "cluster":{
+        "type":"SQL",
+        "cluster_name":"RESERVED_CLUSTER_NAME_FOR_SQL_SERVER",
+        "cluster_id":"RESERVED_CLUSTER_ID_FOR_SQL_SERVER",
+        "nodes":[
+            {
+                "cloud_unique_id":"1:sample_instance_id:cloud_unique_id_sql_server00",
+                "ip":"172.21.16.22",
+                "edit_log_port":12103,
+                "node_type":"FE_MASTER"
+            }
+        ]
+    }
+}'
+```
+
+Example: remove an FE node
+```
+curl '127.0.0.1:5000/MetaService/http/drop_node?token=greedisgood9999' -d '{
+    "instance_id":"sample_instance_id",
+    "cluster":{
+        "type":"SQL",
+        "cluster_name":"RESERVED_CLUSTER_NAME_FOR_SQL_SERVER",
+        "cluster_id":"RESERVED_CLUSTER_ID_FOR_SQL_SERVER",
+        "nodes":[
+            {
+                "cloud_unique_id":"1:sample_instance_id:cloud_unique_id_sql_server00",
+                "ip":"172.21.16.22",
+                "edit_log_port":12103,
+                "node_type":"FE_MASTER"
+            }
+        ]
+    }
+}'
+```
+
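+#### Adding and removing compute clusters (Compute Cluster)
+
+To add a compute cluster, see the earlier section [Creating a compute cluster](#create_compute_cluster).
+
+To remove a compute cluster, call the meta-service interface and then shut down the corresponding nodes.
+
+Example: remove the compute cluster named cluster_name0 (all parameters below are required)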
+```
+curl '127.0.0.1:5000/MetaService/http/add_cluster?token=greedisgood9999' -d '{
+    "instance_id":"sample_instance_id",
+    "cluster":{
+        "type":"COMPUTE",
+        "cluster_name":"cluster_name0",
+        "cluster_id":"cluster_id0"
+     }
+}'
+```
+
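+## Cleaning up a cluster (**do not use in production**)
+
+Sometimes a test compute-storage decoupled Doris is created with a step done wrong, or a complete
+rebuild is wanted. In that case, clean up the environment and repeat the creation steps above.
+As with manually cleaning up a compute-storage integrated Doris, there are two main parts: clearing metadata and clearing data
+
+1. Manually force-clear the metadata: delete the FE meta directory and the metadata in fdb. Deleting data in fdb requires
+        fdbcli, the fdb command-line tool. Run all of the following commands, replacing `${instance_id}` with the actual
+        value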
+       ```
+       fdbcli --exec "writemode on;clearrange \x01\x10instance\x00\x01\x10${instance_id}\x00\x01 \x01\x10instance\x00\x01\x10${instance_id}\x00\xff\x00\x01"
+       fdbcli --exec "writemode on;clearrange \x01\x10meta\x00\x01\x10${instance_id}\x00\x01 \x01\x10meta\x00\x01\x10${instance_id}\x00\xff\x00\x01"
+       fdbcli --exec "writemode on;clearrange \x01\x10txn\x00\x01\x10${instance_id}\x00\x01 \x01\x10txn\x00\x01\x10${instance_id}\x00\xff\x00\x01"
+       fdbcli --exec "writemode on;clearrange \x01\x10version\x00\x01\x10${instance_id}\x00\x01 \x01\x10version\x00\x01\x10${instance_id}\x00\xff\x00\x01"
+       fdbcli --exec "writemode on;clearrange \x01\x10stats\x00\x01\x10${instance_id}\x00\x01 \x01\x10stats\x00\x01\x10${instance_id}\x00\xff\x00\x01"
+       fdbcli --exec "writemode on;clearrange \x01\x10recycle\x00\x01\x10${instance_id}\x00\x01 \x01\x10recycle\x00\x01\x10${instance_id}\x00\xff\x00\x01"
+       fdbcli --exec "writemode on;clearrange \x01\x10job\x00\x01\x10${instance_id}\x00\x01 \x01\x10job\x00\x01\x10${instance_id}\x00\xff\x00\x01"
+       fdbcli --exec "writemode on;clearrange \x01\x10copy\x00\x01\x10${instance_id}\x00\x01 \x01\x10copy\x00\x01\x10${instance_id}\x00\xff\x00\x01"
+       fdbcli --exec "writemode on;clearrange \x01\x10storage_vault\x00\x01\x10${instance_id}\x00\x01 \x01\x10storage_vault\x00\x01\x10${instance_id}\x00\xff\x00\x01"
+       ```
+2. Manually force-delete the BE cache directories (mainly the directories configured in file_cache_path)
+3. Restart meta-service and recycler
+
+## FAQ
 
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/separation-of-storage-and-compute/file-cache.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/separation-of-storage-and-compute/file-cache.md
index 52d806fbfe4..84b1b762fed 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/separation-of-storage-and-compute/file-cache.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/separation-of-storage-and-compute/file-cache.md
@@ -52,6 +52,16 @@ under the License.
 
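 3. The TTL policy suits small tables that should stay resident locally. For a resident table, set a relatively large TTL so that its cached data is not evicted by queries on other large tables. For dynamically partitioned tables, set the TTL according to how long a Hot Partition stays hot, so that the Hot Partition is not evicted by queries on Cold Partitions. Viewing the share of the cache occupied by TTL data is not supported yet.
 
+   
+
+### Cache warm-up
+
+Compute-storage decoupled Doris supports multiple clusters. The clusters share the same data but not the same cache. When a new cluster is created its cache is empty, so queries on it are slow at first. The data can be warmed up by proactively pulling it from remote storage into the local cache. Three modes are currently supported:
+
+- Warm up cluster B from the cache of cluster A. Compute-storage decoupled Doris periodically collects hotspot information about the tables and partitions each cluster accessed over a period of time and stores it in internal tables. During a cluster-to-cluster warm-up, the cluster being warmed up uses the source cluster's hotspot information to warm up certain tables/partitions.
+- Warm up a new cluster with the data of table A.
+- Warm up a new cluster with the data of partition 'p1' of table A.
+
 
 
 ## Usage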
@@ -91,6 +101,97 @@ ALTER TABLE customer set ("file_cache_ttl_seconds"="3000");
 
 If no TTL was set when the table was created, it can also be changed with an ALTER statement.
 
+
+
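+### Cache warm-up
+Three modes are currently supported:
+- Warm up cluster_name1 from the cache of cluster_name0.
+View the most frequently accessed table of every cluster in the current warehouse.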
+```
+show cache hotspot '/';
++------------------------+-----------------------+----------------------------------------+
+| cluster_name           | total_file_cache_size | top_table_name                         |
++------------------------+-----------------------+----------------------------------------+
+| cluster_name0          |          751620511367 | regression_test.doris_cache_hotspot    |
++------------------------+-----------------------+----------------------------------------+
+```
+
+View the hottest partition of every table in cluster_name0
+```
+mysql> show cache hotspot '/cluster_name0';
++-----------------------------------------------------------+---------------------+--------------------+
+| table_name                                                | last_access_time    | top_partition_name |
++-----------------------------------------------------------+---------------------+--------------------+
+| regression_test.doris_cache_hotspot                       | 2023-05-29 12:38:02 | p20230529          |
+| regression_test_cloud_load_copy_into_tpch_sf1_p1.customer | 2023-06-06 10:56:12 | customer           |
+| regression_test_cloud_load_copy_into_tpch_sf1_p1.nation   | 2023-06-06 10:56:12 | nation             |
+| regression_test_cloud_load_copy_into_tpch_sf1_p1.orders   | 2023-06-06 10:56:12 | orders             |
+| regression_test_cloud_load_copy_into_tpch_sf1_p1.part     | 2023-06-06 10:56:12 | part               |
+| regression_test_cloud_load_copy_into_tpch_sf1_p1.partsupp | 2023-06-06 10:56:12 | partsupp           |
+| regression_test_cloud_load_copy_into_tpch_sf1_p1.region   | 2023-06-06 10:56:12 | region             |
+| regression_test_cloud_load_copy_into_tpch_sf1_p1.supplier | 2023-06-06 10:56:12 | supplier           |
++-----------------------------------------------------------+---------------------+--------------------+
+```
+
+View the information for table regression_test_cloud_load_copy_into_tpch_sf1_p1.customer in cluster_name0
+```
+show cache hotspot '/cluster_name0/regression_test_cloud_load_copy_into_tpch_sf1_p1.customer';
++----------------+---------------------+
+| partition_name | last_access_time    |
++----------------+---------------------+
+| supplier       | 2023-06-06 10:56:12 |
++----------------+---------------------+
+```
+
+When the following SQL runs, cluster_name1 fetches the access information of cluster_name0 and reproduces a cache as close to that of cluster_name0 as possible.
+```
+warm up cluster cluster_name1 with cluster cluster_name0
+```
+- Warm up cluster_name1 with the data of customer. The following SQL pulls all of the table's data from remote storage to local.
+```
+warm up cluster cluster_name1 with table customer
+```
+- Warm up cluster_name1 with the data of partition 'p1' of customer. The following SQL pulls all of the partition's data from remote storage to local.
+```
+warm up cluster cluster_name1 with table customer partition p1
+```
+Each of the three SQL statements above returns a JobID. For example
+```
+mysql> warm up cluster cloud_warm_up with table test_warm_up;
++-------+
+| JobId |
++-------+
+| 13418 |
++-------+
+1 row in set (0.01 sec)
+```
+Then check the warm-up progress with the following SQL.
+```
+SHOW WARM UP JOB; // get job information
+SHOW WARM UP JOB WHERE ID = 13418; // look up by job_id
++-------+-------------------+---------+-------+-------------------------+-------------+----------+------------+
+| JobId | ClusterName       | Status  | Type  | CreateTime              | FinishBatch | AllBatch | FinishTime |
++-------+-------------------+---------+-------+-------------------------+-------------+----------+------------+
+| 13418 | cloud_warm_up     | RUNNING | TABLE | 2023-05-30 20:19:34.059 | 0           | 1        | NULL       |
++-------+-------------------+---------+-------+-------------------------+-------------+----------+------------+
+1 row in set (0.02 sec)
+```
+Use FinishBatch and AllBatch to gauge the job's progress; each Batch is about 10GB.
+Currently a cluster can run only one warm-up JOB at a time. A running warm-up job can also be stopped
+
+```
+mysql> cancel warm up job where id = 13418;
+Query OK, 0 rows affected (0.02 sec)
+
+mysql> show warm up job where id = 13418;
++-------+-------------------+-----------+-------+-------------------------+-------------+----------+-------------------------+
+| JobId | ClusterName       | Status    | Type  | CreateTime              | FinishBatch | AllBatch | FinishTime              |
++-------+-------------------+-----------+-------+-------------------------+-------------+----------+-------------------------+
+| 13418 | cloud_warm_up     | CANCELLED | TABLE | 2023-05-30 20:19:34.059 | 0           | 1        | 2023-05-30 20:27:14.186 |
++-------+-------------------+-----------+-------+-------------------------+-------------+----------+-------------------------+
+1 row in set (0.00 sec)
+```
+
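 ## Practical example
 
 A user has several tables totaling 3TB+ of data, with only 1.2TB of cache. Two of the tables are accessed frequently: a small 200 MB table and a 100 GB table. The other large tables also see traffic every day, but not much. Under the LRU policy, querying a large table may evict data of the frequently accessed small table and cause performance fluctuations. To keep the resident tables from being evicted, TTLs were set on the two tables so that their data stays in the cache. For the small table, whose data volume is relatively large and changes little from day to day, a TTL of 1 year was set to keep it in the cache long-term. For the other table, the user backs up the table once a day and then does a full reload, so a TTL of 1 day is recommended.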
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/separation-of-storage-and-compute/install-fdb.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/separation-of-storage-and-compute/install-fdb.md
index 3b280d08775..e6b126814ba 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/separation-of-storage-and-compute/install-fdb.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/separation-of-storage-and-compute/install-fdb.md
@@ -33,17 +33,20 @@ under the License.
 
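 ## 1. Installation
 
-The fdb service must first be installed on every machine. Download the packages from https://github.com/apple/foundationdb/releases and pick a version; the version currently in common use is [7.1.38](https://github.com/apple/foundationdb/releases/tag/7.1.38).
+The fdb service must first be installed on every machine. Download the packages from <https://github.com/apple/foundationdb/releases> and pick a version; the version currently in common use is [7.1.38](https://github.com/apple/foundationdb/releases/tag/7.1.38).
 
 Usually only the centos (redhat) and ubuntu packages matter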
-
-https://github.com/apple/foundationdb/releases/download/7.1.38/foundationdb-clients_7.1.38-1_amd64.deb
-
-https://github.com/apple/foundationdb/releases/download/7.1.38/foundationdb-server_7.1.38-1_amd64.deb
-
-https://github.com/apple/foundationdb/releases/download/7.1.38/foundationdb-clients-7.1.38-1.el7.x86_64.rpm
-
-https://github.com/apple/foundationdb/releases/download/7.1.38/foundationdb-server-7.1.38-1.el7.x86_64.rpm
+Original links
+<https://github.com/apple/foundationdb/releases/download/7.1.38/foundationdb-clients-7.1.38-1.el7.x86_64.rpm>
+<https://github.com/apple/foundationdb/releases/download/7.1.38/foundationdb-server-7.1.38-1.el7.x86_64.rpm>
+<https://github.com/apple/foundationdb/releases/download/7.1.38/foundationdb-clients_7.1.38-1_amd64.deb>
+<https://github.com/apple/foundationdb/releases/download/7.1.38/foundationdb-server_7.1.38-1_amd64.deb>
+
+Mirror links
+<https://selectdb-doris-1308700295.cos.ap-beijing.myqcloud.com/toolkit/fdb/foundationdb-clients-7.1.38-1.el7.x86_64.rpm>
+<https://selectdb-doris-1308700295.cos.ap-beijing.myqcloud.com/toolkit/fdb/foundationdb-server-7.1.38-1.el7.x86_64.rpm>
+<https://selectdb-doris-1308700295.cos.ap-beijing.myqcloud.com/toolkit/fdb/foundationdb-server_7.1.38-1_amd64.deb>
+<https://selectdb-doris-1308700295.cos.ap-beijing.myqcloud.com/toolkit/fdb/foundationdb-clients_7.1.38-1_amd64.deb>
 
 Install fdb
 
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/separation-of-storage-and-compute/storage-vault.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/separation-of-storage-and-compute/storage-vault.md
index aeffcb13a47..180ae8252a8 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/separation-of-storage-and-compute/storage-vault.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/separation-of-storage-and-compute/storage-vault.md
@@ -61,7 +61,7 @@ CREATE STORAGE VAULT IF NOT EXISTS ssb_hdfs_vault
     PROPERTIES (
         "type"="hdfs", -- required
         "fs.defaultFS"="hdfs://127.0.0.1:8020", -- required
-        "path_prefix"="prefix", -- optional -> Gavin wants this to be required
+        "path_prefix"="prefix", -- optional
         "hadoop.username"="user" -- optional
         "hadoop.security.authentication"="kerberos" -- optional
         "hadoop.kerberos.principal"="hadoop/127.0.0.1@XXX" -- optional
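
The be.conf change in deployment.md earlier in this diff sets `file_cache_path` to a JSON array with one entry per cache disk, each carrying `path`, `total_size` and `query_limit`. As a hedged illustration, the Python sketch below parses such a value and totals the configured cache space, which can help catch JSON typos before the line goes into be.conf. `summarize_file_cache` is a hypothetical helper, not a Doris tool, and treating all three keys as mandatory is an assumption based only on the examples in the diff.

```python
import json

# Value copied from the be.conf example in the deployment.md diff.
FILE_CACHE_PATH = (
    '[{"path":"/mnt/disk1/doris_cloud/file_cache",'
    '"total_size":104857600000,"query_limit":10485760000}, '
    '{"path":"/mnt/disk2/doris_cloud/file_cache",'
    '"total_size":104857600000,"query_limit":10485760000}]'
)

# Assumption: every entry carries these three keys, as in the diff's examples.
EXPECTED_KEYS = {"path", "total_size", "query_limit"}


def summarize_file_cache(value):
    """Parse a file_cache_path value and report disk count and total bytes."""
    entries = json.loads(value)
    for entry in entries:
        missing = EXPECTED_KEYS - entry.keys()
        if missing:
            raise ValueError(f"entry {entry!r} is missing keys: {sorted(missing)}")
    return {
        "disks": len(entries),
        "total_bytes": sum(entry["total_size"] for entry in entries),
        "paths": [entry["path"] for entry in entries],
    }


summary = summarize_file_cache(FILE_CACHE_PATH)
print(summary)
```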


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
