This is an automated email from the ASF dual-hosted git repository.

dataroaring pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git


The following commit(s) were added to refs/heads/master by this push:
     new 62447f1ddb [Doc](remote storage)Modify remote storage doc (#1304)
62447f1ddb is described below

commit 62447f1ddbc518662df093dd0769db4e745c2b89
Author: abmdocrt <[email protected]>
AuthorDate: Thu Nov 7 14:38:33 2024 +0800

    [Doc](remote storage)Modify remote storage doc (#1304)
    
    # Versions
    
    - [x] dev
    - [x] 3.0
    - [x] 2.1
    - [ ] 2.0
    
    # Languages
    
    - [x] Chinese
    - [x] English
    
    ---------
    
    Co-authored-by: Yukang-Lian <[email protected]>
---
 docs/table-design/tiered-storage/remote-storage.md | 188 +++++++++------------
 .../table-design/tiered-storage/remote-storage.md  | 102 +++++------
 .../table-design/tiered-storage/remote-storage.md  | 102 +++++------
 .../table-design/tiered-storage/remote-storage.md  | 102 +++++------
 .../table-design/tiered-storage/remote-storage.md  | 188 +++++++++------------
 .../table-design/tiered-storage/remote-storage.md  | 188 +++++++++------------
 6 files changed, 366 insertions(+), 504 deletions(-)

diff --git a/docs/table-design/tiered-storage/remote-storage.md 
b/docs/table-design/tiered-storage/remote-storage.md
index ca986f86d6..1380d1dcff 100644
--- a/docs/table-design/tiered-storage/remote-storage.md
+++ b/docs/table-design/tiered-storage/remote-storage.md
@@ -24,47 +24,17 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-## Use Case
+## Feature Overview
 
-One significant use case in the future is similar to ES log storage, where 
data in the log scenario is split based on dates. Many of the data are cold 
data with infrequent queries, requiring a reduction in storage costs for such 
data. Considering cost-saving:
+Remote storage supports placing some data in external storage (such as object 
storage or HDFS), which saves costs without sacrificing functionality.
 
-- The pricing of regular cloud disks from various vendors is more expensive 
than object storage.
-
-- In actual online usage of the Doris Cluster, the utilization of regular 
cloud disks cannot reach 100%.
-
-- Cloud disks are not billed on demand, while object storage can be billed on 
demand.
-
-- Using regular cloud disks for high availability requires multiple replicas 
and replica migration in case of failures. In contrast, storing data on object 
storage eliminates these issues as it is shared.
-
-## Solution
-
-Set the freeze time at the partition level, which indicates how long a 
partition will be frozen, and define the location of remote storage for storing 
data after freezing. In the BE (Backend) daemon thread, the table's freeze 
condition is periodically checked. If a freeze condition is met, the data will 
be uploaded to object storage compatible with the S3 protocol and HDFS.
-
-Cold-hot tiering supports all Doris functionalities and only moves some data 
to object storage to save costs without sacrificing functionality. Therefore, 
it has the following characteristics:
-
-- Cold data is stored on object storage, and users do not need to worry about 
data consistency and security.
-
-- Flexible freeze strategy, where the cold remote storage property can be 
applied to both table and partition levels.
-
-- Users can query data without worrying about data distribution. If the data 
is not local, it will be pulled from the object storage and cached locally in 
the BE (Backend).
-
-- Replica clone optimization. If the stored data is on object storage, there 
is no need to fetch the stored data locally during replica cloning.
-
-- Remote object space recycling. If a table or partition is deleted or if 
space waste occurs during the cold-hot tiering process due to exceptional 
situations, a recycler thread will periodically recycle the space, saving 
storage resources.
-
-- Cache optimization, caching accessed cold data locally in the BE to achieve 
query performance similar to non-cold-hot tiering.
-
-- BE thread pool optimization, distinguishing between data sources from local 
and object storage to prevent delays in reading objects from impacting query 
performance.
-
-## Usage of Storage Policy
-
-The storage policy is the entry point for using the cold-hot tiering feature. 
Users only need to associate the storage policy with a table or partition 
during table creation or when using Doris.
-
-:::tip
-When creating an S3 resource, a remote S3 connection validation is performed 
to ensure the correct creation of the resource.
+:::warning Note
+Data in remote storage only has one replica. The reliability of the data 
depends on the reliability of the remote storage. You need to ensure that the 
remote storage employs EC (Erasure Coding) or multi-replica technology to 
guarantee data reliability.
 :::
 
-Here is an example of creating an S3 resource:
+## Usage Guide
+
+Using S3 object storage as an example, start by creating an S3 RESOURCE:
 
 ```sql
 CREATE RESOURCE "remote_s3"
@@ -81,13 +51,25 @@ PROPERTIES
     "s3.connection.request.timeout" = "3000",
     "s3.connection.timeout" = "1000"
 );
+```
+
+:::tip
+When creating the S3 RESOURCE, a remote connection check will be performed to 
ensure the resource is created correctly.
+:::
 
+Next, create a STORAGE POLICY and associate it with the previously created 
RESOURCE:
+
+```sql
 CREATE STORAGE POLICY test_policy
 PROPERTIES(
     "storage_resource" = "remote_s3",
     "cooldown_ttl" = "1d"
 );
+```
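
A storage policy can also be keyed to an absolute point in time instead of a 
relative TTL. A minimal sketch, assuming the `cooldown_datetime` property 
described in the CREATE-POLICY reference linked below and the same `remote_s3` 
resource:

```sql
-- Sketch only: cool data at a fixed time instead of after a TTL.
-- Assumes the cooldown_datetime property; the policy name is hypothetical.
CREATE STORAGE POLICY test_policy_datetime
PROPERTIES(
    "storage_resource" = "remote_s3",
    "cooldown_datetime" = "2025-01-01 00:00:00"
);
```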
+
+Finally, specify the STORAGE POLICY when creating a table:
 
+```sql
 CREATE TABLE IF NOT EXISTS create_table_use_created_policy 
 (
     k1 BIGINT,
@@ -102,11 +84,11 @@ PROPERTIES(
 );
 ```
 
-:::warning Notice
-If you set `"enable_unique_key_merge_on_write" = "true"` in UNIQUE table, you 
can't use this feature.
+:::warning
+If the UNIQUE table has `"enable_unique_key_merge_on_write" = "true"`, this 
feature cannot be used.
 :::
 
-And here is an example of creating an HDFS resource:
+Create an HDFS RESOURCE:
 
 ```sql
 CREATE RESOURCE "remote_hdfs" PROPERTIES (
@@ -124,11 +106,12 @@ CREATE RESOURCE "remote_hdfs" PROPERTIES (
 CREATE STORAGE POLICY test_policy PROPERTIES (
     "storage_resource" = "remote_hdfs",
     "cooldown_ttl" = "300"
-)
+);
 
 CREATE TABLE IF NOT EXISTS create_table_use_created_policy (
     k1 BIGINT,
-    k2 LARGEINTv1 VARCHAR(2048)
+    k2 LARGEINT,
+    v1 VARCHAR(2048)
 )
 UNIQUE KEY(k1)
 DISTRIBUTED BY HASH (k1) BUCKETS 3
@@ -138,111 +121,100 @@ PROPERTIES(
 );
 ```
 
-:::warning Notice
-If you set `"enable_unique_key_merge_on_write" = "true"` in UNIQUE table, you 
can't use this feature.
+:::warning
+If the UNIQUE table has `"enable_unique_key_merge_on_write" = "true"`, this 
feature cannot be used.
 :::
 
-Associate a storage policy with an existing table by using the following 
command:
+In addition to creating tables with remote storage, Doris also supports 
setting remote storage for existing tables or partitions.
+
+For an existing table, associate a remote storage policy by running:
 
 ```sql
-ALTER TABLE create_table_not_have_policy SET ("storage_policy" = 
"test_policy");
+ALTER TABLE create_table_not_have_policy SET ("storage_policy" = 
"test_policy");
 ```
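
To double-check the association, the table definition can be inspected; a quick 
sketch (whether `storage_policy` appears in the output is an assumption, not 
something stated here):

```sql
-- Sketch only: inspect the table definition after setting the policy.
-- The storage_policy property appearing in the output is an assumption.
SHOW CREATE TABLE create_table_not_have_policy;
```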
 
-Associate a storage policy with an existing partition by using the following 
command:
+For an existing PARTITION, associate a remote storage policy by running:
 
 ```sql
-ALTER TABLE create_table_partition MODIFY PARTITION (*) SET ("storage_policy" 
= "test_policy");
+ALTER TABLE create_table_partition MODIFY PARTITION (*) 
SET("storage_policy"="test_policy");
 ```
 
 :::tip
-If you specify different storage policies for the entire table and some 
partitions during table creation, the storage policy set for the partitions 
will be ignored, and all partitions of the table will use the table's storage 
policy. If you want a specific partition to have a different storage policy 
than the others, you can use the method mentioned above to modify the 
association for that specific partition.
-
-For more details, please refer to the following documents in the Docs 
directory: 
[RESOURCE](../sql-manual/sql-statements/Data-Definition-Statements/Create/CREATE-RESOURCE),
 
[POLICY](../sql-manual/sql-statements/Data-Definition-Statements/Create/CREATE-POLICY),
 [CREATE 
TABLE](../sql-manual/sql-statements/Data-Definition-Statements/Create/CREATE-TABLE),
 [ALTER 
TABLE](../sql-manual/sql-statements/Data-Definition-Statements/Alter/ALTER-TABLE-COLUMN),
 which provide detailed explanations.
+Note that if you specify different storage policies for the entire table and 
certain partitions, the storage policy of the table will take precedence for 
all partitions. If you need a partition to use a different storage policy, you 
can modify it using the method above for existing partitions.
 :::
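
For example, to give a single partition its own policy after the table exists 
(a sketch; `example_tbl`, `p202401`, and `other_policy` are hypothetical names):

```sql
-- Sketch only: override the storage policy for one named partition.
-- example_tbl, p202401, and other_policy are hypothetical names.
ALTER TABLE example_tbl MODIFY PARTITION (p202401)
SET ("storage_policy" = "other_policy");
```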
 
-### Limitations
+For more details, please refer to the documentation in the **Docs** directory, 
such as 
[RESOURCE](../../sql-manual/sql-statements/Data-Definition-Statements/Create/CREATE-RESOURCE),
 
[POLICY](../../sql-manual/sql-statements/Data-Definition-Statements/Create/CREATE-POLICY),
 [CREATE 
TABLE](../../sql-manual/sql-statements/Data-Definition-Statements/Create/CREATE-TABLE),
 and [ALTER 
TABLE](../../sql-manual/sql-statements/Data-Definition-Statements/Alter/ALTER-TABLE-COLUMN),
 which provide detailed explanations.
 
-- A single table or partition can only be associated with one storage policy. 
Once associated, the storage policy cannot be dropped without first removing 
the association between them.
+### Limitations
 
-- The object information associated with a storage policy does not support 
modifying the data storage path, such as bucket, endpoint, root_path, and other 
information.
+- A single table or partition can only be associated with one storage policy. 
Once associated, the storage policy cannot be dropped until the association is 
removed.
 
-- Storage policies support creation, modification, and deletion. Before 
deleting a storage policy, ensure that no tables are referencing the storage 
policy.
+- The storage path information associated with a storage policy (e.g., bucket, 
endpoint, root_path) cannot be modified after the policy is created.
 
-- When the Merge-on-Write feature is enabled, the Unique model does not 
support setting a storage policy.
+- Storage policies support creation, modification, and deletion. However, 
before deleting a policy, you need to ensure that no tables are referencing 
this storage policy.
 
+- The Unique model with Merge-on-Write enabled does not support setting a 
storage policy.
 
-## Occupied Size of Cold Data Objects
+## Viewing Remote Storage Usage
 
-Method 1: You can use the `show proc '/backends'` command to view the size of 
each backend's uploaded objects. Look for the `RemoteUsedCapacity` field. 
Please note that this method may have some latency.
+Method 1: You can view the size uploaded to the object storage by each BE by 
using `show proc '/backends'`, specifically the `RemoteUsedCapacity` item. Note 
that this method may have some delay.
 
-Method 2: You can use the `show tablets from tableName` command to view the 
size of each tablet in a table, indicated by the `RemoteDataSize` field.
+Method 2: You can view the object size used by each tablet of a table by using 
`show tablets from tableName`, specifically the `RemoteDataSize` item.
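
Put together, a quick sketch of the two checks (`example_table` is a 
placeholder table name):

```sql
-- Sketch only: the two inspection commands described above.
SHOW PROC '/backends';              -- look at the RemoteUsedCapacity column
SHOW TABLETS FROM example_table;    -- look at the RemoteDataSize column
```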
 
-## Cache for Cold Data
+## Remote Storage Cache
 
-As mentioned earlier, caching is introduced for cold data to optimize query 
performance and save object storage resources. When cold data is first accessed 
after cooling, Doris reloads the cooled data onto the local disk of the backend 
(BE). The cold data cache has the following characteristics:
+To optimize query performance and save object storage resources, the concept 
of cache is introduced. When querying data from remote storage for the first 
time, Doris will load the data from remote storage to the BE's local disk as a 
cache. The cache has the following characteristics:
 
 - The cache is stored on the BE's disk and does not occupy memory space.
+- The cache can be limited in size, with data cleanup performed using an LRU 
(Least Recently Used) policy.
+- The implementation of the cache is the same as the federated query catalog 
cache. For more information, refer to the 
[documentation](../../lakehouse/filecache).
 
-- The cache can be limited in size and uses LRU (Least Recently Used) for data 
eviction.
-
-- The implementation of the cache for cold data is the same as the cache for 
federated query catalog. Please refer to the documentation at 
[Filecache](../lakehouse/filecache) for more details.
-
-## Compaction of Cold Data
-
-The time at which cold data enters is counted from the moment the data rowset 
file is written to the local disk, plus the cooling duration. Since data is not 
written and cooled all at once, Doris performs compaction on cold data to avoid 
the issue of small files within object storage. However, the frequency and 
resource prioritization of cold data compaction are not very high. It is 
recommended to perform compaction on local hot data before cooling. You can 
adjust the following BE parameters:
-
-- The BE parameter `cold_data_compaction_thread_num` sets the concurrency for 
cold data compaction. The default value is 2.
+## Remote Storage Compaction
 
-- The BE parameter `cold_data_compaction_interval_sec` sets the time interval 
for cold data compaction. The default value is 1800 seconds (30 minutes).
+Data is moved to remote storage at the time the rowset file was written to the 
local disk plus the cooldown period. Since data is not written and cooled all at 
once, Doris also performs compaction on remote storage data to avoid the 
small-file problem in object storage. However, the frequency and resource 
priority of remote storage compaction are not very high, so it is recommended to 
compact local hot data before it cools down. The following BE parameters can be 
adjusted:
 
-## Schema Change for Cold Data
+- The BE parameter `cold_data_compaction_thread_num` sets the concurrency for 
performing compaction on remote storage. The default value is 2.
+- The BE parameter `cold_data_compaction_interval_sec` sets the time interval 
for executing remote storage compaction. The default value is 1800 seconds (30 
minutes).
 
-The following schema change types are supported for cold data:
+## Remote Storage Schema Change
 
-- Adding or deleting columns
+Schema changes are supported for data in remote storage. Supported types 
include:
 
+- Adding or removing columns
 - Modifying column types
-
 - Adjusting column order
-
 - Adding or modifying indexes
 
-## Garbage Collection of Cold Data
-
-Garbage data for cold data refers to data that is not used by any replica. The 
following situations may generate garbage data on object storage:
-
-1. Partial segment upload succeeds while the upload of the rowset fails.
+## Remote Storage Garbage Collection
 
-2. After the FE reselects the CooldownReplica, the rowset versions of the old 
and new CooldownReplica do not match. FollowerReplicas synchronize the 
CooldownMeta of the new CooldownReplica, and the rowsets with inconsistent 
versions in the old CooldownReplica become garbage data.
+Remote storage garbage data refers to data that is not being used by any 
replica. Garbage data may occur on object storage in the following cases:
 
-3. After cold data compaction, the rowsets before merging cannot be 
immediately deleted because they may still be used by other replicas. However, 
eventually, all FollowerReplicas use the latest merged rowset, and the rowsets 
before merging become garbage data.
+1. A rowset upload fails, but some of its segments are uploaded successfully.
+2. The FE re-selects a CooldownReplica, causing an inconsistency between the 
rowset versions of the old and new CooldownReplica. FollowerReplicas 
synchronize the CooldownMeta of the new CooldownReplica, and the rowsets with 
version mismatches in the old CooldownReplica become garbage data.
+3. After a remote storage compaction, the rowsets before merging cannot be 
immediately deleted because they may still be used by other replicas. 
Eventually, once all FollowerReplicas use the latest merged rowset, the 
pre-merge rowsets become garbage data.
 
-Furthermore, the garbage data on objects is not immediately cleaned up. The BE 
parameter `remove_unused_remote_files_interval_sec` sets the time interval for 
garbage collection of cold data. The default value is 21600 seconds (6 hours).
+Additionally, garbage data on objects will not be cleaned up immediately. The 
BE parameter `remove_unused_remote_files_interval_sec` sets the time interval 
for remote storage garbage collection, with a default value of 21600 seconds (6 
hours).
 
-## TODOs
-
-- Some remote occupancy metrics may not have comprehensive update retrieval.
-
-## FAQs
+## Common Issues
 
 1. `ERROR 1105 (HY000): errCode = 2, detailMessage = Failed to create 
repository: connect to s3 failed: Unable to marshall request to JSON: host must 
not be null.`
 
-The S3 SDK defaults to using the virtual-hosted style. However, some object 
storage systems (e.g., MinIO) may not have virtual-hosted style access enabled 
or supported. In such cases, you can add the `use_path_style` parameter to 
force the use of path-style access:
-
-```sql
-CREATE RESOURCE "remote_s3"
-PROPERTIES
-(
-    "type" = "s3",
-    "s3.endpoint" = "bj.s3.com",
-    "s3.region" = "bj",
-    "s3.bucket" = "test-bucket",
-    "s3.root.path" = "path/to/root",
-    "s3.access_key" = "bbb",
-    "s3.secret_key" = "aaaa",
-    "s3.connection.maximum" = "50",
-    "s3.connection.request.timeout" = "3000",
-    "s3.connection.timeout" = "1000",
-    "use_path_style" = "true"
-);
-```
+   The S3 SDK uses the virtual-hosted style access method by default. However, 
some object storage systems (such as MinIO) may not have virtual-hosted style 
access enabled or supported. In this case, you can add the `use_path_style` 
parameter to force path-style access:
+
+   ```sql
+   CREATE RESOURCE "remote_s3"
+   PROPERTIES
+   (
+       "type" = "s3",
+       "s3.endpoint" = "bj.s3.com",
+       "s3.region" = "bj",
+       "s3.bucket" = "test-bucket",
+       "s3.root.path" = "path/to/root",
+       "s3.access_key" = "bbb",
+       "s3.secret_key" = "aaaa",
+       "s3.connection.maximum" = "50",
+       "s3.connection.request.timeout" = "3000",
+       "s3.connection.timeout" = "1000",
+       "use_path_style" = "true"
+   );
+   ```
\ No newline at end of file
diff --git 
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/tiered-storage/remote-storage.md
 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/tiered-storage/remote-storage.md
index ebe360c8aa..4c3a63bb9a 100644
--- 
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/tiered-storage/remote-storage.md
+++ 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/tiered-storage/remote-storage.md
@@ -23,46 +23,18 @@ KIND, either express or implied.  See the License for the
 specific language governing permissions and limitations
 under the License.
 -->
-## 需求场景
 
-未来一个很大的使用场景是类似于 ES 
日志存储,日志场景下数据会按照日期来切割数据,很多数据是冷数据,查询很少,需要降低这类数据的存储成本。从节约存储成本角度考虑:
+## 功能简介
 
--   各云厂商普通云盘的价格都比对象存储贵
+远程存储支持把部分数据放到外部存储(例如对象存储,HDFS)上,节省成本,不牺牲功能。
 
--   在 Doris 集群实际线上使用中,普通云盘的利用率无法达到 100%
-
--   云盘不是按需付费,而对象存储可以做到按需付费
-
--   基于普通云盘做高可用,需要实现多副本,某副本异常要做副本迁移。而将数据放到对象存储上则不存在此类问题,因为对象存储是共享的。
-
-## 解决方案
-
-在 Partition 级别上设置 Freeze time,表示多久这个 Partition 会被 Freeze,并且定义 Freeze 之后存储的 
Remote storage 的位置。在 BE 上 daemon 线程会周期性的判断表是否需要 freeze,若 freeze 后会将数据上传到兼容 S3 
协议的对象存储和 HDFS 上。
-
-冷热分层支持所有 Doris 功能,只是把部分数据放到对象存储上,以节省成本,不牺牲功能。因此有如下特点:
-
--   冷数据放到对象存储上,用户无需担心数据一致性和数据安全性问题
--   灵活的 Freeze 策略,冷却远程存储 Property 可以应用到表和 Partition 级别
-
--   用户查询数据,无需关注数据分布位置,若数据不在本地,会拉取对象上的数据,并 cache 到 BE 本地
-
--   副本 clone 优化,若存储数据在对象上,则副本 clone 的时候不用去拉取存储数据到本地
-
--   远程对象空间回收 recycler,若表、分区被删除,或者冷热分层过程中异常情况产生的空间浪费,则会有 recycler 
线程周期性的回收,节约存储资源
-
--   cache 优化,将访问过的冷数据 cache 到 BE 本地,达到非冷热分层的查询性能
-
--   BE 线程池优化,区分数据来源是本地还是对象存储,防止读取对象延时影响查询性能
-
-## Storage policy 的使用
-
-存储策略是使用冷热分层功能的入口,用户只需要在建表或使用 Doris 过程中,给表或分区关联上 Storage policy,即可以使用冷热分层的功能。
-
-:::tip
-创建 S3 RESOURCE 的时候,会进行 S3 远端的链接校验,以保证 RESOURCE 创建的正确。
+:::warning 注意
+远程存储的数据只有一个副本,数据可靠性依赖远程存储的数据可靠性,您需要保证远程存储有ec(擦除码)或者多副本技术确保数据可靠性。
 :::
 
-下面演示如何创建 S3 RESOURCE:
+## 使用方法
+
+以S3对象存储为例,首先创建S3 RESOURCE:
 
 ```sql
 CREATE RESOURCE "remote_s3"
@@ -79,13 +51,25 @@ PROPERTIES
     "s3.connection.request.timeout" = "3000",
     "s3.connection.timeout" = "1000"
 );
+```
 
+:::tip
+创建 S3 RESOURCE 的时候,会进行 S3 远端的链接校验,以保证 RESOURCE 创建的正确。
+:::
+
+之后创建STORAGE POLICY,关联上文创建的RESOURCE:
+
+```sql
 CREATE STORAGE POLICY test_policy
 PROPERTIES(
     "storage_resource" = "remote_s3",
     "cooldown_ttl" = "1d"
 );
+```
 
+最后建表的时候指定STORAGE POLICY:
+
+```sql
 CREATE TABLE IF NOT EXISTS create_table_use_created_policy 
 (
     k1 BIGINT,
@@ -104,7 +88,7 @@ PROPERTIES(
 UNIQUE 表如果设置了 `"enable_unique_key_merge_on_write" = "true"` 的话,无法使用此功能。
 :::
 
-以及如何创建 HDFS RESOURCE:
+创建 HDFS RESOURCE:
 
 ```sql
 CREATE RESOURCE "remote_hdfs" PROPERTIES (
@@ -125,9 +109,9 @@ CREATE RESOURCE "remote_hdfs" PROPERTIES (
     )
 
     CREATE TABLE IF NOT EXISTS create_table_use_created_policy (
-    k1 BIGINT,
-    k2 LARGEINT,
-    v1 VARCHAR(2048)
+        k1 BIGINT,
+        k2 LARGEINT,
+        v1 VARCHAR(2048)
     )
     UNIQUE KEY(k1)
     DISTRIBUTED BY HASH (k1) BUCKETS 3
@@ -141,13 +125,15 @@ CREATE RESOURCE "remote_hdfs" PROPERTIES (
 UNIQUE 表如果设置了 `"enable_unique_key_merge_on_write" = "true"` 的话,无法使用此功能。
 :::
 
-或者对一个已存在的表,关联 Storage policy
+除了新建表支持设置远程存储外,Doris还支持对一个已存在的表或者PARTITION,设置远程存储。
+
+对一个已存在的表,设置远程存储,将创建好的STORAGE POLICY与表关联:
 
 ```sql
 ALTER TABLE create_table_not_have_policy set ("storage_policy" = 
"test_policy");
 ```
 
-或者对一个已存在的 partition,关联 Storage policy
+对一个已存在的PARTITION,设置远程存储,将创建好的STORAGE POLICY与PARTITION关联:
 
 ```sql
 ALTER TABLE create_table_partition MODIFY PARTITION (*) 
SET("storage_policy"="test_policy");
@@ -156,7 +142,7 @@ ALTER TABLE create_table_partition MODIFY PARTITION (*) 
SET("storage_policy"="te
 :::tip
 注意,如果用户在建表时给整张 Table 和部分 Partition 指定了不同的 Storage Policy,Partition 设置的 Storage 
policy 会被无视,整张表的所有 Partition 都会使用 table 的 Policy. 如果您需要让某个 Partition 的 Policy 
和别的不同,则可以使用上文中对一个已存在的 Partition,关联 Storage policy 的方式修改。
 
-具体可以参考 Docs 
目录下[RESOURCE](../sql-manual/sql-statements/Data-Definition-Statements/Create/CREATE-RESOURCE)、
 
[POLICY](../sql-manual/sql-statements/Data-Definition-Statements/Create/CREATE-POLICY)、
 [CREATE 
TABLE](../sql-manual/sql-statements/Data-Definition-Statements/Create/CREATE-TABLE)、
 [ALTER 
TABLE](../sql-manual/sql-statements/Data-Definition-Statements/Alter/ALTER-TABLE-COLUMN)等文档,里面有详细介绍。
+具体可以参考 Docs 
目录下[RESOURCE](../../sql-manual/sql-statements/Data-Definition-Statements/Create/CREATE-RESOURCE)、
 
[POLICY](../../sql-manual/sql-statements/Data-Definition-Statements/Create/CREATE-POLICY)、
 [CREATE 
TABLE](../../sql-manual/sql-statements/Data-Definition-Statements/Create/CREATE-TABLE)、
 [ALTER 
TABLE](../../sql-manual/sql-statements/Data-Definition-Statements/Alter/ALTER-TABLE-COLUMN)等文档,里面有详细介绍。
 :::
 
 ### 一些限制
@@ -169,33 +155,33 @@ ALTER TABLE create_table_partition MODIFY PARTITION (*) 
SET("storage_policy"="te
 
 -   Unique 模型在开启 Merge-on-Write 特性时,不支持设置 Storage policy。
 
-## 冷数据占用对象大小
+## 查看远程存储占用大小
 
 方式一:通过 show proc '/backends'可以查看到每个 BE 上传到对象的大小,RemoteUsedCapacity 项,此方式略有延迟。
 
 方式二:通过 show tablets from tableName 可以查看到表的每个 tablet 占用的对象大小,RemoteDataSize 项。
 
-## 冷数据的 cache
+## 远程存储的 cache
 
-上文提到冷数据为了优化查询的性能和对象存储资源节省,引入了 cache 的概念。在冷却后首次命中,Doris 会将已经冷却的数据又重新加载到 BE 
的本地磁盘,cache 有以下特性:
+为了优化查询的性能和对象存储资源节省,引入了 cache 的概念。在第一次查询远程存储的数据时,Doris 会将远程存储的数据加载到 BE 
的本地磁盘做缓存,cache 有以下特性:
 
 -   cache 实际存储于 BE 磁盘,不占用内存空间。
 
 -   cache 可以限制膨胀,通过 LRU 进行数据的清理
 
--   cache 的实现和联邦查询 Catalog 的 cache 是同一套实现,文档参考[此处](../lakehouse/filecache)
+-   cache 的实现和联邦查询 Catalog 的 cache 是同一套实现,文档参考[此处](../../lakehouse/filecache)
 
-## 冷数据的 Compaction
+## 远程存储的 Compaction
 
-冷数据传入的时间是数据 rowset 文件写入本地磁盘时刻起,加上冷却时间。由于数据并不是一次性写入和冷却的,因此避免在对象存储内的小文件问题,Doris 
也会进行冷数据的 Compaction。但是,冷数据的 Compaction 的频次和资源占用的优先级并不是很高,也推荐本地热数据 compaction 
后再执行冷却。具体可以通过以下 BE 参数调整:
+远程存储数据传入的时间是 rowset 文件写入本地磁盘时刻起,加上冷却时间。由于数据并不是一次性写入和冷却的,因此避免在对象存储内的小文件问题,Doris 
也会进行远程存储数据的 Compaction。但是,远程存储数据的 Compaction 的频次和资源占用的优先级并不是很高,也推荐本地热数据 
compaction 后再执行冷却。具体可以通过以下 BE 参数调整:
 
--   BE 参数`cold_data_compaction_thread_num`可以设置执行冷数据的 Compaction 的并发,默认是 2。
+-   BE 参数`cold_data_compaction_thread_num`可以设置执行远程存储的 Compaction 的并发,默认是 2。
 
--   BE 参数`cold_data_compaction_interval_sec`可以设置执行冷数据的 Compaction 的时间间隔,默认是 
1800,单位:秒,即半个小时。。
+-   BE 参数`cold_data_compaction_interval_sec`可以设置执行远程存储的 Compaction 的时间间隔,默认是 
1800,单位:秒,即半个小时。
 
-## 冷数据的 Schema Change
+## 远程存储的 Schema Change
 
-数据冷却后支持 Schema Change 类型如下:
+远程存储支持 Schema Change 类型如下:
 
 -   增加、删除列
 
@@ -205,21 +191,17 @@ ALTER TABLE create_table_partition MODIFY PARTITION (*) 
SET("storage_policy"="te
 
 -   增加、修改索引
 
-## 冷数据的垃圾回收
+## 远程存储的垃圾回收
 
-冷数据的垃圾数据是指没有被任何 Replica 使用的数据,对象存储上可能会有如下情况产生的垃圾数据:
+远程存储的垃圾数据是指没有被任何 Replica 使用的数据,对象存储上可能会有如下情况产生的垃圾数据:
 
 1.  上传 rowset 失败但是有部分 segment 上传成功。
 
 2.  FE 重新选 CooldownReplica 后,新旧 CooldownReplica 的 rowset version 
不一致,FollowerReplica 都去同步新 CooldownReplica 的 CooldownMeta,旧 CooldownReplica 中 
version 不一致的 rowset 没有 Replica 使用成为垃圾数据。
 
-3.  冷数据 Compaction 后,合并前的 rowset 因为还可能被其他 Replica 使用不能立即删除,但是最终 
FollowerReplica 都使用了最新的合并后的 rowset,合并前的 rowset 成为垃圾数据。
-
-另外,对象上的垃圾数据并不会立即清理掉。BE 
参数`remove_unused_remote_files_interval_sec`可以设置冷数据的垃圾回收的时间间隔,默认是 21600,单位:秒,即 6 
个小时。
-
-## 未尽事项
+3.  远程存储数据 Compaction 后,合并前的 rowset 因为还可能被其他 Replica 使用不能立即删除,但是最终 
FollowerReplica 都使用了最新的合并后的 rowset,合并前的 rowset 成为垃圾数据。
 
--   一些远端占用指标更新获取不够完善
+另外,对象上的垃圾数据并不会立即清理掉。BE 
参数`remove_unused_remote_files_interval_sec`可以设置远程存储的垃圾回收的时间间隔,默认是 21600,单位:秒,即 
6 个小时。
 
 ## 常见问题
 
diff --git 
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/table-design/tiered-storage/remote-storage.md
 
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/table-design/tiered-storage/remote-storage.md
index ebe360c8aa..4c3a63bb9a 100644
--- 
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/table-design/tiered-storage/remote-storage.md
+++ 
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/table-design/tiered-storage/remote-storage.md
@@ -23,46 +23,18 @@ KIND, either express or implied.  See the License for the
 specific language governing permissions and limitations
 under the License.
 -->
-## 需求场景
 
-未来一个很大的使用场景是类似于 ES 
日志存储,日志场景下数据会按照日期来切割数据,很多数据是冷数据,查询很少,需要降低这类数据的存储成本。从节约存储成本角度考虑:
+## 功能简介
 
--   各云厂商普通云盘的价格都比对象存储贵
+远程存储支持把部分数据放到外部存储(例如对象存储,HDFS)上,节省成本,不牺牲功能。
 
--   在 Doris 集群实际线上使用中,普通云盘的利用率无法达到 100%
-
--   云盘不是按需付费,而对象存储可以做到按需付费
-
--   基于普通云盘做高可用,需要实现多副本,某副本异常要做副本迁移。而将数据放到对象存储上则不存在此类问题,因为对象存储是共享的。
-
-## 解决方案
-
-在 Partition 级别上设置 Freeze time,表示多久这个 Partition 会被 Freeze,并且定义 Freeze 之后存储的 
Remote storage 的位置。在 BE 上 daemon 线程会周期性的判断表是否需要 freeze,若 freeze 后会将数据上传到兼容 S3 
协议的对象存储和 HDFS 上。
-
-冷热分层支持所有 Doris 功能,只是把部分数据放到对象存储上,以节省成本,不牺牲功能。因此有如下特点:
-
--   冷数据放到对象存储上,用户无需担心数据一致性和数据安全性问题
--   灵活的 Freeze 策略,冷却远程存储 Property 可以应用到表和 Partition 级别
-
--   用户查询数据,无需关注数据分布位置,若数据不在本地,会拉取对象上的数据,并 cache 到 BE 本地
-
--   副本 clone 优化,若存储数据在对象上,则副本 clone 的时候不用去拉取存储数据到本地
-
--   远程对象空间回收 recycler,若表、分区被删除,或者冷热分层过程中异常情况产生的空间浪费,则会有 recycler 
线程周期性的回收,节约存储资源
-
--   cache 优化,将访问过的冷数据 cache 到 BE 本地,达到非冷热分层的查询性能
-
--   BE 线程池优化,区分数据来源是本地还是对象存储,防止读取对象延时影响查询性能
-
-## Storage policy 的使用
-
-存储策略是使用冷热分层功能的入口,用户只需要在建表或使用 Doris 过程中,给表或分区关联上 Storage policy,即可以使用冷热分层的功能。
-
-:::tip
-创建 S3 RESOURCE 的时候,会进行 S3 远端的链接校验,以保证 RESOURCE 创建的正确。
+:::warning 注意
+远程存储的数据只有一个副本,数据可靠性依赖远程存储的数据可靠性,您需要保证远程存储有ec(擦除码)或者多副本技术确保数据可靠性。
 :::
 
-下面演示如何创建 S3 RESOURCE:
+## 使用方法
+
+以S3对象存储为例,首先创建S3 RESOURCE:
 
 ```sql
 CREATE RESOURCE "remote_s3"
@@ -79,13 +51,25 @@ PROPERTIES
     "s3.connection.request.timeout" = "3000",
     "s3.connection.timeout" = "1000"
 );
+```
 
+:::tip
+创建 S3 RESOURCE 的时候,会进行 S3 远端的链接校验,以保证 RESOURCE 创建的正确。
+:::
+
+之后创建STORAGE POLICY,关联上文创建的RESOURCE:
+
+```sql
 CREATE STORAGE POLICY test_policy
 PROPERTIES(
     "storage_resource" = "remote_s3",
     "cooldown_ttl" = "1d"
 );
+```
 
+最后建表的时候指定STORAGE POLICY:
+
+```sql
 CREATE TABLE IF NOT EXISTS create_table_use_created_policy 
 (
     k1 BIGINT,
@@ -104,7 +88,7 @@ PROPERTIES(
 UNIQUE 表如果设置了 `"enable_unique_key_merge_on_write" = "true"` 的话,无法使用此功能。
 :::
 
-以及如何创建 HDFS RESOURCE:
+创建 HDFS RESOURCE:
 
 ```sql
 CREATE RESOURCE "remote_hdfs" PROPERTIES (
@@ -125,9 +109,9 @@ CREATE RESOURCE "remote_hdfs" PROPERTIES (
     )
 
     CREATE TABLE IF NOT EXISTS create_table_use_created_policy (
-    k1 BIGINT,
-    k2 LARGEINT,
-    v1 VARCHAR(2048)
+        k1 BIGINT,
+        k2 LARGEINT,
+        v1 VARCHAR(2048)
     )
     UNIQUE KEY(k1)
     DISTRIBUTED BY HASH (k1) BUCKETS 3
@@ -141,13 +125,15 @@ CREATE RESOURCE "remote_hdfs" PROPERTIES (
 UNIQUE 表如果设置了 `"enable_unique_key_merge_on_write" = "true"` 的话,无法使用此功能。
 :::
 
-或者对一个已存在的表,关联 Storage policy
+除了新建表支持设置远程存储外,Doris还支持对一个已存在的表或者PARTITION,设置远程存储。
+
+对一个已存在的表,设置远程存储,将创建好的STORAGE POLICY与表关联:
 
 ```sql
 ALTER TABLE create_table_not_have_policy set ("storage_policy" = 
"test_policy");
 ```
 
-或者对一个已存在的 partition,关联 Storage policy
+对一个已存在的PARTITION,设置远程存储,将创建好的STORAGE POLICY与PARTITION关联:
 
 ```sql
 ALTER TABLE create_table_partition MODIFY PARTITION (*) 
SET("storage_policy"="test_policy");
@@ -156,7 +142,7 @@ ALTER TABLE create_table_partition MODIFY PARTITION (*) 
SET("storage_policy"="te
 :::tip
 注意,如果用户在建表时给整张 Table 和部分 Partition 指定了不同的 Storage Policy,Partition 设置的 Storage 
policy 会被无视,整张表的所有 Partition 都会使用 table 的 Policy. 如果您需要让某个 Partition 的 Policy 
和别的不同,则可以使用上文中对一个已存在的 Partition,关联 Storage policy 的方式修改。
 
-具体可以参考 Docs 
目录下[RESOURCE](../sql-manual/sql-statements/Data-Definition-Statements/Create/CREATE-RESOURCE)、
 
[POLICY](../sql-manual/sql-statements/Data-Definition-Statements/Create/CREATE-POLICY)、
 [CREATE 
TABLE](../sql-manual/sql-statements/Data-Definition-Statements/Create/CREATE-TABLE)、
 [ALTER 
TABLE](../sql-manual/sql-statements/Data-Definition-Statements/Alter/ALTER-TABLE-COLUMN)等文档,里面有详细介绍。
+具体可以参考 Docs 
目录下[RESOURCE](../../sql-manual/sql-statements/Data-Definition-Statements/Create/CREATE-RESOURCE)、
 
[POLICY](../../sql-manual/sql-statements/Data-Definition-Statements/Create/CREATE-POLICY)、
 [CREATE 
TABLE](../../sql-manual/sql-statements/Data-Definition-Statements/Create/CREATE-TABLE)、
 [ALTER 
TABLE](../../sql-manual/sql-statements/Data-Definition-Statements/Alter/ALTER-TABLE-COLUMN)等文档,里面有详细介绍。
 :::
 
 ### 一些限制
@@ -169,33 +155,33 @@ ALTER TABLE create_table_partition MODIFY PARTITION (*) 
SET("storage_policy"="te
 
 -   Unique 模型在开启 Merge-on-Write 特性时,不支持设置 Storage policy。
 
-## 冷数据占用对象大小
+## 查看远程存储占用大小
 
 方式一:通过 show proc '/backends'可以查看到每个 BE 上传到对象的大小,RemoteUsedCapacity 项,此方式略有延迟。
 
 方式二:通过 show tablets from tableName 可以查看到表的每个 tablet 占用的对象大小,RemoteDataSize 项。
 
-## 冷数据的 cache
+## 远程存储的 cache
 
-上文提到冷数据为了优化查询的性能和对象存储资源节省,引入了 cache 的概念。在冷却后首次命中,Doris 会将已经冷却的数据又重新加载到 BE 
的本地磁盘,cache 有以下特性:
+为了优化查询的性能和对象存储资源节省,引入了 cache 的概念。在第一次查询远程存储的数据时,Doris 会将远程存储的数据加载到 BE 
的本地磁盘做缓存,cache 有以下特性:
 
 -   cache 实际存储于 BE 磁盘,不占用内存空间。
 
 -   cache 可以限制膨胀,通过 LRU 进行数据的清理
 
--   cache 的实现和联邦查询 Catalog 的 cache 是同一套实现,文档参考[此处](../lakehouse/filecache)
+-   cache 的实现和联邦查询 Catalog 的 cache 是同一套实现,文档参考[此处](../../lakehouse/filecache)
 
-## 冷数据的 Compaction
+## 远程存储的 Compaction
 
-冷数据传入的时间是数据 rowset 文件写入本地磁盘时刻起,加上冷却时间。由于数据并不是一次性写入和冷却的,因此避免在对象存储内的小文件问题,Doris 
也会进行冷数据的 Compaction。但是,冷数据的 Compaction 的频次和资源占用的优先级并不是很高,也推荐本地热数据 compaction 
后再执行冷却。具体可以通过以下 BE 参数调整:
+远程存储数据传入的时间是 rowset 文件写入本地磁盘时刻起,加上冷却时间。由于数据并不是一次性写入和冷却的,因此避免在对象存储内的小文件问题,Doris 
也会进行远程存储数据的 Compaction。但是,远程存储数据的 Compaction 的频次和资源占用的优先级并不是很高,也推荐本地热数据 
compaction 后再执行冷却。具体可以通过以下 BE 参数调整:
 
--   BE 参数`cold_data_compaction_thread_num`可以设置执行冷数据的 Compaction 的并发,默认是 2。
+-   BE 参数`cold_data_compaction_thread_num`可以设置执行远程存储的 Compaction 的并发,默认是 2。
 
--   BE 参数`cold_data_compaction_interval_sec`可以设置执行冷数据的 Compaction 的时间间隔,默认是 
1800,单位:秒,即半个小时。。
+-   BE 参数`cold_data_compaction_interval_sec`可以设置执行远程存储的 Compaction 的时间间隔,默认是 
1800,单位:秒,即半个小时。
 
-## 冷数据的 Schema Change
+## 远程存储的 Schema Change
 
-数据冷却后支持 Schema Change 类型如下:
+远程存储支持 Schema Change 类型如下:
 
 -   增加、删除列
 
@@ -205,21 +191,17 @@ ALTER TABLE create_table_partition MODIFY PARTITION (*) 
SET("storage_policy"="te
 
 -   增加、修改索引
 
-## 冷数据的垃圾回收
+## 远程存储的垃圾回收
 
-冷数据的垃圾数据是指没有被任何 Replica 使用的数据,对象存储上可能会有如下情况产生的垃圾数据:
+远程存储的垃圾数据是指没有被任何 Replica 使用的数据,对象存储上可能会有如下情况产生的垃圾数据:
 
 1.  上传 rowset 失败但是有部分 segment 上传成功。
 
 2.  FE 重新选 CooldownReplica 后,新旧 CooldownReplica 的 rowset version 
不一致,FollowerReplica 都去同步新 CooldownReplica 的 CooldownMeta,旧 CooldownReplica 中 
version 不一致的 rowset 没有 Replica 使用成为垃圾数据。
 
-3.  冷数据 Compaction 后,合并前的 rowset 因为还可能被其他 Replica 使用不能立即删除,但是最终 
FollowerReplica 都使用了最新的合并后的 rowset,合并前的 rowset 成为垃圾数据。
-
-另外,对象上的垃圾数据并不会立即清理掉。BE 
参数`remove_unused_remote_files_interval_sec`可以设置冷数据的垃圾回收的时间间隔,默认是 21600,单位:秒,即 6 
个小时。
-
-## 未尽事项
+3.  远程存储数据 Compaction 后,合并前的 rowset 因为还可能被其他 Replica 使用不能立即删除,但是最终 
FollowerReplica 都使用了最新的合并后的 rowset,合并前的 rowset 成为垃圾数据。
 
--   一些远端占用指标更新获取不够完善
+另外,对象上的垃圾数据并不会立即清理掉。BE 
参数`remove_unused_remote_files_interval_sec`可以设置远程存储的垃圾回收的时间间隔,默认是 21600,单位:秒,即 
6 个小时。
 
 ## 常见问题
 
diff --git 
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/table-design/tiered-storage/remote-storage.md
 
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/table-design/tiered-storage/remote-storage.md
index ebe360c8aa..4c3a63bb9a 100644
--- 
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/table-design/tiered-storage/remote-storage.md
+++ 
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/table-design/tiered-storage/remote-storage.md
@@ -23,46 +23,18 @@ KIND, either express or implied.  See the License for the
 specific language governing permissions and limitations
 under the License.
 -->
-## 需求场景
 
-未来一个很大的使用场景是类似于 ES 
日志存储,日志场景下数据会按照日期来切割数据,很多数据是冷数据,查询很少,需要降低这类数据的存储成本。从节约存储成本角度考虑:
+## 功能简介
 
--   各云厂商普通云盘的价格都比对象存储贵
+远程存储支持把部分数据放到外部存储(例如对象存储,HDFS)上,节省成本,不牺牲功能。
 
--   在 Doris 集群实际线上使用中,普通云盘的利用率无法达到 100%
-
--   云盘不是按需付费,而对象存储可以做到按需付费
-
--   基于普通云盘做高可用,需要实现多副本,某副本异常要做副本迁移。而将数据放到对象存储上则不存在此类问题,因为对象存储是共享的。
-
-## 解决方案
-
-在 Partition 级别上设置 Freeze time,表示多久这个 Partition 会被 Freeze,并且定义 Freeze 之后存储的 
Remote storage 的位置。在 BE 上 daemon 线程会周期性的判断表是否需要 freeze,若 freeze 后会将数据上传到兼容 S3 
协议的对象存储和 HDFS 上。
-
-冷热分层支持所有 Doris 功能,只是把部分数据放到对象存储上,以节省成本,不牺牲功能。因此有如下特点:
-
--   冷数据放到对象存储上,用户无需担心数据一致性和数据安全性问题
--   灵活的 Freeze 策略,冷却远程存储 Property 可以应用到表和 Partition 级别
-
--   用户查询数据,无需关注数据分布位置,若数据不在本地,会拉取对象上的数据,并 cache 到 BE 本地
-
--   副本 clone 优化,若存储数据在对象上,则副本 clone 的时候不用去拉取存储数据到本地
-
--   远程对象空间回收 recycler,若表、分区被删除,或者冷热分层过程中异常情况产生的空间浪费,则会有 recycler 
线程周期性的回收,节约存储资源
-
--   cache 优化,将访问过的冷数据 cache 到 BE 本地,达到非冷热分层的查询性能
-
--   BE 线程池优化,区分数据来源是本地还是对象存储,防止读取对象延时影响查询性能
-
-## Storage policy 的使用
-
-存储策略是使用冷热分层功能的入口,用户只需要在建表或使用 Doris 过程中,给表或分区关联上 Storage policy,即可以使用冷热分层的功能。
-
-:::tip
-创建 S3 RESOURCE 的时候,会进行 S3 远端的链接校验,以保证 RESOURCE 创建的正确。
+:::warning 注意
+远程存储的数据只有一个副本,数据可靠性依赖远程存储的数据可靠性,您需要保证远程存储有ec(擦除码)或者多副本技术确保数据可靠性。
 :::
 
-下面演示如何创建 S3 RESOURCE:
+## 使用方法
+
+以S3对象存储为例,首先创建S3 RESOURCE:
 
 ```sql
 CREATE RESOURCE "remote_s3"
@@ -79,13 +51,25 @@ PROPERTIES
     "s3.connection.request.timeout" = "3000",
     "s3.connection.timeout" = "1000"
 );
+```
 
+:::tip
+创建 S3 RESOURCE 的时候,会进行 S3 远端的链接校验,以保证 RESOURCE 创建的正确。
+:::
+
+之后创建STORAGE POLICY,关联上文创建的RESOURCE:
+
+```sql
 CREATE STORAGE POLICY test_policy
 PROPERTIES(
     "storage_resource" = "remote_s3",
     "cooldown_ttl" = "1d"
 );
+```
 
+最后建表的时候指定STORAGE POLICY:
+
+```sql
 CREATE TABLE IF NOT EXISTS create_table_use_created_policy 
 (
     k1 BIGINT,
@@ -104,7 +88,7 @@ PROPERTIES(
 UNIQUE 表如果设置了 `"enable_unique_key_merge_on_write" = "true"` 的话,无法使用此功能。
 :::
 
-以及如何创建 HDFS RESOURCE:
+创建 HDFS RESOURCE:
 
 ```sql
 CREATE RESOURCE "remote_hdfs" PROPERTIES (
@@ -125,9 +109,9 @@ CREATE RESOURCE "remote_hdfs" PROPERTIES (
     )
 
     CREATE TABLE IF NOT EXISTS create_table_use_created_policy (
-    k1 BIGINT,
-    k2 LARGEINT,
-    v1 VARCHAR(2048)
+        k1 BIGINT,
+        k2 LARGEINT,
+        v1 VARCHAR(2048)
     )
     UNIQUE KEY(k1)
     DISTRIBUTED BY HASH (k1) BUCKETS 3
@@ -141,13 +125,15 @@ CREATE RESOURCE "remote_hdfs" PROPERTIES (
 UNIQUE 表如果设置了 `"enable_unique_key_merge_on_write" = "true"` 的话,无法使用此功能。
 :::
 
-或者对一个已存在的表,关联 Storage policy
+除了新建表支持设置远程存储外,Doris还支持对一个已存在的表或者PARTITION,设置远程存储。
+
+对一个已存在的表,设置远程存储,将创建好的STORAGE POLICY与表关联:
 
 ```sql
 ALTER TABLE create_table_not_have_policy set ("storage_policy" = 
"test_policy");
 ```
 
-或者对一个已存在的 partition,关联 Storage policy
+对一个已存在的PARTITION,设置远程存储,将创建好的STORAGE POLICY与PARTITION关联:
 
 ```sql
 ALTER TABLE create_table_partition MODIFY PARTITION (*) 
SET("storage_policy"="test_policy");
@@ -156,7 +142,7 @@ ALTER TABLE create_table_partition MODIFY PARTITION (*) 
SET("storage_policy"="te
 :::tip
 注意,如果用户在建表时给整张 Table 和部分 Partition 指定了不同的 Storage Policy,Partition 设置的 Storage 
policy 会被无视,整张表的所有 Partition 都会使用 table 的 Policy. 如果您需要让某个 Partition 的 Policy 
和别的不同,则可以使用上文中对一个已存在的 Partition,关联 Storage policy 的方式修改。
 
-具体可以参考 Docs 
目录下[RESOURCE](../sql-manual/sql-statements/Data-Definition-Statements/Create/CREATE-RESOURCE)、
 
[POLICY](../sql-manual/sql-statements/Data-Definition-Statements/Create/CREATE-POLICY)、
 [CREATE 
TABLE](../sql-manual/sql-statements/Data-Definition-Statements/Create/CREATE-TABLE)、
 [ALTER 
TABLE](../sql-manual/sql-statements/Data-Definition-Statements/Alter/ALTER-TABLE-COLUMN)等文档,里面有详细介绍。
+具体可以参考 Docs 
目录下[RESOURCE](../../sql-manual/sql-statements/Data-Definition-Statements/Create/CREATE-RESOURCE)、
 
[POLICY](../../sql-manual/sql-statements/Data-Definition-Statements/Create/CREATE-POLICY)、
 [CREATE 
TABLE](../../sql-manual/sql-statements/Data-Definition-Statements/Create/CREATE-TABLE)、
 [ALTER 
TABLE](../../sql-manual/sql-statements/Data-Definition-Statements/Alter/ALTER-TABLE-COLUMN)等文档,里面有详细介绍。
 :::
 
 ### 一些限制
@@ -169,33 +155,33 @@ ALTER TABLE create_table_partition MODIFY PARTITION (*) 
SET("storage_policy"="te
 
 -   Unique 模型在开启 Merge-on-Write 特性时,不支持设置 Storage policy。
 
-## 冷数据占用对象大小
+## 查看远程存储占用大小
 
 方式一:通过 show proc '/backends'可以查看到每个 BE 上传到对象的大小,RemoteUsedCapacity 项,此方式略有延迟。
 
 方式二:通过 show tablets from tableName 可以查看到表的每个 tablet 占用的对象大小,RemoteDataSize 项。
 
-## 冷数据的 cache
+## 远程存储的 cache
 
-上文提到冷数据为了优化查询的性能和对象存储资源节省,引入了 cache 的概念。在冷却后首次命中,Doris 会将已经冷却的数据又重新加载到 BE 
的本地磁盘,cache 有以下特性:
+为了优化查询的性能和对象存储资源节省,引入了 cache 的概念。在第一次查询远程存储的数据时,Doris 会将远程存储的数据加载到 BE 
的本地磁盘做缓存,cache 有以下特性:
 
 -   cache 实际存储于 BE 磁盘,不占用内存空间。
 
 -   cache 可以限制膨胀,通过 LRU 进行数据的清理
 
--   cache 的实现和联邦查询 Catalog 的 cache 是同一套实现,文档参考[此处](../lakehouse/filecache)
+-   cache 的实现和联邦查询 Catalog 的 cache 是同一套实现,文档参考[此处](../../lakehouse/filecache)
 
-## 冷数据的 Compaction
+## 远程存储的 Compaction
 
-冷数据传入的时间是数据 rowset 文件写入本地磁盘时刻起,加上冷却时间。由于数据并不是一次性写入和冷却的,因此避免在对象存储内的小文件问题,Doris 
也会进行冷数据的 Compaction。但是,冷数据的 Compaction 的频次和资源占用的优先级并不是很高,也推荐本地热数据 compaction 
后再执行冷却。具体可以通过以下 BE 参数调整:
+远程存储数据传入的时间是 rowset 文件写入本地磁盘时刻起,加上冷却时间。由于数据并不是一次性写入和冷却的,因此避免在对象存储内的小文件问题,Doris 
也会进行远程存储数据的 Compaction。但是,远程存储数据的 Compaction 的频次和资源占用的优先级并不是很高,也推荐本地热数据 
compaction 后再执行冷却。具体可以通过以下 BE 参数调整:
 
--   BE 参数`cold_data_compaction_thread_num`可以设置执行冷数据的 Compaction 的并发,默认是 2。
+-   BE 参数`cold_data_compaction_thread_num`可以设置执行远程存储的 Compaction 的并发,默认是 2。
 
--   BE 参数`cold_data_compaction_interval_sec`可以设置执行冷数据的 Compaction 的时间间隔,默认是 
1800,单位:秒,即半个小时。。
+-   BE 参数`cold_data_compaction_interval_sec`可以设置执行远程存储的 Compaction 的时间间隔,默认是 
1800,单位:秒,即半个小时。
 
-## 冷数据的 Schema Change
+## 远程存储的 Schema Change
 
-数据冷却后支持 Schema Change 类型如下:
+远程存储支持 Schema Change 类型如下:
 
 -   增加、删除列
 
@@ -205,21 +191,17 @@ ALTER TABLE create_table_partition MODIFY PARTITION (*) 
SET("storage_policy"="te
 
 -   增加、修改索引
 
-## 冷数据的垃圾回收
+## 远程存储的垃圾回收
 
-冷数据的垃圾数据是指没有被任何 Replica 使用的数据,对象存储上可能会有如下情况产生的垃圾数据:
+远程存储的垃圾数据是指没有被任何 Replica 使用的数据,对象存储上可能会有如下情况产生的垃圾数据:
 
 1.  上传 rowset 失败但是有部分 segment 上传成功。
 
 2.  FE 重新选 CooldownReplica 后,新旧 CooldownReplica 的 rowset version 
不一致,FollowerReplica 都去同步新 CooldownReplica 的 CooldownMeta,旧 CooldownReplica 中 
version 不一致的 rowset 没有 Replica 使用成为垃圾数据。
 
-3.  冷数据 Compaction 后,合并前的 rowset 因为还可能被其他 Replica 使用不能立即删除,但是最终 
FollowerReplica 都使用了最新的合并后的 rowset,合并前的 rowset 成为垃圾数据。
-
-另外,对象上的垃圾数据并不会立即清理掉。BE 
参数`remove_unused_remote_files_interval_sec`可以设置冷数据的垃圾回收的时间间隔,默认是 21600,单位:秒,即 6 
个小时。
-
-## 未尽事项
+3.  远程存储数据 Compaction 后,合并前的 rowset 因为还可能被其他 Replica 使用不能立即删除,但是最终 
FollowerReplica 都使用了最新的合并后的 rowset,合并前的 rowset 成为垃圾数据。
 
--   一些远端占用指标更新获取不够完善
+另外,对象上的垃圾数据并不会立即清理掉。BE 
参数`remove_unused_remote_files_interval_sec`可以设置远程存储的垃圾回收的时间间隔,默认是 21600,单位:秒,即 
6 个小时。
 
 ## 常见问题
 
diff --git 
a/versioned_docs/version-2.1/table-design/tiered-storage/remote-storage.md 
b/versioned_docs/version-2.1/table-design/tiered-storage/remote-storage.md
index ca986f86d6..1380d1dcff 100644
--- a/versioned_docs/version-2.1/table-design/tiered-storage/remote-storage.md
+++ b/versioned_docs/version-2.1/table-design/tiered-storage/remote-storage.md
@@ -24,47 +24,17 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-## Use Case
+## Feature Overview
 
-One significant use case in the future is similar to ES log storage, where 
data in the log scenario is split based on dates. Many of the data are cold 
data with infrequent queries, requiring a reduction in storage costs for such 
data. Considering cost-saving:
+Remote storage supports placing some data in external storage (such as object 
storage or HDFS), which saves costs without sacrificing functionality.
 
-- The pricing of regular cloud disks from various vendors is more expensive 
than object storage.
-
-- In actual online usage of the Doris Cluster, the utilization of regular 
cloud disks cannot reach 100%.
-
-- Cloud disks are not billed on demand, while object storage can be billed on 
demand.
-
-- Using regular cloud disks for high availability requires multiple replicas 
and replica migration in case of failures. In contrast, storing data on object 
storage eliminates these issues as it is shared.
-
-## Solution
-
-Set the freeze time at the partition level, which indicates how long a 
partition will be frozen, and define the location of remote storage for storing 
data after freezing. In the BE (Backend) daemon thread, the table's freeze 
condition is periodically checked. If a freeze condition is met, the data will 
be uploaded to object storage compatible with the S3 protocol and HDFS.
-
-Cold-hot tiering supports all Doris functionalities and only moves some data 
to object storage to save costs without sacrificing functionality. Therefore, 
it has the following characteristics:
-
-- Cold data is stored on object storage, and users do not need to worry about 
data consistency and security.
-
-- Flexible freeze strategy, where the cold remote storage property can be 
applied to both table and partition levels.
-
-- Users can query data without worrying about data distribution. If the data 
is not local, it will be pulled from the object storage and cached locally in 
the BE (Backend).
-
-- Replica clone optimization. If the stored data is on object storage, there 
is no need to fetch the stored data locally during replica cloning.
-
-- Remote object space recycling. If a table or partition is deleted or if 
space waste occurs during the cold-hot tiering process due to exceptional 
situations, a recycler thread will periodically recycle the space, saving 
storage resources.
-
-- Cache optimization, caching accessed cold data locally in the BE to achieve 
query performance similar to non-cold-hot tiering.
-
-- BE thread pool optimization, distinguishing between data sources from local 
and object storage to prevent delays in reading objects from impacting query 
performance.
-
-## Usage of Storage Policy
-
-The storage policy is the entry point for using the cold-hot tiering feature. 
Users only need to associate the storage policy with a table or partition 
during table creation or when using Doris.
-
-:::tip
-When creating an S3 resource, a remote S3 connection validation is performed 
to ensure the correct creation of the resource.
+:::warning Note
+Data in remote storage only has one replica. The reliability of the data 
depends on the reliability of the remote storage. You need to ensure that the 
remote storage employs EC (Erasure Coding) or multi-replica technology to 
guarantee data reliability.
 :::
 
-Here is an example of creating an S3 resource:
+## Usage Guide
+
+Using S3 object storage as an example, start by creating an S3 RESOURCE:
 
 ```sql
 CREATE RESOURCE "remote_s3"
@@ -81,13 +51,25 @@ PROPERTIES
     "s3.connection.request.timeout" = "3000",
     "s3.connection.timeout" = "1000"
 );
+```
+
+:::tip
+When creating the S3 RESOURCE, a remote connection check will be performed to 
ensure the resource is created correctly.
+:::
 
+Next, create a STORAGE POLICY and associate it with the previously created 
RESOURCE:
+
+```sql
 CREATE STORAGE POLICY test_policy
 PROPERTIES(
     "storage_resource" = "remote_s3",
     "cooldown_ttl" = "1d"
 );
+```
+
+Finally, specify the STORAGE POLICY when creating a table:
 
+```sql
 CREATE TABLE IF NOT EXISTS create_table_use_created_policy 
 (
     k1 BIGINT,
@@ -102,11 +84,11 @@ PROPERTIES(
 );
 ```
 
-:::warning Notice
-If you set `"enable_unique_key_merge_on_write" = "true"` in UNIQUE table, you 
can't use this feature.
+:::warning
+If the UNIQUE table has `"enable_unique_key_merge_on_write" = "true"`, this 
feature cannot be used.
 :::
 
-And here is an example of creating an HDFS resource:
+Create an HDFS RESOURCE:
 
 ```sql
 CREATE RESOURCE "remote_hdfs" PROPERTIES (
@@ -124,11 +106,12 @@ CREATE RESOURCE "remote_hdfs" PROPERTIES (
 CREATE STORAGE POLICY test_policy PROPERTIES (
     "storage_resource" = "remote_hdfs",
     "cooldown_ttl" = "300"
-)
+);
 
 CREATE TABLE IF NOT EXISTS create_table_use_created_policy (
     k1 BIGINT,
-    k2 LARGEINTv1 VARCHAR(2048)
+    k2 LARGEINT,
+    v1 VARCHAR(2048)
 )
 UNIQUE KEY(k1)
 DISTRIBUTED BY HASH (k1) BUCKETS 3
@@ -138,111 +121,100 @@ PROPERTIES(
 );
 ```
 
-:::warning Notice
-If you set `"enable_unique_key_merge_on_write" = "true"` in UNIQUE table, you 
can't use this feature.
+:::warning
+If the UNIQUE table has `"enable_unique_key_merge_on_write" = "true"`, this 
feature cannot be used.
 :::
 
-Associate a storage policy with an existing table by using the following 
command:
+In addition to creating tables with remote storage, Doris also supports 
setting remote storage for existing tables or partitions.
+
+For an existing table, associate a remote storage policy by running:
 
 ```sql
-ALTER TABLE create_table_not_have_policy SET ("storage_policy" = 
"test_policy");
+ALTER TABLE create_table_not_have_policy SET ("storage_policy" = 
"test_policy");
 ```
 
-Associate a storage policy with an existing partition by using the following 
command:
+For an existing PARTITION, associate a remote storage policy by running:
 
 ```sql
-ALTER TABLE create_table_partition MODIFY PARTITION (*) SET ("storage_policy" 
= "test_policy");
+ALTER TABLE create_table_partition MODIFY PARTITION (*) 
SET("storage_policy"="test_policy");
 ```
 
 :::tip
-If you specify different storage policies for the entire table and some 
partitions during table creation, the storage policy set for the partitions 
will be ignored, and all partitions of the table will use the table's storage 
policy. If you want a specific partition to have a different storage policy 
than the others, you can use the method mentioned above to modify the 
association for that specific partition.
-
-For more details, please refer to the following documents in the Docs 
directory: 
[RESOURCE](../sql-manual/sql-statements/Data-Definition-Statements/Create/CREATE-RESOURCE),
 
[POLICY](../sql-manual/sql-statements/Data-Definition-Statements/Create/CREATE-POLICY),
 [CREATE 
TABLE](../sql-manual/sql-statements/Data-Definition-Statements/Create/CREATE-TABLE),
 [ALTER 
TABLE](../sql-manual/sql-statements/Data-Definition-Statements/Alter/ALTER-TABLE-COLUMN),
 which provide detailed explanations.
+Note that if you specify different storage policies for the entire table and 
certain partitions, the storage policy of the table will take precedence for 
all partitions. If you need a partition to use a different storage policy, you 
can modify it using the method above for existing partitions.
 :::
 
-### Limitations
+For more details, please refer to the documentation in the **Docs** directory, 
such as 
[RESOURCE](../../sql-manual/sql-statements/Data-Definition-Statements/Create/CREATE-RESOURCE),
 
[POLICY](../../sql-manual/sql-statements/Data-Definition-Statements/Create/CREATE-POLICY),
 [CREATE 
TABLE](../../sql-manual/sql-statements/Data-Definition-Statements/Create/CREATE-TABLE),
 and [ALTER 
TABLE](../../sql-manual/sql-statements/Data-Definition-Statements/Alter/ALTER-TABLE-COLUMN),
 which provide detailed explanations.
 
-- A single table or partition can only be associated with one storage policy. 
Once associated, the storage policy cannot be dropped without first removing 
the association between them.
+### Limitations
 
-- The object information associated with a storage policy does not support 
modifying the data storage path, such as bucket, endpoint, root_path, and other 
information.
+- A single table or partition can only be associated with one storage policy. 
Once associated, the storage policy cannot be dropped until the association is 
removed.
 
-- Storage policies support creation, modification, and deletion. Before 
deleting a storage policy, ensure that no tables are referencing the storage 
policy.
+- The storage path information associated with a storage policy (e.g., bucket, 
endpoint, root_path) cannot be modified after the policy is created.
 
-- When the Merge-on-Write feature is enabled, the Unique model does not 
support setting a storage policy.
+- Storage policies support creation, modification, and deletion. However, 
before deleting a policy, you need to ensure that no tables are referencing 
this storage policy.
 
+- The Unique model with Merge-on-Write enabled does not support setting a 
storage policy.
 
-## Occupied Size of Cold Data Objects
+## Viewing Remote Storage Usage
 
-Method 1: You can use the `show proc '/backends'` command to view the size of 
each backend's uploaded objects. Look for the `RemoteUsedCapacity` field. 
Please note that this method may have some latency.
+Method 1: You can view the size uploaded to the object storage by each BE by 
using `show proc '/backends'`, specifically the `RemoteUsedCapacity` item. Note 
that this method may have some delay.
 
-Method 2: You can use the `show tablets from tableName` command to view the 
size of each tablet in a table, indicated by the `RemoteDataSize` field.
+Method 2: You can view the object size used by each tablet of a table by using 
`show tablets from tableName`, specifically the `RemoteDataSize` item.
 
-## Cache for Cold Data
+## Remote Storage Cache
 
-As mentioned earlier, caching is introduced for cold data to optimize query 
performance and save object storage resources. When cold data is first accessed 
after cooling, Doris reloads the cooled data onto the local disk of the backend 
(BE). The cold data cache has the following characteristics:
+To optimize query performance and save object storage resources, the concept 
of cache is introduced. When querying data from remote storage for the first 
time, Doris will load the data from remote storage to the BE's local disk as a 
cache. The cache has the following characteristics:
 
 - The cache is stored on the BE's disk and does not occupy memory space.
+- The cache can be limited in size, with data cleanup performed using an LRU 
(Least Recently Used) policy.
+- The implementation of the cache is the same as the federated query catalog 
cache. For more information, refer to the 
[documentation](../../lakehouse/filecache).
 
-- The cache can be limited in size and uses LRU (Least Recently Used) for data 
eviction.
-
-- The implementation of the cache for cold data is the same as the cache for 
federated query catalog. Please refer to the documentation at 
[Filecache](../lakehouse/filecache) for more details.
-
-## Compaction of Cold Data
-
-The time at which cold data enters is counted from the moment the data rowset 
file is written to the local disk, plus the cooling duration. Since data is not 
written and cooled all at once, Doris performs compaction on cold data to avoid 
the issue of small files within object storage. However, the frequency and 
resource prioritization of cold data compaction are not very high. It is 
recommended to perform compaction on local hot data before cooling. You can 
adjust the following BE parameters:
-
-- The BE parameter `cold_data_compaction_thread_num` sets the concurrency for 
cold data compaction. The default value is 2.
+## Remote Storage Compaction
 
-- The BE parameter `cold_data_compaction_interval_sec` sets the time interval 
for cold data compaction. The default value is 1800 seconds (30 minutes).
+Data is moved to remote storage at the time the rowset file was written to the 
local disk plus the cooldown period. Since data is not written and cooled all at 
once, Doris also performs compaction on remote storage data to avoid the 
small-file problem in object storage. However, the frequency and resource 
priority of remote storage compaction are not very high, so it is recommended to 
compact local hot data before it cools down. The following BE parameters can be 
adjusted:
 
-## Schema Change for Cold Data
+- The BE parameter `cold_data_compaction_thread_num` sets the concurrency for 
performing compaction on remote storage. The default value is 2.
+- The BE parameter `cold_data_compaction_interval_sec` sets the time interval 
for executing remote storage compaction. The default value is 1800 seconds (30 
minutes).
 
-The following schema change types are supported for cold data:
+## Remote Storage Schema Change
 
-- Adding or deleting columns
+Schema changes are supported for data in remote storage. Supported types 
include:
 
+- Adding or removing columns
 - Modifying column types
-
 - Adjusting column order
-
 - Adding or modifying indexes
 
-## Garbage Collection of Cold Data
-
-Garbage data for cold data refers to data that is not used by any replica. The 
following situations may generate garbage data on object storage:
-
-1. Partial segment upload succeeds while the upload of the rowset fails.
+## Remote Storage Garbage Collection
 
-2. After the FE reselects the CooldownReplica, the rowset versions of the old 
and new CooldownReplica do not match. FollowerReplicas synchronize the 
CooldownMeta of the new CooldownReplica, and the rowsets with inconsistent 
versions in the old CooldownReplica become garbage data.
+Remote storage garbage data refers to data that is not being used by any 
replica. Garbage data may occur on object storage in the following cases:
 
-3. After cold data compaction, the rowsets before merging cannot be 
immediately deleted because they may still be used by other replicas. However, 
eventually, all FollowerReplicas use the latest merged rowset, and the rowsets 
before merging become garbage data.
+1. A rowset upload fails, but some of its segments are uploaded successfully.
+2. The FE re-selects a CooldownReplica, causing an inconsistency between the 
rowset versions of the old and new CooldownReplica. FollowerReplicas 
synchronize the CooldownMeta of the new CooldownReplica, and the rowsets with 
version mismatches in the old CooldownReplica become garbage data.
+3. After a remote storage compaction, the rowsets before merging cannot be 
immediately deleted because they may still be used by other replicas. 
Eventually, once all FollowerReplicas use the latest merged rowset, the 
pre-merge rowsets become garbage data.
 
-Furthermore, the garbage data on objects is not immediately cleaned up. The BE 
parameter `remove_unused_remote_files_interval_sec` sets the time interval for 
garbage collection of cold data. The default value is 21600 seconds (6 hours).
+Additionally, garbage data on objects will not be cleaned up immediately. The 
BE parameter `remove_unused_remote_files_interval_sec` sets the time interval 
for remote storage garbage collection, with a default value of 21600 seconds (6 
hours).
 
-## TODOs
-
-- Some remote occupancy metrics may not have comprehensive update retrieval.
-
-## FAQs
+## Common Issues
 
 1. `ERROR 1105 (HY000): errCode = 2, detailMessage = Failed to create 
repository: connect to s3 failed: Unable to marshall request to JSON: host must 
not be null.`
 
-The S3 SDK defaults to using the virtual-hosted style. However, some object 
storage systems (e.g., MinIO) may not have virtual-hosted style access enabled 
or supported. In such cases, you can add the `use_path_style` parameter to 
force the use of path-style access:
-
-```sql
-CREATE RESOURCE "remote_s3"
-PROPERTIES
-(
-    "type" = "s3",
-    "s3.endpoint" = "bj.s3.com",
-    "s3.region" = "bj",
-    "s3.bucket" = "test-bucket",
-    "s3.root.path" = "path/to/root",
-    "s3.access_key" = "bbb",
-    "s3.secret_key" = "aaaa",
-    "s3.connection.maximum" = "50",
-    "s3.connection.request.timeout" = "3000",
-    "s3.connection.timeout" = "1000",
-    "use_path_style" = "true"
-);
-```
+   The S3 SDK uses the virtual-hosted style access method by default. However, 
some object storage systems (such as MinIO) may not have virtual-hosted style 
access enabled or supported. In this case, you can add the `use_path_style` 
parameter to force path-style access:
+
+   ```sql
+   CREATE RESOURCE "remote_s3"
+   PROPERTIES
+   (
+       "type" = "s3",
+       "s3.endpoint" = "bj.s3.com",
+       "s3.region" = "bj",
+       "s3.bucket" = "test-bucket",
+       "s3.root.path" = "path/to/root",
+       "s3.access_key" = "bbb",
+       "s3.secret_key" = "aaaa",
+       "s3.connection.maximum" = "50",
+       "s3.connection.request.timeout" = "3000",
+       "s3.connection.timeout" = "1000",
+       "use_path_style" = "true"
+   );
+   ```
\ No newline at end of file
diff --git 
a/versioned_docs/version-3.0/table-design/tiered-storage/remote-storage.md 
b/versioned_docs/version-3.0/table-design/tiered-storage/remote-storage.md
index ca986f86d6..1380d1dcff 100644
--- a/versioned_docs/version-3.0/table-design/tiered-storage/remote-storage.md
+++ b/versioned_docs/version-3.0/table-design/tiered-storage/remote-storage.md
@@ -24,47 +24,17 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-## Use Case
+### Feature Overview
 
-One significant use case in the future is similar to ES log storage, where 
data in the log scenario is split based on dates. Many of the data are cold 
data with infrequent queries, requiring a reduction in storage costs for such 
data. Considering cost-saving:
+Remote storage supports placing some data in external storage (such as object 
storage or HDFS), which saves costs without sacrificing functionality.
 
-- The pricing of regular cloud disks from various vendors is more expensive 
than object storage.
-
-- In actual online usage of the Doris Cluster, the utilization of regular 
cloud disks cannot reach 100%.
-
-- Cloud disks are not billed on demand, while object storage can be billed on 
demand.
-
-- Using regular cloud disks for high availability requires multiple replicas 
and replica migration in case of failures. In contrast, storing data on object 
storage eliminates these issues as it is shared.
-
-## Solution
-
-Set the freeze time at the partition level, which indicates how long a 
partition will be frozen, and define the location of remote storage for storing 
data after freezing. In the BE (Backend) daemon thread, the table's freeze 
condition is periodically checked. If a freeze condition is met, the data will 
be uploaded to object storage compatible with the S3 protocol and HDFS.
-
-Cold-hot tiering supports all Doris functionalities and only moves some data 
to object storage to save costs without sacrificing functionality. Therefore, 
it has the following characteristics:
-
-- Cold data is stored on object storage, and users do not need to worry about 
data consistency and security.
-
-- Flexible freeze strategy, where the cold remote storage property can be 
applied to both table and partition levels.
-
-- Users can query data without worrying about data distribution. If the data 
is not local, it will be pulled from the object storage and cached locally in 
the BE (Backend).
-
-- Replica clone optimization. If the stored data is on object storage, there 
is no need to fetch the stored data locally during replica cloning.
-
-- Remote object space recycling. If a table or partition is deleted or if 
space waste occurs during the cold-hot tiering process due to exceptional 
situations, a recycler thread will periodically recycle the space, saving 
storage resources.
-
-- Cache optimization, caching accessed cold data locally in the BE to achieve 
query performance similar to non-cold-hot tiering.
-
-- BE thread pool optimization, distinguishing between data sources from local 
and object storage to prevent delays in reading objects from impacting query 
performance.
-
-## Usage of Storage Policy
-
-The storage policy is the entry point for using the cold-hot tiering feature. 
Users only need to associate the storage policy with a table or partition 
during table creation or when using Doris.
-
-:::tip
-When creating an S3 resource, a remote S3 connection validation is performed 
to ensure the correct creation of the resource.
+:::warning Note
+Data in remote storage has only one replica, so data reliability depends on the reliability of the remote storage itself. Make sure the remote storage uses EC (Erasure Coding) or multi-replica technology to guarantee data reliability.
 :::
 
-Here is an example of creating an S3 resource:
+### Usage Guide
+
+Using S3 object storage as an example, start by creating an S3 RESOURCE:
 
 ```sql
 CREATE RESOURCE "remote_s3"
@@ -81,13 +51,25 @@ PROPERTIES
     "s3.connection.request.timeout" = "3000",
     "s3.connection.timeout" = "1000"
 );
+```
+
+:::tip
+When creating the S3 RESOURCE, a remote connection check will be performed to 
ensure the resource is created correctly.
+:::
 
+Next, create a STORAGE POLICY and associate it with the previously created 
RESOURCE:
+
+```sql
 CREATE STORAGE POLICY test_policy
 PROPERTIES(
     "storage_resource" = "remote_s3",
     "cooldown_ttl" = "1d"
 );
+```
+
+Finally, specify the STORAGE POLICY when creating a table:
 
+```sql
 CREATE TABLE IF NOT EXISTS create_table_use_created_policy 
 (
     k1 BIGINT,
@@ -102,11 +84,11 @@ PROPERTIES(
 );
 ```
 
-:::warning Notice
-If you set `"enable_unique_key_merge_on_write" = "true"` in UNIQUE table, you 
can't use this feature.
+:::warning
+If the UNIQUE table has `"enable_unique_key_merge_on_write" = "true"`, this 
feature cannot be used.
 :::
 
-And here is an example of creating an HDFS resource:
+Create an HDFS RESOURCE:
 
 ```sql
 CREATE RESOURCE "remote_hdfs" PROPERTIES (
@@ -124,11 +106,12 @@ CREATE RESOURCE "remote_hdfs" PROPERTIES (
 CREATE STORAGE POLICY test_policy PROPERTIES (
     "storage_resource" = "remote_hdfs",
     "cooldown_ttl" = "300"
-)
+);
 
 CREATE TABLE IF NOT EXISTS create_table_use_created_policy (
     k1 BIGINT,
-    k2 LARGEINTv1 VARCHAR(2048)
+    k2 LARGEINT,
+    v1 VARCHAR(2048)
 )
 UNIQUE KEY(k1)
 DISTRIBUTED BY HASH (k1) BUCKETS 3
@@ -138,111 +121,100 @@ PROPERTIES(
 );
 ```
 
-:::warning Notice
-If you set `"enable_unique_key_merge_on_write" = "true"` in UNIQUE table, you 
can't use this feature.
+:::warning
+If the UNIQUE table has `"enable_unique_key_merge_on_write" = "true"`, this 
feature cannot be used.
 :::
 
-Associate a storage policy with an existing table by using the following 
command:
+In addition to creating tables with remote storage, Doris also supports 
setting remote storage for existing tables or partitions.
+
+For an existing table, associate a remote storage policy by running:
 
 ```sql
-ALTER TABLE create_table_not_have_policy SET ("storage_policy" = 
"test_policy");
+ALTER TABLE create_table_not_have_policy SET ("storage_policy" = "test_policy");
 ```
 
-Associate a storage policy with an existing partition by using the following 
command:
+For an existing PARTITION, associate a remote storage policy by running:
 
 ```sql
-ALTER TABLE create_table_partition MODIFY PARTITION (*) SET ("storage_policy" 
= "test_policy");
+ALTER TABLE create_table_partition MODIFY PARTITION (*) SET ("storage_policy" = "test_policy");
 ```
 
 :::tip
-If you specify different storage policies for the entire table and some 
partitions during table creation, the storage policy set for the partitions 
will be ignored, and all partitions of the table will use the table's storage 
policy. If you want a specific partition to have a different storage policy 
than the others, you can use the method mentioned above to modify the 
association for that specific partition.
-
-For more details, please refer to the following documents in the Docs 
directory: 
[RESOURCE](../sql-manual/sql-statements/Data-Definition-Statements/Create/CREATE-RESOURCE),
 
[POLICY](../sql-manual/sql-statements/Data-Definition-Statements/Create/CREATE-POLICY),
 [CREATE 
TABLE](../sql-manual/sql-statements/Data-Definition-Statements/Create/CREATE-TABLE),
 [ALTER 
TABLE](../sql-manual/sql-statements/Data-Definition-Statements/Alter/ALTER-TABLE-COLUMN),
 which provide detailed explanations.
+Note that if you specify different storage policies for the entire table and 
certain partitions, the storage policy of the table will take precedence for 
all partitions. If you need a partition to use a different storage policy, you 
can modify it using the method above for existing partitions.
 :::
 
-### Limitations
+For more details, please refer to the documentation in the **Docs** directory, such as [RESOURCE](../../sql-manual/sql-statements/Data-Definition-Statements/Create/CREATE-RESOURCE), [POLICY](../../sql-manual/sql-statements/Data-Definition-Statements/Create/CREATE-POLICY), [CREATE TABLE](../../sql-manual/sql-statements/Data-Definition-Statements/Create/CREATE-TABLE), and [ALTER TABLE](../../sql-manual/sql-statements/Data-Definition-Statements/Alter/ALTER-TABLE-COLUMN), which provide detailed explanations.
 
-- A single table or partition can only be associated with one storage policy. 
Once associated, the storage policy cannot be dropped without first removing 
the association between them.
+### Limitations
 
-- The object information associated with a storage policy does not support 
modifying the data storage path, such as bucket, endpoint, root_path, and other 
information.
+- A single table or partition can only be associated with one storage policy. 
Once associated, the storage policy cannot be dropped until the association is 
removed.
 
-- Storage policies support creation, modification, and deletion. Before 
deleting a storage policy, ensure that no tables are referencing the storage 
policy.
+- The storage path information associated with a storage policy (e.g., bucket, 
endpoint, root_path) cannot be modified after the policy is created.
 
-- When the Merge-on-Write feature is enabled, the Unique model does not 
support setting a storage policy.
+- Storage policies support creation, modification, and deletion. However, 
before deleting a policy, you need to ensure that no tables are referencing 
this storage policy.
 
+- A Unique model table with Merge-on-Write enabled cannot use this feature (see the warnings above).
 
-## Occupied Size of Cold Data Objects
+## Viewing Remote Storage Usage
 
-Method 1: You can use the `show proc '/backends'` command to view the size of 
each backend's uploaded objects. Look for the `RemoteUsedCapacity` field. 
Please note that this method may have some latency.
+Method 1: Use `show proc '/backends'` to view the size each BE has uploaded to object storage, shown in the `RemoteUsedCapacity` field. Note that this value is updated with some delay.
 
-Method 2: You can use the `show tablets from tableName` command to view the 
size of each tablet in a table, indicated by the `RemoteDataSize` field.
+Method 2: Use `show tablets from tableName` to view the remote storage size used by each tablet of a table, shown in the `RemoteDataSize` field.
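+
+For example, assuming the table `create_table_use_created_policy` created in the usage guide:
+
+```sql
+SHOW PROC '/backends';
+SHOW TABLETS FROM create_table_use_created_policy;
+```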
 
-## Cache for Cold Data
+## Remote Storage Cache
 
-As mentioned earlier, caching is introduced for cold data to optimize query 
performance and save object storage resources. When cold data is first accessed 
after cooling, Doris reloads the cooled data onto the local disk of the backend 
(BE). The cold data cache has the following characteristics:
+To optimize query performance and save object storage resources, Doris caches remote data locally: when data in remote storage is queried for the first time, it is loaded onto the BE's local disk as a cache (a configuration sketch follows the list below). The cache has the following characteristics:
 
 - The cache is stored on the BE's disk and does not occupy memory space.
+- The cache can be limited in size, with data cleanup performed using an LRU 
(Least Recently Used) policy.
+- The implementation of the cache is the same as the federated query catalog 
cache. For more information, refer to the 
[documentation](../../lakehouse/filecache).
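+
+A minimal `be.conf` sketch for enabling and sizing the cache; the path and size are placeholders, and the linked file cache documentation is the authoritative reference:
+
+```
+enable_file_cache = true
+file_cache_path = [{"path": "/path/to/file_cache", "total_size": 107374182400}]
+```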
 
-- The cache can be limited in size and uses LRU (Least Recently Used) for data 
eviction.
-
-- The implementation of the cache for cold data is the same as the cache for 
federated query catalog. Please refer to the documentation at 
[Filecache](../lakehouse/filecache) for more details.
-
-## Compaction of Cold Data
-
-The time at which cold data enters is counted from the moment the data rowset 
file is written to the local disk, plus the cooling duration. Since data is not 
written and cooled all at once, Doris performs compaction on cold data to avoid 
the issue of small files within object storage. However, the frequency and 
resource prioritization of cold data compaction are not very high. It is 
recommended to perform compaction on local hot data before cooling. You can 
adjust the following BE parameters:
-
-- The BE parameter `cold_data_compaction_thread_num` sets the concurrency for 
cold data compaction. The default value is 2.
+## Remote Storage Compaction
 
-- The BE parameter `cold_data_compaction_interval_sec` sets the time interval 
for cold data compaction. The default value is 1800 seconds (30 minutes).
+The time at which data enters remote storage is the moment the rowset file is written to the local disk plus the cooldown TTL. Since data is not written and cooled all at once, Doris performs compaction on remote storage data to avoid the small-file problem on object storage. However, remote storage compaction runs at a relatively low frequency and priority, so it is recommended to let compaction finish on local hot data before it cools down. The following BE parameters can be adjusted:
 
-## Schema Change for Cold Data
+- The BE parameter `cold_data_compaction_thread_num` sets the concurrency for 
performing compaction on remote storage. The default value is 2.
+- The BE parameter `cold_data_compaction_interval_sec` sets the time interval 
for executing remote storage compaction. The default value is 1800 seconds (30 
minutes).
 
-The following schema change types are supported for cold data:
+## Remote Storage Schema Change
 
-- Adding or deleting columns
+Remote storage schema changes are supported. These include:
 
+- Adding or removing columns
 - Modifying column types
-
 - Adjusting column order
-
 - Adding or modifying indexes
 
-## Garbage Collection of Cold Data
-
-Garbage data for cold data refers to data that is not used by any replica. The 
following situations may generate garbage data on object storage:
-
-1. Partial segment upload succeeds while the upload of the rowset fails.
+## Remote Storage Garbage Collection
 
-2. After the FE reselects the CooldownReplica, the rowset versions of the old 
and new CooldownReplica do not match. FollowerReplicas synchronize the 
CooldownMeta of the new CooldownReplica, and the rowsets with inconsistent 
versions in the old CooldownReplica become garbage data.
+Remote storage garbage data refers to data that is not being used by any 
replica. Garbage data may occur on object storage in the following cases:
 
-3. After cold data compaction, the rowsets before merging cannot be 
immediately deleted because they may still be used by other replicas. However, 
eventually, all FollowerReplicas use the latest merged rowset, and the rowsets 
before merging become garbage data.
+1. The rowset upload fails, but some of its segments are uploaded successfully.
+2. The FE re-selects a CooldownReplica, causing an inconsistency between the 
rowset versions of the old and new CooldownReplica. FollowerReplicas 
synchronize the CooldownMeta of the new CooldownReplica, and the rowsets with 
version mismatches in the old CooldownReplica become garbage data.
+3. After a remote storage compaction, the rowsets before merging cannot be 
immediately deleted because they may still be used by other replicas. 
Eventually, once all FollowerReplicas use the latest merged rowset, the 
pre-merge rowsets become garbage data.
 
-Furthermore, the garbage data on objects is not immediately cleaned up. The BE 
parameter `remove_unused_remote_files_interval_sec` sets the time interval for 
garbage collection of cold data. The default value is 21600 seconds (6 hours).
+Additionally, garbage data on remote storage is not cleaned up immediately. The BE parameter `remove_unused_remote_files_interval_sec` sets the time interval for remote storage garbage collection; the default value is 21600 seconds (6 hours).
 
-## TODOs
-
-- Some remote occupancy metrics may not have comprehensive update retrieval.
-
-## FAQs
+## Common Issues
 
 1. `ERROR 1105 (HY000): errCode = 2, detailMessage = Failed to create 
repository: connect to s3 failed: Unable to marshall request to JSON: host must 
not be null.`
 
-The S3 SDK defaults to using the virtual-hosted style. However, some object 
storage systems (e.g., MinIO) may not have virtual-hosted style access enabled 
or supported. In such cases, you can add the `use_path_style` parameter to 
force the use of path-style access:
-
-```sql
-CREATE RESOURCE "remote_s3"
-PROPERTIES
-(
-    "type" = "s3",
-    "s3.endpoint" = "bj.s3.com",
-    "s3.region" = "bj",
-    "s3.bucket" = "test-bucket",
-    "s3.root.path" = "path/to/root",
-    "s3.access_key" = "bbb",
-    "s3.secret_key" = "aaaa",
-    "s3.connection.maximum" = "50",
-    "s3.connection.request.timeout" = "3000",
-    "s3.connection.timeout" = "1000",
-    "use_path_style" = "true"
-);
-```
+   The S3 SDK uses the virtual-hosted style access method by default. However, 
some object storage systems (such as MinIO) may not have virtual-hosted style 
access enabled or supported. In this case, you can add the `use_path_style` 
parameter to force path-style access:
+
+   ```sql
+   CREATE RESOURCE "remote_s3"
+   PROPERTIES
+   (
+       "type" = "s3",
+       "s3.endpoint" = "bj.s3.com",
+       "s3.region" = "bj",
+       "s3.bucket" = "test-bucket",
+       "s3.root.path" = "path/to/root",
+       "s3.access_key" = "bbb",
+       "s3.secret_key" = "aaaa",
+       "s3.connection.maximum" = "50",
+       "s3.connection.request.timeout" = "3000",
+       "s3.connection.timeout" = "1000",
+       "use_path_style" = "true"
+   );
+   ```
\ No newline at end of file


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

