This is an automated email from the ASF dual-hosted git repository.
dataroaring pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push:
new ebe7d28f7e2 Add recycler doc (#2777)
ebe7d28f7e2 is described below
commit ebe7d28f7e2fc76e052bd92f1fa27812af89de42
Author: abmdocrt <[email protected]>
AuthorDate: Mon Aug 25 15:52:49 2025 +0800
Add recycler doc (#2777)
## Versions
- [x] dev
- [ ] 3.0
- [ ] 2.1
- [ ] 2.0
## Languages
- [ ] Chinese
- [ ] English
## Docs Checklist
- [ ] Checked by AI
- [ ] Test Cases Built
---
docs/compute-storage-decoupled/recycler.md | 285 +++++++++++++++++++++
.../current/compute-storage-decoupled/recycler.md | 285 +++++++++++++++++++++
sidebars.json | 1 +
3 files changed, 571 insertions(+)
diff --git a/docs/compute-storage-decoupled/recycler.md
b/docs/compute-storage-decoupled/recycler.md
new file mode 100644
index 00000000000..726d0b1099d
--- /dev/null
+++ b/docs/compute-storage-decoupled/recycler.md
@@ -0,0 +1,285 @@
+---
+{
+ "title": "Recycler",
+ "language": "en"
+}
+---
+
+# Doris Storage-Compute Separation Data Recycling
+
+## Introduction
+
+In the era of big data, data lifecycle management has become one of the core
challenges for distributed database systems. With the explosive growth of
business data volumes, how to achieve efficient storage space reclamation while
ensuring data security has become a critical issue that every database product
must address.
+
+Apache Doris, as a next-generation real-time analytical database, adopts a Mark-for-Deletion data recycling strategy under its storage-compute separation architecture, with deep optimization and enhancements built on that foundation. It introduces fine-grained hierarchical recycling, flexible and configurable expiration protection, multiple data consistency checks, and a comprehensive observability system. Fully accounting for the complexity of distributed environments, it provides an independent Recycler component, intelligent concurrency control, and complete monitoring metrics, giving users an efficient yet controllable enterprise-grade data lifecycle management solution that strikes the best balance between performance, security, and controllability.
+
+This article will provide an in-depth analysis of the data recycling mechanism
under Doris's storage-compute separation architecture, from design philosophy
to technical implementation, from core principles to practical tuning,
comprehensively showcasing the technical details and application value of this
mature solution.
+
+## 1. Comparison of Common Data Recycling Strategies
+
+### 1.1 Synchronous Deletion
+
+The most straightforward deletion method. When data is deleted (e.g., drop
table), the related metadata and corresponding files are immediately deleted.
Once data is deleted, it cannot be recovered. While the operation is simple and
direct, deletion speed is slow and risk is high.
+
+### 1.2 Reconciliation Deletion (Reverse)
+
+This approach determines which data can be deleted through periodic
reconciliation mechanisms. When data is deleted (e.g., drop table), only
metadata is deleted. The system periodically performs data reconciliation,
scans file data, identifies data that is no longer referenced by metadata or
has expired, and then performs batch deletion.
+
+### 1.3 Mark-for-Deletion (Forward)
+
+This approach determines which data can be deleted by periodically scanning
deleted metadata. When data is deleted (e.g., drop table), instead of directly
deleting the data, the metadata to be deleted is marked as deleted. The system
periodically scans the marked metadata and finds corresponding files for batch
deletion.
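To make the forward (mark-for-deletion) flow concrete, the two halves of the strategy, marking on the drop path and a periodic scan-and-delete pass, can be reduced to a few lines. This is a hypothetical Python sketch, not Doris source: `MetaStore`, `mark_for_deletion`, and `recycle_pass` are invented names standing in for the metadata KV store, the drop command, and the recycler's periodic scan.

```python
import time

class MetaStore:
    """Toy KV store: maps an object key to its metadata dict."""
    def __init__(self):
        self.kv = {}

def mark_for_deletion(meta: MetaStore, key: str, retention_s: int) -> None:
    """Drop path: only flip a flag and stamp an expiry; no file I/O here."""
    meta.kv[key]["recycled"] = True
    meta.kv[key]["expire_at"] = time.time() + retention_s

def recycle_pass(meta: MetaStore, files: set) -> list:
    """Periodic scan: for each expired recycle KV, delete the file first,
    then the KV, so a crash never leaves a KV pointing at deleted data."""
    deleted = []
    now = time.time()
    for key, m in list(meta.kv.items()):
        if m.get("recycled") and m["expire_at"] <= now:
            files.discard(m["file"])   # 1. delete the object file
            del meta.kv[key]           # 2. only then delete the KV
            deleted.append(key)
    return deleted
```

Note that the drop command returns as soon as the flag is written; all file I/O happens later, in batch, inside `recycle_pass`.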
+
+## 2. Benefits of Doris Storage-Compute Separation Mark-for-Deletion
+
+Doris's storage-compute separation architecture chose the mark-for-deletion
method, which effectively ensures data consistency while achieving the optimal
balance between performance, security, and resource utilization.
+
+Taking drop table as an example, mark-for-deletion has the following
significant advantages over the other two approaches:
+
+### 2.1 Performance Advantages
+
+- **Fast Response Time**: Drop table operations only need to mark metadata KV
data as deleted, without waiting for file I/O operations to complete, allowing
users to receive immediate responses. This is particularly important in large
table deletion scenarios, avoiding long blocking periods.
+- **High Batch Processing Efficiency**: Periodically scanning deletion-marked
metadata KV allows for batch processing of file deletion operations, reducing
system call frequency and improving overall I/O efficiency.
+
+### 2.2 Security Advantages
+
+- **Misoperation Protection**: Mark-for-deletion provides a buffer period
during which accidentally deleted tables can be recovered before actual file
deletion, significantly reducing human operational risks.
+- **Transaction Security**: Marking operations are lightweight metadata
modifications that more easily ensure atomicity, reducing data inconsistency
issues caused by system failures during deletion.
+
+### 2.3 Resource Management Advantages
+
+- **System Load Balancing**: File deletion operations can be performed during
system idle time, avoiding the consumption of large amounts of I/O resources
during business peak hours that would impact normal operations.
+- **Controllable Deletion Pace**: Deletion speed can be dynamically adjusted
based on system load, avoiding system impact from massive deletion operations.
+
+### 2.4 Comparison with Other Solutions
+
+- **Compared to Synchronous Deletion**: Avoids long waits when deleting large tables, improving user experience. It also provides a deletion buffer period, improving safety and preventing operational accidents to some extent.
+- **Compared to Reconciliation Deletion**: Scans only deletion-marked metadata, making the scan far more targeted and reducing unnecessary I/O. There is no need to traverse all files to determine whether they are still referenced, so deletion is faster and more efficient.
+
+## 3. Principles of Doris Data Recycling
+
+The recycler is an independently deployed component responsible for
periodically recycling expired garbage files. One recycler can simultaneously
recycle multiple instances, and one instance can only be recycled by one
recycler at the same time.
+
+### 3.1 Mark-for-Deletion
+
+Whenever a drop command is executed or the system generates garbage data
(e.g., compacted rowset), the corresponding metadata KV is marked as recycled.
The recycler periodically scans recycle KVs in the instance, deletes
corresponding object files, and then deletes the recycle KV, ensuring deletion
order safety.
+
+### 3.2 Hierarchical Structure
+
+When the recycler recycles instance data, multiple tasks run concurrently,
such as recycle_indexes, recycle_partition, recycle_compacted_rowsets,
recycle_txn, etc.
+
+Data is deleted according to a hierarchical structure during recycling:
deleting a table deletes corresponding partitions, deleting a partition deletes
corresponding tablets, deleting a tablet deletes corresponding rowsets,
deleting a rowset deletes corresponding segment files. The final execution
object is Doris's smallest file unit, the segment file.
+
+Taking drop table as an example, during the recycling process, the system
first deletes segment object files, then deletes recycle rowset KV after
success, deletes recycle tablet KV after all tablet rowsets are successfully
deleted, and so on, ultimately deleting all object files and recycle KVs in the
table.
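The drop-table walk above can be sketched as a bottom-up traversal. This is an illustrative Python model, not Doris code (the nested-dict layout and names are hypothetical); the point it demonstrates is that at every level the object files are removed before the KV that references them.

```python
def recycle_table(table: dict, object_store: set) -> None:
    """Bottom-up deletion: segment files first, enclosing KVs afterwards."""
    for partition in list(table["partitions"]):
        for tablet in list(partition["tablets"]):
            for rowset in list(tablet["rowsets"]):
                for seg_file in rowset["segments"]:
                    object_store.discard(seg_file)  # smallest unit: segment file
                tablet["rowsets"].remove(rowset)    # then the recycle rowset KV
            partition["tablets"].remove(tablet)     # then the recycle tablet KV
        table["partitions"].remove(partition)       # then the partition KV
    # finally the table-level recycle KV would be deleted
```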
+
+### 3.3 Expiration Mechanism
+
+Each object to be recycled records its corresponding expiration time in its
KV. The system identifies objects to delete by scanning various recycle KVs and
calculating expiration times. If a user accidentally drops a table, due to the
expiration mechanism, the recycler will not immediately delete its data but
will wait for a retention time, providing the possibility for data recovery.
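The expiration rule itself is simple arithmetic; a minimal sketch, with hypothetical field names for the mark timestamp and retention window recorded in the recycle KV:

```python
def is_expired(recycle_kv: dict, now_s: float) -> bool:
    """An object is eligible for recycling only once its retention window,
    counted from the moment it was marked, has fully elapsed."""
    return now_s >= recycle_kv["marked_at_s"] + recycle_kv["retention_s"]
```

Until `is_expired` returns true, an accidentally dropped table can still be recovered.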
+
+### 3.4 Reliability Guarantees
+
+1. **Phased Deletion**: First delete data files, then delete metadata, finally
delete index or partition keys, ensuring deletion order safety.
+
+2. **Lease Protection Mechanism**: Each recycler must obtain a lease before starting recycling and launches a background thread to renew the lease periodically. Only when the lease expires or the status is IDLE can a new recycler take over. This guarantees that an instance is recycled by at most one recycler at a time, avoiding data inconsistency caused by concurrent recycling.
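A simplified model of the lease rule, with hypothetical names and an in-memory dict standing in for the MS-side lease record (the real protocol goes through the meta service):

```python
def try_acquire(lease: dict, recycler_id: str, now_s: float, ttl_s: float) -> bool:
    """Take over only if the instance is IDLE or the previous lease expired."""
    if lease["status"] == "IDLE" or now_s >= lease["expire_at_s"]:
        lease.update(holder=recycler_id, status="BUSY",
                     expire_at_s=now_s + ttl_s)
        return True
    return lease["holder"] == recycler_id  # already ours (renewal path)

def renew(lease: dict, recycler_id: str, now_s: float, ttl_s: float) -> bool:
    """The background thread extends the lease only while it still holds it."""
    if lease["holder"] == recycler_id and now_s < lease["expire_at_s"]:
        lease["expire_at_s"] = now_s + ttl_s
        return True
    return False
```

A competing recycler is rejected while the lease is live and succeeds only after it lapses, which is exactly the single-recycler-per-instance guarantee.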
+
+### 3.5 Multiple Check Mechanisms
+
+The Recycler implements multiple mutual check mechanisms (checker) between FE
metadata, MS KV, and object files. The checker performs forward and reverse
checks on all Recycler KVs, object files, and FE in-memory metadata in the
background.
+
+Taking segment file KV and object file checking as an example:
+- Forward Check: Scan all KVs to check if corresponding segment files exist
and if corresponding segment information exists in FE memory.
+- Reverse Check: Scan all segment files to verify if corresponding KVs exist
and if corresponding segment information exists in FE memory.
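Conceptually, the two checks between the KV view and the object-store listing are set differences (the additional FE in-memory comparison is omitted here for brevity); an illustrative sketch:

```python
def forward_check(kv_segments: set, object_files: set) -> set:
    """Forward: KVs whose segment file is missing (possible over-recycling)."""
    return kv_segments - object_files

def reverse_check(kv_segments: set, object_files: set) -> set:
    """Reverse: files no KV references (leaked garbage, never recycled)."""
    return object_files - kv_segments
```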
+
+Multiple check mechanisms ensure the correctness of recycler data deletion. If
unrecycled or over-recycled situations occur under certain circumstances, the
checker will capture relevant information. Operations personnel can manually
delete excess garbage files based on checker information, or rely on object
multi-versioning to recover accidentally deleted files, providing an effective
safety net.
+
+Currently, forward and reverse checks for segment files, idx files, delete
bitmap metadata, etc., have been implemented. In the future, checks for all
metadata will be implemented to further ensure recycler correctness and
reliability.
+
+## 4. Observability Mechanism
+
+Recycler efficiency and recycling progress matter greatly to users. We have therefore substantially improved recycler observability by adding numerous visual monitoring metrics and the necessary logs. The visual metrics let users see recycling progress, efficiency, and exceptions at a glance, and further metrics expose more detailed information, such as the estimated time of the next recycle for a given instance. The added logs also help operations and development staff locate problems more quickly.
+
+### 4.1 Addressing User Concerns
+
+**Basic Questions:**
+- Repository-level recycling speed: bytes recycled per second, quantity of
various objects recycled per second
+- Repository-level data volume and time consumption per recycling
+- Repository-level recycling progress: recycled data volume, pending recycling
data volume
+
+**Advanced Questions:**
+- Recycling status of each storage backend
+- Recycler success time, failure time
+- Estimated time for next Recycler execution
+
+All this information can be observed in real-time through the MS panel.
+
+### 4.2 Observation Metrics
+
+| Variable Name | Metrics Name | Dimensions/Labels | Description | Example |
+|---------------|--------------|-------------------|-------------|---------|
+| g_bvar_recycler_vault_recycle_status | recycler_vault_recycle_status | instance_id, resource_id, status | Records status counts of vault recycling operations by instance ID, resource ID, and status | recycler_vault_recycle_status{instance_id="default_instance_id",resource_id="1",status="normal"} 8 |
+| g_bvar_recycler_vault_recycle_task_concurrency | recycler_vault_recycle_task_concurrency | instance_id, resource_id | Counts vault recycle file task concurrency by instance ID and resource ID | recycler_vault_recycle_task_concurrency{instance_id="default_instance_id",resource_id="1"} 2 |
+| g_bvar_recycler_instance_last_round_recycled_num | recycler_instance_last_round_recycled_num | instance_id, resource_type | Counts objects recycled in the last round by instance ID and object type | recycler_instance_last_round_recycled_num{instance_id="default_instance_id",resource_type="recycle_rowsets"} 13 |
+| g_bvar_recycler_instance_last_round_to_recycle_num | recycler_instance_last_round_to_recycle_num | instance_id, resource_type | Counts objects to be recycled in the last round by instance ID and object type | recycler_instance_last_round_to_recycle_num{instance_id="default_instance_id",resource_type="recycle_rowsets"} 13 |
+| g_bvar_recycler_instance_last_round_recycled_bytes | recycler_instance_last_round_recycled_bytes | instance_id, resource_type | Counts data size (bytes) recycled in the last round by instance ID and object type | recycler_instance_last_round_recycled_bytes{instance_id="default_instance_id",resource_type="recycle_rowsets"} 13509 |
+| g_bvar_recycler_instance_last_round_to_recycle_bytes | recycler_instance_last_round_to_recycle_bytes | instance_id, resource_type | Counts data size (bytes) to be recycled in the last round by instance ID and object type | recycler_instance_last_round_to_recycle_bytes{instance_id="default_instance_id",resource_type="recycle_rowsets"} 13509 |
+| g_bvar_recycler_instance_last_round_recycle_elpased_ts | recycler_instance_last_round_recycle_elpased_ts | instance_id, resource_type | Records elapsed time (ms) of the last recycling round by instance ID and object type | recycler_instance_last_round_recycle_elpased_ts{instance_id="default_instance_id",resource_type="recycle_rowsets"} 62 |
+| g_bvar_recycler_instance_recycle_round | recycler_instance_recycle_round | instance_id, resource_type | Counts recycling rounds by instance ID and object type | recycler_instance_recycle_round{instance_id="default_instance_id_2",object_type="recycle_rowsets"} 2 |
+| g_bvar_recycler_instance_recycle_time_per_resource | recycler_instance_recycle_time_per_resource | instance_id, resource_type | Records recycling speed by instance ID and object type (time per resource in ms; -1 means nothing was recycled) | recycler_instance_recycle_time_per_resource{instance_id="default_instance_id",resource_type="recycle_rowsets"} 4.76923 |
+| g_bvar_recycler_instance_recycle_bytes_per_ms | recycler_instance_recycle_bytes_per_ms | instance_id, resource_type | Records recycling speed by instance ID and object type (bytes recycled per millisecond; -1 means nothing was recycled) | recycler_instance_recycle_bytes_per_ms{instance_id="default_instance_id",resource_type="recycle_rowsets"} 217.887 |
+| g_bvar_recycler_instance_recycle_total_num_since_started | recycler_instance_recycle_total_num_since_started | instance_id, resource_type | Counts total objects recycled since the recycler started by instance ID and object type | recycler_instance_recycle_total_num_since_started{instance_id="default_instance_id",resource_type="recycle_rowsets"} 49 |
+| g_bvar_recycler_instance_recycle_total_bytes_since_started | recycler_instance_recycle_total_bytes_since_started | instance_id, resource_type | Counts total size (bytes) recycled since the recycler started by instance ID and object type | recycler_instance_recycle_total_bytes_since_started{instance_id="default_instance_id",resource_type="recycle_rowsets"} 40785 |
+| g_bvar_recycler_instance_running_counter | recycler_instance_running_counter | - | Counts how many instances are currently being recycled | recycler_instance_running_counter 0 |
+| g_bvar_recycler_instance_last_recycle_duration | recycler_instance_last_round_recycle_duration | instance_id | Records total duration of the last recycling round by instance ID | recycler_instance_last_recycle_duration{instance_id="default_instance_id"} 64 |
+| g_bvar_recycler_instance_next_ts | recycler_instance_next_ts | instance_id | Estimates the next recycle time from the config's recycle_interval_seconds by instance ID | recycler_instance_next_ts{instance_id="default_instance_id"} 1750400266781 |
+| g_bvar_recycler_instance_recycle_st_ts | recycler_instance_recycle_start_ts | instance_id | Records the start time of the overall recycling process by instance ID | recycler_instance_recycle_st_ts{instance_id="default_instance_id"} 1750400236717 |
+| g_bvar_recycler_instance_recycle_ed_ts | recycler_instance_recycle_end_ts | instance_id | Records the end time of the overall recycling process by instance ID | recycler_instance_recycle_ed_ts{instance_id="default_instance_id"} 1750400236781 |
+| g_bvar_recycler_instance_recycle_last_success_ts | recycler_instance_recycle_last_success_ts | instance_id | Records the last successful recycle time by instance ID | recycler_instance_recycle_last_success_ts{instance_id="default_instance_id"} 1750400236781 |
+
+## 5. Parameter Tuning
+
+Common recycler parameters and their descriptions:
+
+```
+// Recycler interval in seconds
+CONF_mInt64(recycle_interval_seconds, "3600");
+
+// Common retention time, applies to all objects without their own retention time
+CONF_mInt64(retention_seconds, "259200");
+
+// Maximum number of instances a recycler can recycle simultaneously
+CONF_Int32(recycle_concurrency, "16");
+
+// Retention time for compacted rowsets in seconds
+CONF_mInt64(compacted_rowset_retention_seconds, "1800");
+
+// Retention time for dropped indexes in seconds
+CONF_mInt64(dropped_index_retention_seconds, "10800");
+
+// Retention time for dropped partitions in seconds
+CONF_mInt64(dropped_partition_retention_seconds, "10800");
+
+// Recycle whitelist, specify instance IDs separated by commas, defaults to recycling all instances if empty
+CONF_Strings(recycle_whitelist, "");
+
+// Recycle blacklist, specify instance IDs separated by commas; if empty, no instances are excluded
+CONF_Strings(recycle_blacklist, "");
+
+// Object IO worker concurrency: e.g., object list, delete
+CONF_mInt32(instance_recycler_worker_pool_size, "32");
+
+// Recycle object concurrency: e.g., recycle_tablet, recycle_rowset
+CONF_Int32(recycle_pool_parallelism, "40");
+
+// Whether to enable checker
+CONF_Bool(enable_checker, "false");
+
+// Whether to enable reverse checker
+CONF_Bool(enable_inverted_check, "false");
+
+// Checker interval
+CONF_mInt32(check_object_interval_seconds, "43200");
+
+// Whether to enable recycler observation metrics
+CONF_Bool(enable_recycler_stats_metrics, "false");
+
+// Recycle storage backend whitelist, specify vault names separated by commas, defaults to recycling all vaults if empty
+CONF_Strings(recycler_storage_vault_white_list, "");
+```
+
+### Common Tuning Scenarios Q&A
+
+#### 1. Recycling Performance Tuning
+
+**Q1: What to do if recycling speed is too slow?**
+
+A1: You can tune from the following aspects:
+- Increase concurrency:
+ - Increase recycle_concurrency (default 16): increase the number of
instances recycled simultaneously
+ - Increase instance_recycler_worker_pool_size (default 32): increase object
IO operation concurrency
+ - Increase recycle_pool_parallelism (default 40): increase recycle object
concurrency
+- Shorten recycling interval: Reduce recycle_interval_seconds from default
3600 seconds, e.g., to 1800 seconds
+- Use whitelist mechanism: Prioritize recycling important instances through
recycle_whitelist
+
+**Q2: How to adjust when recycling pressure is too high and affects business?**
+
+A2: You can adopt the following strategies to reduce recycling pressure:
+- Reduce concurrency:
+ - Appropriately reduce recycle_concurrency to avoid recycling too many
instances simultaneously
+ - Reduce instance_recycler_worker_pool_size and recycle_pool_parallelism
+- Extend recycling interval: Increase recycle_interval_seconds, e.g., adjust
to 7200 seconds
+- Use blacklist: Temporarily exclude high-load instances through
recycle_blacklist
+- Off-peak recycling: Perform recycling operations during business off-peak
hours
+
+#### 2. Storage Space Tuning
+
+**Q3: What to do when storage space is insufficient and garbage cleanup needs
to be accelerated?**
+
+A3: You can adjust retention times for various objects:
+- Shorten general retention time: Reduce retention_seconds from default 259200
seconds (3 days)
+- Targeted adjustment for specific objects:
+ - compacted_rowset_retention_seconds (default 1800 seconds) can be
appropriately shortened
+ - dropped_index_retention_seconds and dropped_partition_retention_seconds
(default 10800 seconds) can be adjusted as needed
+- Selective storage backend recycling: Prioritize cleaning specific storage
through recycler_storage_vault_white_list
+
+**Q4: What to do when longer data retention is needed to prevent accidental
deletion?**
+
+A4: Extend corresponding retention times:
+- Increase retention_seconds to a longer period, e.g., 604800 seconds
+- Adjust corresponding retention parameters based on different object
importance
+- Important partitions can set longer retention times through
dropped_partition_retention_seconds
+
+#### 3. Monitoring and Troubleshooting Tuning
+
+**Q5: How to enable better monitoring and troubleshooting capabilities?**
+
+A5: It's recommended to enable the following monitoring features:
+- Enable observation metrics: Set enable_recycler_stats_metrics = true
+- Enable check mechanisms:
+ - Set enable_checker = true to enable forward checking
+ - Set enable_inverted_check = true to enable reverse checking
+ - Adjust check_object_interval_seconds (default 43200 seconds/12 hours) to
appropriate check frequency
+
+**Q6: How to troubleshoot suspected data consistency issues?**
+
+A6: Utilize the checker mechanism for inspection:
+- Ensure both enable_checker and enable_inverted_check are true
+- Appropriately shorten check_object_interval_seconds to increase check
frequency
+- Observe anomalies discovered by checker through MS panel
+- Manually handle excess garbage files or supplement accidentally deleted
files based on checker reports
+
+#### 4. Special Scenario Tuning
+
+**Q7: How to temporarily handle abnormal instance recycling?**
+
+A7: Use whitelist and blacklist mechanisms:
+- Temporarily skip problematic instances: Add abnormal instance IDs to
recycle_blacklist
+- Prioritize specific instances: Add instance IDs needing priority processing
to recycle_whitelist
+- Storage backend selection: Selectively recycle specific storage backends
through recycler_storage_vault_white_list
+
+**Q8: What to do when large table deletion causes recycling task backlog?**
+
+A8: Comprehensive tuning strategy:
+- Temporarily increase concurrency parameters to handle backlog
+- Appropriately shorten retention time for large objects
+- Use whitelist to prioritize instances with severe backlog
+- Deploy multiple recyclers to share the load if necessary
+
+**Q9: What to do when encountering "404 file not found" errors in object
storage during long queries?**
+
+A9: When query execution time is very long and tablets undergo compaction
during the query, merged rowsets on object storage may have been recycled,
causing query failure with "404 file not found" errors. Solution:
+- Increase compacted rowset retention time: Increase
compacted_rowset_retention_seconds from default 1800 seconds, e.g.:
+ - For scenarios with long queries, recommend adjusting to 7200 seconds (or
longer)
+ - Set appropriate retention time based on maximum query time
+
+This ensures that rowsets needed during long query execution are not
prematurely recycled, avoiding query failures.
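As a rough rule of thumb, not a Doris API, just hedged arithmetic following A9 above: size the compacted-rowset retention with headroom over the longest expected query, never going below the 1800-second default.

```python
def suggest_compacted_retention(max_query_s: int, headroom: float = 2.0,
                                floor_s: int = 1800) -> int:
    """Suggested compacted_rowset_retention_seconds: keep compacted rowsets
    at least `headroom` times the longest expected query, and never below
    the 1800 s default (helper name and headroom factor are hypothetical)."""
    return max(floor_s, int(max_query_s * headroom))
```

For example, a workload whose slowest query runs about an hour would get 7200 seconds, matching the recommendation above.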
+
+---
+
+**Note**: The above tuning suggestions need to be specifically adjusted based
on actual cluster scale, storage capacity, business characteristics, and other
factors. It's recommended to closely monitor system load and business impact
during the tuning process, gradually adjusting parameters to find the optimal
configuration.
+
+## Conclusion
+
+The mark-for-deletion mechanism under Apache Doris's storage-compute
separation architecture, through cleverly balancing performance, security, and
resource utilization, not only solves the inherent defects of traditional data
recycling methods but also provides users with a complete, reliable, and
observable data management solution.
+
+From fine-grained hierarchical recycling design to intelligent expiration
protection mechanisms, from comprehensive multiple check systems to rich
observability metrics, Doris's data recycling mechanism reflects deep
understanding of user needs and relentless pursuit of technical quality in
every detail. Particularly, its flexible parameter tuning capabilities enable
users of different scales and scenarios to find the most suitable configuration
solutions.
+
+In the future, we will continue to optimize and improve this mechanism,
maintaining existing advantages while further improving recycling efficiency,
enhancing intelligence levels, and enriching monitoring dimensions, building a
more efficient and reliable real-time data analysis platform for users. We
welcome users to explore more possibilities in practice and work with us to
continuously advance Apache Doris forward.
\ No newline at end of file
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/compute-storage-decoupled/recycler.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/compute-storage-decoupled/recycler.md
new file mode 100644
index 00000000000..3ee4fbc81ed
--- /dev/null
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/compute-storage-decoupled/recycler.md
@@ -0,0 +1,285 @@
+---
+{
+ "title": "数据回收",
+ "language": "zh-CN"
+}
+---
+
+# Doris 存算分离数据回收
+
+## 引言
+
+在大数据时代,数据生命周期管理已成为分布式数据库系统的核心挑战之一。随着业务数据量的爆炸式增长,如何在保证数据安全的前提下实现高效的存储空间回收,成为每个数据库产品必须解决的关键问题。
+
+Apache Doris
作为新一代实时分析型数据库,在存算分离架构下采用了标记删除(Mark-for-Deletion)的数据回收策略,并在此基础上进行了深度优化和增强。通过引入精细化的分层回收机制、灵活可配的过期保护、多重数据一致性检查以及完善的可观测性体系,同时充分考虑分布式环境的复杂性,设计了独立的
Recycler 组件、智能的并发控制、完备的监控指标等,为用户提供了一个既高效又可控的企业级数据生命周期管理方案,实现了性能、安全性和可控性的最佳平衡。
+
+本文将深入剖析 Doris 存算分离架构下的数据回收机制,从设计理念到技术实现,从核心原理到实践调优,全面展示这一成熟解决方案的技术细节与应用价值。
+
+## 1. 常规数据回收策略对比
+
+### 1.1 同步删除
+
+最直接的删除方式。当数据被删除(例如drop
table)时,立即将相关的meta以及对应文件删除,数据一旦删除就无法恢复,操作简单直接,但删除速度较慢,风险较高。
+
+### 1.2 对账删除(反向)
+
+这种方式通过定期对账机制来确定哪些数据可以删除。当数据被删除(例如drop
table)时,仅仅删除meta数据,系统会定期进行数据对账,扫描文件数据,识别出不再被meta引用或已失效的数据,然后批量删除。
+
+### 1.3 标记删除(正向)
+
+这种方式通过定期扫描已删除的meta数据来确定哪些数据可以删除。当数据被删除(例如drop
table)时,不直接删除数据,而是将要删除的meta标记为已删除,系统会定期扫描被标记的meta数据,找到对应的文件进行批量删除。
+
+## 2. Doris 存算分离 标记删除 的好处
+
+Doris 存算分离架构选择了标记删除方法,这一选择能够有效保证数据一致性,同时在性能、安全性和资源利用率之间达到最佳平衡。
+
+以 drop table 为例,标记删除相比其他两种方式有以下显著优势:
+
+### 2.1 性能优势
+
+- **响应速度快**:drop table 操作只需要标记 meta kv数据为删除状态,无需等待文件 I/O
操作完成,用户可以立即得到响应。这在大表删除场景下尤为重要,避免了长时间阻塞。
+- **批量处理效率高**:定期扫描删除标记的 meta kv,可以批量处理文件删除操作,减少系统调用次数,提高整体 I/O 效率。
+
+### 2.2 安全性优势
+
+- **误操作保护**:标记删除提供了一个缓冲期,在实际文件删除前可以恢复误删的表,显著降低了人为操作风险。
+- **事务安全性**:标记操作是轻量级的 meta 修改,更容易保证原子性,减少了删除过程中系统故障导致的数据不一致问题。
+
+### 2.3 资源管理优势
+
+- **系统负载均衡**:文件删除操作可以在系统空闲时间进行,避免在业务高峰期占用大量 I/O 资源影响正常业务。
+- **可控的删除节奏**:可以根据系统负载动态调整删除速度,避免大量删除操作对系统造成冲击。
+
+### 2.4 对比其他方案
+
+- **相比同步删除**:避免了删除大表时的长时间等待,提升用户体验,此外,还提供了一定的删除缓冲期,能够保证安全性,一定程度上防止人为操作事故。
+- **相比对账删除**:只扫描标记删除的
meta,扫描数据更加明确,减少没有必要的I/O操作,效率更高,不需要遍历所有文件来判断是否被引用,删除更快速,效率更高。
+
+## 3. Doris数据回收的原理
+
+recycler是一个单独部署的组件,负责周期性对过期的垃圾文件进行回收,一个recycler可以同时回收多个instance,并且一个instance同一时间只能被一个recycler回收。
+
+### 3.1 标记删除
+
+每当一个执行一个drop命令或者系统有垃圾数据(例如compacted rowset)产生时,对应的meta
kv会被标记为recycled,recycler会定期对instance中的recycle kv进行扫描,删除对应的对象文件,后面再将recycle
kv删除,确保删除顺序的安全性。
+
+### 3.2 分层结构
+
+在recycler对instance数据进行回收时,多个任务会并发进行,例如recycle_indexes,recycle_partition,recycle_compacted_rowsets,recycle_txn等等任务。
+
+数据在回收过程中按照分层结构进行删除:删除table是会删除对应的partitions,删除partition时会删除对应tablets,删除tablet的时候又会删除tablet对应的rowsets,删除rowset会删除对应的segment文件,最终的执行对象是doris的最小文件单位即segment文件。
+
+以drop table为例子,回收过程中,系统会首先删除segment对象文件,成功后删除recycle rowset
kv,tablet的rowset全部删除成功后会删除recycle tablet kv,以此类推最终删除table中所有的对象文件以及recycle kv。
+
+### 3.3 过期机制
+
+每个需要回收的对象都在其kv中记录有对应的过期时间,系统通过扫描各种recycle
kv并且计算过期时间来识别要删除的对象,如果出现了用户误操作将某个table
drop,这时由于过期机制的存在,recycler不会立刻对其数据进行删除,而是会等待一个retition time,这为用户恢复数据提供了可能。
+
+### 3.4 可靠性保证
+
+1. **分阶段删除**:先删除数据文件,再删除元数据,最后删除索引或分区的key,确保删除顺序的安全性。
+
+2.
**Lease保护机制**:每个recycler在开始回收前都要获取lease,启动后台线程定期续lease,只有lease过期或状态为IDLE时,新的recycler才能接管,保证了同一时间一个instance只能由一个recycler回收,避免并发回收导致的数据不一致问题。
+
+### 3.5 多重检查机制
+
+Recycler 实现了 FE 元数据、MS kv与对象文件的多重相互检查机制(checker)。checker 在后台对所有的 Recycler
kv、对象文件、FE 内存元数据三方进行正反向检查。
+
+以 segment 文件 KV 与对象文件检查为例:
+- 正向检查:扫描所有 kv,检查是否都有对应的 segment 文件存在,以及 FE 内存中是否存在相应的 segment 信息。
+- 反向检查:扫描所有 segment 文件,验证是否都有对应的 kv,以及 FE 内存中是否存在相应的 segment 信息。
+
+多重检查机制能够保证 recycler 删除数据的正确性。如果在某种情况下出现未回收或多回收的情况,checker 会捕获相关信息,运维人员可以根据
checker 的信息手动删除多余垃圾文件,也可以依靠对象的多版本来恢复误删的文件,提供了有效的兜底机制。
+
+当前已实现了 segment 文件、idx 文件、delete bitmap 元数据等的正反向检查,后续将实现所有元数据的检查,进一步保证 recycler
的正确性与可靠性。
+
+## 4. 观测机制
+
+recycler回收效率进度是用户非常关心的问题,因此我们大大提高了recycler的可观测性,添加了大量可视化监控指标以及必要的日志,可视化指标能够让用户直观的看到回收的进度,效率,异常等基础信息,我们也提供了更多指标可以让用户看到更加详细的信息,例如估算下一次某个instance做
recycle 的时间;添加的日志也可以让运维及研发更快的定位问题。
+
+### 4.1 解答用户关心的问题
+
+**基础问题:**
+- 仓库粒度的回收速度:每秒回收多少字节,各类对象每秒回收数量
+- 仓库粒度每次回收的数据量和耗时
+- 仓库粒度的回收进度:已回收数据量,待回收数据量
+
+**高级问题:**
+- 每个存储后端的回收情况
+- Recycler 回收成功时间、失败时间
+- 下一次 Recycler 的预计回收时间
+
+这些信息都可以通过 MS 面板进行实时观测。
+
+### 4.2 观测指标
+
+| 变量名 | Metrics name | 维度/标签 | 含义 | 例子 |
+|--------|--------------|-----------|------|------|
+| g_bvar_recycler_vault_recycle_status | recycler_vault_recycle_status |
instance_id, resource_id, status | 按实例ID、资源ID和状态记录回收存储库操作的状态计数 |
recycler_vault_recycle_status{instance_id="default_instance_id",resource_id="1",status="normal"}
8 |
+| g_bvar_recycler_vault_recycle_task_concurrency |
recycler_vault_recycle_task_concurrency | instance_id, resource_id |
按实例ID和资源ID统计 vault 回收文件任务的并发数 |
recycler_vault_recycle_task_concurrency{instance_id="default_instance_id",resource_id="1"}
2 |
+| g_bvar_recycler_instance_last_round_recycled_num |
recycler_instance_last_round_recycled_num | instance_id, resource_type |
按实例ID和对象类型统计最近一轮已回收的对象数量 |
recycler_instance_last_round_recycled_num{instance_id="default_instance_id",resource_type="recycle_rowsets"}
13 |
+| g_bvar_recycler_instance_last_round_to_recycle_num |
recycler_instance_last_round_to_recycle_num | instance_id, resource_type |
按实例ID和对象类型统计最近一轮需要回收的对象数量 |
recycler_instance_last_round_to_recycle_num{instance_id="default_instance_id",resource_type="recycle_rowsets"}
13 |
+| g_bvar_recycler_instance_last_round_recycled_bytes |
recycler_instance_last_round_recycled_bytes | instance_id, resource_type |
按实例ID和对象类型统计最近一轮已回收的数据大小(bytes) |
recycler_instance_last_round_recycled_bytes{instance_id="default_instance_id",resource_type="recycle_rowsets"} 13509 |
+| g_bvar_recycler_instance_last_round_to_recycle_bytes | recycler_instance_last_round_to_recycle_bytes | instance_id, resource_type | Size of data (bytes) pending recycling in the latest round, by instance ID and object type | recycler_instance_last_round_to_recycle_bytes{instance_id="default_instance_id",resource_type="recycle_rowsets"} 13509 |
+| g_bvar_recycler_instance_last_round_recycle_elpased_ts | recycler_instance_last_round_recycle_elpased_ts | instance_id, resource_type | Time taken (ms) by the latest round of recycling, by instance ID and object type | recycler_instance_last_round_recycle_elpased_ts{instance_id="default_instance_id",resource_type="recycle_rowsets"} 62 |
+| g_bvar_recycler_instance_recycle_round | recycler_instance_recycle_round | instance_id, resource_type | Number of recycling rounds, by instance ID and object type | recycler_instance_recycle_round{instance_id="default_instance_id_2",object_type="recycle_rowsets"} 2 |
+| g_bvar_recycler_instance_recycle_time_per_resource | recycler_instance_recycle_time_per_resource | instance_id, resource_type | Recycling speed, by instance ID and object type (time in ms needed to recycle each resource; -1 means nothing was recycled) | recycler_instance_recycle_time_per_resource{instance_id="default_instance_id",resource_type="recycle_rowsets"} 4.76923 |
+| g_bvar_recycler_instance_recycle_bytes_per_ms | recycler_instance_recycle_bytes_per_ms | instance_id, resource_type | Recycling speed, by instance ID and object type (bytes recycled per millisecond; -1 means nothing was recycled) | recycler_instance_recycle_bytes_per_ms{instance_id="default_instance_id",resource_type="recycle_rowsets"} 217.887 |
+| g_bvar_recycler_instance_recycle_total_num_since_started | recycler_instance_recycle_total_num_since_started | instance_id, resource_type | Total number of objects recycled since the recycler started, by instance ID and object type | recycler_instance_recycle_total_num_since_started{instance_id="default_instance_id",resource_type="recycle_rowsets"} 49 |
+| g_bvar_recycler_instance_recycle_total_bytes_since_started | recycler_instance_recycle_total_bytes_since_started | instance_id, resource_type | Total size (bytes) recycled since the recycler started, by instance ID and object type | recycler_instance_recycle_total_bytes_since_started{instance_id="default_instance_id",resource_type="recycle_rowsets"} 40785 |
+| g_bvar_recycler_instance_running_counter | recycler_instance_running_counter | - | Number of instances currently being recycled | recycler_instance_running_counter 0 |
+| g_bvar_recycler_instance_last_recycle_duration | recycler_instance_last_round_recycle_duration | instance_id | Total time taken by the latest round of recycling, by instance ID | recycler_instance_last_recycle_duration{instance_id="default_instance_id"} 64 |
+| g_bvar_recycler_instance_next_ts | recycler_instance_next_ts | instance_id | Estimated time of the next recycling run, by instance ID, based on recycle_interval_seconds in the config | recycler_instance_next_ts{instance_id="default_instance_id"} 1750400266781 |
+| g_bvar_recycler_instance_recycle_st_ts | recycler_instance_recycle_start_ts | instance_id | Start time of the overall recycling process, by instance ID | recycler_instance_recycle_st_ts{instance_id="default_instance_id"} 1750400236717 |
+| g_bvar_recycler_instance_recycle_ed_ts | recycler_instance_recycle_end_ts | instance_id | End time of the overall recycling process, by instance ID | recycler_instance_recycle_ed_ts{instance_id="default_instance_id"} 1750400236781 |
+| g_bvar_recycler_instance_recycle_last_success_ts | recycler_instance_recycle_last_success_ts | instance_id | Time of the last successful recycling run, by instance ID | recycler_instance_recycle_last_success_ts{instance_id="default_instance_id"} 1750400236781 |
+
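The bvar metrics above are exported as Prometheus-style text lines (the exact export endpoint depends on your deployment and is not specified here). As an illustrative sketch, the following Python snippet (all names in it are hypothetical, not part of Doris) parses such lines into (name, labels, value) tuples:

```python
import re

# Matches lines like: name{k1="v1",k2="v2"} 13509  or  name 0
METRIC_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][\w:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>[-+0-9.eE]+)$')

def parse_metric(line):
    """Parse one Prometheus-style text line into (name, labels_dict, value)."""
    m = METRIC_RE.match(line.strip())
    if not m:
        return None
    labels = {}
    if m.group('labels'):
        for pair in m.group('labels').split(','):
            key, value = pair.split('=', 1)
            labels[key] = value.strip('"')
    return m.group('name'), labels, float(m.group('value'))

sample = ('recycler_instance_last_round_recycled_bytes'
          '{instance_id="default_instance_id",resource_type="recycle_rowsets"} 13509')
print(parse_metric(sample))
```

Parsing like this is enough to build simple alerts, for example firing when recycler_instance_recycle_last_success_ts stops advancing.
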
+## 5. Parameter Tuning
+
+Common recycler parameters and their descriptions:
+
+```
+// Recycling interval of the recycler, in seconds
+CONF_mInt64(recycle_interval_seconds, "3600");
+
+// Common retention time, applied when recycling any object that has no retention time of its own
+CONF_mInt64(retention_seconds, "259200");
+
+// Maximum number of instances one recycler can recycle concurrently
+CONF_Int32(recycle_concurrency, "16");
+
+// Retention time of compacted rowsets, in seconds
+CONF_mInt64(compacted_rowset_retention_seconds, "1800");
+
+// Retention time of dropped indexes, in seconds
+CONF_mInt64(dropped_index_retention_seconds, "10800");
+
+// Retention time of dropped partitions, in seconds
+CONF_mInt64(dropped_partition_retention_seconds, "10800");
+
+// Recycling whitelist: comma-separated instance IDs; if empty, all instances are recycled
+CONF_Strings(recycle_whitelist, "");
+
+// Recycling blacklist: comma-separated instance IDs; if empty, all instances are recycled
+CONF_Strings(recycle_blacklist, "");
+
+// Concurrency of object IO workers, e.g. object list and delete
+CONF_mInt32(instance_recycler_worker_pool_size, "32");
+
+// Concurrency of object recycling, e.g. recycle_tablet, recycle_rowset
+CONF_Int32(recycle_pool_parallelism, "40");
+
+// Whether to enable the checker
+CONF_Bool(enable_checker, "false");
+
+// Whether to enable the inverted checker
+CONF_Bool(enable_inverted_check, "false");
+
+// Checker interval
+CONF_mInt32(check_object_interval_seconds, "43200");
+
+// Whether to enable the recycler's observability metrics
+CONF_Bool(enable_recycler_stats_metrics, "false");
+
+// Whitelist of storage backends to recycle: comma-separated vault names; if empty, all vaults are recycled
+CONF_Strings(recycler_storage_vault_white_list, "");
+```
+
+### Common Tuning Scenarios (Q&A)
+
+#### 1. Recycling Performance Tuning
+
+**Q1: What can I do if recycling is too slow?**
+
+A1: Tune along the following lines:
+- Increase concurrency:
+  - Raise recycle_concurrency (default 16) to recycle more instances at the same time
+  - Raise instance_recycler_worker_pool_size (default 32) to increase the concurrency of object IO operations
+  - Raise recycle_pool_parallelism (default 40) to increase the concurrency of object recycling
+- Shorten the recycling interval: lower recycle_interval_seconds from the default 3600 seconds, e.g. to 1800 seconds
+- Use the whitelist: recycle important instances first via recycle_whitelist
+
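The last-round metrics from section 4 give a quick sanity check on throughput: dividing the pending bytes by the recycling speed approximates the elapsed time the recycler reports. A small sketch (the function name is made up for illustration):

```python
def estimate_recycle_ms(to_recycle_bytes, bytes_per_ms):
    """Rough time (ms) to clear the pending backlog.

    bytes_per_ms == -1 means no recycling happened, so no estimate is possible.
    """
    if bytes_per_ms <= 0:
        return None
    return to_recycle_bytes / bytes_per_ms

# Sample values from the metrics table above: 13509 bytes at ~217.887 bytes/ms
print(round(estimate_recycle_ms(13509, 217.887), 1))  # 62.0, matching the sample elapsed time
```

If this estimate grows round over round, the backlog is outpacing the recycler and the concurrency parameters above are worth raising.
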
+**Q2: How can I reduce recycling pressure when it affects business workloads?**
+
+A2: The following strategies reduce recycling pressure:
+- Lower concurrency:
+  - Reduce recycle_concurrency appropriately to avoid recycling too many instances at once
+  - Reduce instance_recycler_worker_pool_size and recycle_pool_parallelism
+- Lengthen the recycling interval: increase recycle_interval_seconds, e.g. to 7200 seconds
+- Use the blacklist: temporarily exclude heavily loaded instances via recycle_blacklist
+- Recycle off-peak: run recycling during business off-peak hours
+
+#### 2. Storage Space Tuning
+
+**Q3: How can I speed up garbage cleanup when storage space is running low?**
+
+A3: Adjust the retention time of each object type:
+- Shorten the general retention time: lower retention_seconds from the default 259200 seconds (3 days)
+- Adjust specific object types:
+  - compacted_rowset_retention_seconds (default 1800 seconds) can be shortened appropriately
+  - dropped_index_retention_seconds and dropped_partition_retention_seconds (default 10800 seconds) can be adjusted as needed
+- Recycle storage backends selectively: clean up specific storage first via recycler_storage_vault_white_list
+
+**Q4: How can I retain data longer to guard against accidental deletion?**
+
+A4: Lengthen the corresponding retention times:
+- Increase retention_seconds to a longer value, e.g. 604800 seconds
+- Adjust the retention parameter of each object type according to its importance
+- Important partitions can be given a longer retention time via dropped_partition_retention_seconds
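
The retention parameters above all follow the same rule: an object deleted at time T only becomes recyclable once its retention window has elapsed. A minimal sketch of that rule (the function and table are hypothetical, not Doris code; the seconds mirror the defaults listed above):

```python
import time

# Default retention windows in seconds, mirroring the config values above
RETENTION = {
    "default": 259200,           # retention_seconds (3 days)
    "compacted_rowset": 1800,    # compacted_rowset_retention_seconds
    "dropped_index": 10800,      # dropped_index_retention_seconds
    "dropped_partition": 10800,  # dropped_partition_retention_seconds
}

def eligible_for_recycle(deleted_at, object_type, now=None):
    """An object is recyclable once its retention window has fully elapsed."""
    now = time.time() if now is None else now
    retention = RETENTION.get(object_type, RETENTION["default"])
    return now - deleted_at >= retention

# A partition dropped 4 hours ago has outlived its 3-hour retention window
print(eligible_for_recycle(deleted_at=1000, object_type="dropped_partition",
                           now=1000 + 4 * 3600))
```

Raising a retention value simply widens this window, trading storage space for a longer undo period.
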
+
+#### 3. Monitoring and Troubleshooting Tuning
+
+**Q5: How do I enable better monitoring and troubleshooting capabilities?**
+
+A5: Enable the following monitoring features:
+- Enable observability metrics: set enable_recycler_stats_metrics = true
+- Enable the check mechanisms:
+  - Set enable_checker = true to enable the forward checker
+  - Set enable_inverted_check = true to enable the inverted checker
+  - Adjust check_object_interval_seconds (default 43200 seconds / 12 hours) to a suitable check frequency
+
+**Q6: How do I investigate a suspected data consistency problem?**
+
+A6: Use the checker mechanisms:
+- Make sure both enable_checker and enable_inverted_check are true
+- Shorten check_object_interval_seconds appropriately to check more frequently
+- Watch the MS dashboard for anomalies found by the checkers
+- Based on the checker reports, manually remove leftover garbage files or restore mistakenly deleted files
+
+#### 4. Special Scenario Tuning
+
+**Q7: How do I temporarily handle instances whose recycling misbehaves?**
+
+A7: Use the whitelist and blacklist mechanisms:
+- Temporarily skip a problematic instance: add its instance ID to recycle_blacklist
+- Process specific instances first: add their instance IDs to recycle_whitelist
+- Select storage backends: recycle specific backends via recycler_storage_vault_white_list
+
+**Q8: What can I do when dropping a large table causes a backlog of recycling tasks?**
+
+A8: Combine several strategies:
+- Temporarily raise the concurrency parameters to work through the backlog
+- Shorten the retention time of large objects appropriately
+- Use the whitelist to prioritize the most backlogged instances
+- If necessary, deploy multiple recyclers to share the load
+
+**Q9: What can I do about "404 file not found" errors from object storage during long-running queries?**
+
+A9: When a query runs for a long time and its tablets are compacted during execution, the rowsets merged away by compaction may already have been recycled from object storage, causing the query to fail with a "404 file not found" error. Solution:
+- Increase the compacted rowset retention time: raise compacted_rowset_retention_seconds from the default 1800 seconds, e.g.:
+  - For workloads with long queries, 7200 seconds (or longer) is recommended
+  - Set a retention time based on the longest query duration
+
+This ensures that the rowsets a long query depends on are not recycled while it is still running, avoiding query failures.
+
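One way to pick that retention value is to start from the longest observed query and apply a safety margin, never going below the default. This sizing heuristic (the function and the 2x safety factor are assumptions for illustration, consistent with the 3600-second-query / 7200-second-retention example above) can be sketched as:

```python
def suggest_compacted_rowset_retention(longest_query_seconds,
                                       safety_factor=2.0,
                                       floor_seconds=1800):
    """Suggest a compacted_rowset_retention_seconds value: long enough that
    rowsets merged away by compaction outlive any in-flight query (with a
    safety margin), and never below the default of 1800 seconds."""
    return max(floor_seconds, int(longest_query_seconds * safety_factor))

# Longest observed query runs ~1 hour -> keep compacted rowsets ~2 hours
print(suggest_compacted_rowset_retention(3600))  # 7200
```
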
+---
+
+**Note**: The tuning suggestions above must be adapted to the actual cluster size, storage capacity, and workload characteristics. During tuning, watch system load and business impact closely, and adjust parameters step by step to find the best configuration.
+
+## Conclusion
+
+The mark-for-deletion mechanism under Apache Doris's storage-compute separation architecture strikes a careful balance between performance, safety, and resource utilization. It not only resolves the inherent flaws of traditional data recycling approaches, but also gives users a complete, reliable, and observable data management solution.
+
+From the fine-grained hierarchical recycling design and the intelligent expiration protection mechanism, to the thorough multi-layer check system and the rich observability metrics, every detail of Doris's data recycling mechanism reflects a deep understanding of user needs and a relentless pursuit of technical quality. In particular, its flexible parameter tuning capabilities let users of different scales and scenarios find the configuration that suits them best.
+
+Going forward, we will continue to optimize and refine this mechanism: building on its current strengths, we will further improve recycling efficiency, strengthen its intelligence, and enrich its monitoring dimensions to build a more efficient and reliable real-time data analytics platform. We welcome users to explore further possibilities in practice and help drive Apache Doris forward.
\ No newline at end of file
diff --git a/sidebars.json b/sidebars.json
index cc1b03c797a..193625bd5a2 100644
--- a/sidebars.json
+++ b/sidebars.json
@@ -506,6 +506,7 @@
"compute-storage-decoupled/managing-compute-cluster",
"compute-storage-decoupled/file-cache",
"compute-storage-decoupled/read-write-splitting",
+ "compute-storage-decoupled/recycler",
"compute-storage-decoupled/upgrade"
]
},
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]