ChenSammi commented on code in PR #10412:
URL: https://github.com/apache/ozone/pull/10412#discussion_r3457453412


##########
hadoop-hdds/docs/content/design/lifecycle-task-resume.md:
##########
@@ -0,0 +1,80 @@
+/*
+* Licensed to the Apache Software Foundation (ASF) under one or more
+* contributor license agreements. See the NOTICE file distributed with
+* this work for additional information regarding copyright ownership.
+* The ASF licenses this file to You under the Apache License, Version 2.0
+* (the "License"); you may not use this file except in compliance with
+* the License. You may obtain a copy of the License at
+*
+*      http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+  */
+
+# Design for Resumable Lifecycle Scans (HDDS-8342)
+
+## Problem Statement:
+
+The `HDDS-8342` branch introduces the `KeyLifecycleService`, a background service running on the Ozone Manager (OM) Leader to enforce bucket lifecycle rules (expiration, moving to trash, and aborting incomplete multipart uploads).
+The entire bucket is scanned in a single `call()` execution. If the OM restarts, crashes, or a leader transfer occurs, the scan state is lost. The new leader must restart the scan from the beginning.
+For buckets with billions of keys, the scan may never complete if leader transfers happen periodically.
+
+## Design: Persisting Bucket Scan Pointers
+
+To solve the resumability issues, we need to persist the scan progress (the "pointer") to the OM DB. This ensures that a new OM leader can resume from where the previous leader left off.
+
+### Data Structure for the Scan Pointer
+
+Define a new Protobuf message `LifecycleScanState` to capture the exact position of the scan.
+
+```protobuf
+message LifecycleScanState {
+    optional string bucketKey = 1;  // e.g., /volume/bucket
+    optional uint64 bucketObjID = 2;  // bucket's object ID, in case the bucket is deleted and recreated with the same name
+    optional uint64 lifecycleConfigurationUpdateID = 3;  // lifecycle configuration update ID, in case the bucket is updated with new rules
+    optional uint64 scanStartTime = 4;  // Epoch time when this full scan started
+    optional uint64 scanEndTime = 5;  // Epoch time when this full scan completed
+    optional string lastScannedKey = 6;  // the last scanned key in the bucket (for both OBS and FSO)
+    optional string lastScannedDir = 7;  // the last scanned dir path, e.g., /dir1/dir2/dir3
+    optional string lastScannedDirKey = 8;  // the last scanned dir key in directoryTable, e.g., /0/1/3/dir3
+    optional string lastScannedMpuKey = 9;  // the last scanned multipart upload (MPU) key
+}
+```
+
+### OM DB Schema Updates
+Add a new table `lifecycleStateTable` to `OMMetadataManager` to store the scan states:
+- **Table Name:** `lifecycleStateTable`
+- **Key:** `bucketKey` (String, e.g., `/volumeName/bucketName`)
+- **Value:** `LifecycleScanState`
+
+### When to Persist the Pointer
+Persisting the pointer for every key would overwhelm Ratis and RocksDB. We should checkpoint periodically:
+
+1. **Piggybacking on Deletes:** Add an optional `LifecycleScanState` field to `DeleteKeysRequest`. When the OM state machine applies the deletion, it atomically updates the `lifecycleStateTable` with the new pointer. This guarantees exactly-once semantics for the scan pointer relative to deletions.

Review Comment:
   Piggybacking saves one Ratis call and OM resources, so I would like to
keep it. For move-to-trash I would prefer piggybacking too, but no batch
rename command is supported yet.
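
To make the trade-off concrete, here is a minimal Java sketch of the piggybacking idea, under stated assumptions. It is not Ozone's actual code: `PiggybackSketch`, its in-memory `keyTable` and `lifecycleStateTable` maps, and the simplified `ScanState` and `DeleteKeysRequest` classes are hypothetical stand-ins for the OM DB tables and the Protobuf messages named in the design. The point it illustrates is that one applied request both deletes the keys and advances the pointer, so no second Ratis round trip is needed for the checkpoint.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory stand-in for the OM state machine; not Ozone's real API. */
class PiggybackSketch {

  /** Simplified scan pointer; mirrors two fields of the proposed LifecycleScanState. */
  static final class ScanState {
    final String bucketKey;
    final String lastScannedKey;

    ScanState(String bucketKey, String lastScannedKey) {
      this.bucketKey = bucketKey;
      this.lastScannedKey = lastScannedKey;
    }
  }

  /** Hypothetical delete request that optionally carries the scan pointer. */
  static final class DeleteKeysRequest {
    final List<String> keys;
    final ScanState scanState; // null when nothing is piggybacked

    DeleteKeysRequest(List<String> keys, ScanState scanState) {
      this.keys = keys;
      this.scanState = scanState;
    }
  }

  // Stand-ins for the key table and the proposed lifecycleStateTable.
  final TreeMap<String, String> keyTable = new TreeMap<>();
  final Map<String, ScanState> lifecycleStateTable = new HashMap<>();

  /**
   * Applies the deletions and the pointer update in one step, the way a
   * single state-machine transaction would commit both together.
   */
  synchronized void apply(DeleteKeysRequest request) {
    for (String key : request.keys) {
      keyTable.remove(key);
    }
    if (request.scanState != null) {
      lifecycleStateTable.put(request.scanState.bucketKey, request.scanState);
    }
  }

  /** Returns the key a new leader would resume after, or null for a fresh scan. */
  String resumeAfter(String bucketKey) {
    ScanState state = lifecycleStateTable.get(bucketKey);
    return state == null ? null : state.lastScannedKey;
  }

  public static void main(String[] args) {
    PiggybackSketch om = new PiggybackSketch();
    om.keyTable.put("/vol/bkt/a", "expired");
    om.keyTable.put("/vol/bkt/b", "expired");
    om.keyTable.put("/vol/bkt/c", "live");

    // One request: delete two expired keys and checkpoint the scan position.
    om.apply(new DeleteKeysRequest(
        Arrays.asList("/vol/bkt/a", "/vol/bkt/b"),
        new ScanState("/vol/bkt", "/vol/bkt/b")));

    System.out.println("remaining keys: " + om.keyTable.keySet());
    System.out.println("resume after:   " + om.resumeAfter("/vol/bkt"));
  }
}
```

Move-to-trash cannot use the same pattern today because, as noted above, there is no batch rename request to attach the pointer to; it would need either such a request or a separate checkpoint call.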



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

