ChenSammi commented on code in PR #9979:
URL: https://github.com/apache/ozone/pull/9979#discussion_r3000015379


##########
hadoop-hdds/docs/content/feature/Lifecycle.md:
##########
@@ -0,0 +1,396 @@
+---
+title: "Object Lifecycle Management"
+weight: 1
+menu:
+   main:
+      parent: Features
+summary: S3-compatible object lifecycle management with automatic object 
expiration
+---
+<!---
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+
+## Background
+
+In object storage scenarios, large amounts of data become obsolete over time 
and no longer need to be accessed or retained. Manually cleaning up expired 
data is both time-consuming and error-prone. Object lifecycle management 
provides an automated approach that allows administrators to configure policies 
at the bucket level so the system can automatically handle the cleanup of 
expired objects.
+
+Ozone's object lifecycle management is modeled on [AWS S3 Lifecycle 
Configuration](https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-lifecycle-mgmt.html)
 and provides compatible API interfaces through the S3 Gateway. The current 
version implements the Expiration action, which automatically deletes objects 
or moves them to trash based on each object's last modification time.
+
+## Compatibility with AWS S3 Lifecycle
+
+Ozone's lifecycle management is designed with AWS S3 Lifecycle as a reference. 
The current version does not implement all S3 Lifecycle features. Below is a 
detailed compatibility comparison.
+
+### API Compatibility
+
+| S3 API | Supported | Description |
+|--------|-----------|-------------|
+| `PutBucketLifecycleConfiguration` | Yes | Set via S3 Gateway `PUT 
/{bucket}?lifecycle` |
+| `GetBucketLifecycleConfiguration` | Yes | Get via S3 Gateway `GET 
/{bucket}?lifecycle` |
+| `DeleteBucketLifecycle` | Yes | Delete via S3 Gateway `DELETE 
/{bucket}?lifecycle` |
+
+### Lifecycle Actions
+
+| S3 Lifecycle Action | Supported | Description |
+|---------------------|-----------|-------------|
+| Expiration | Yes | Supports both `Days` and `Date` modes |
+| Transition | No | Ozone does not currently support tiered storage class 
transitions similar to S3 |
+| NoncurrentVersionExpiration | No | Ozone's bucket versioning mechanism 
differs from S3 |
+| NoncurrentVersionTransition | No | Same as above |
+| AbortIncompleteMultipartUpload [1] | No | Not implemented as a lifecycle 
action |
+| ExpiredObjectDeleteMarker | No | Ozone does not use S3-style delete markers |
+
+[1] Ozone cleans up incomplete multipart uploads through a separate service 
(MultipartUploadCleanupService).
+
+### Filter Conditions
+
+| S3 Filter Element | Supported | Description |
+|--------------------|-----------|-------------|
+| Prefix | Yes | Supports both top-level Prefix and Prefix within Filter |
+| Tag | Yes | Supports filtering by a single tag |
+| And (Prefix + Tags) | Yes | Supports combining Prefix with multiple Tag 
conditions |
+| ObjectSizeGreaterThan | No | Filtering by minimum object size is not 
supported |
+| ObjectSizeLessThan | No | Filtering by maximum object size is not supported |
+
+### Other Differences
+
+- Ozone-specific feature: Ozone supports moving expired objects to trash 
(`.Trash`) instead of deleting them directly.
+- Bucket Layout: Ozone's FSO (FILE_SYSTEM_OPTIMIZED) buckets support recursive 
directory-tree-based evaluation and can automatically expire empty directories.
+- Administrative operations: Ozone provides `suspend` / `resume` commands that 
stop or restart all lifecycle processing at once. In S3, the closest equivalent 
is setting each rule's `Status` to `Disabled`, which Ozone also supports.
+
+## Lifecycle Configuration
+
+Ozone lifecycle configuration rules follow essentially the same semantics as 
AWS S3 Lifecycle. Refer to the AWS S3 Lifecycle documentation for more details 
on the rules.
+
+### Overall Structure
+
+A Lifecycle Configuration is bound to a bucket. Each bucket can have at most 
one lifecycle configuration, and each configuration can contain up to 1000 
rules.
+
+Each rule contains the following elements:
+
+| Element | Description |
+|---------|-------------|
+| ID | Unique identifier for the rule, up to 255 characters. Auto-generated if 
not specified. |
+| Status | `Enabled` or `Disabled`. Only enabled rules are executed. |
+| Filter / Prefix | Specifies the scope of the rule. Can filter by object name 
prefix, tags, or both. |
+| Expiration | Expiration action. Specifies when objects expire via `Days` or 
`Date`. |
+
+### Expiration Action
+
+The expiration action supports two modes:
+
+- Days: Objects expire the specified number of days after their last 
modification time. Must be a positive integer.
+- Date: Specifies a UTC point in time. All objects last modified before that 
time are considered expired.
+
+Each rule can specify at most one Expiration action. `Days` and `Date` are 
mutually exclusive.
+
+Expiration validation rules:
+
+- Exactly one of `Days` or `Date` must be specified. They cannot be specified 
together, nor can both be omitted.
+- `Days` must be a positive integer.
+- `Date` must conform to ISO 8601 format and must include both the time and 
timezone components (they cannot be omitted). Valid examples: 
`2042-04-02T00:00:00Z`, `2042-04-02T00:00:00+00:00`.
+- `Date` must resolve to midnight UTC (`00:00:00`) after timezone conversion. 
Non-zero hours, minutes, or seconds are not allowed.
+- `Date` must be a future time relative to when the lifecycle configuration is 
created. Past dates are not accepted.
+
+### Filter and Prefix
+
+Rules can specify their scope in the following ways:
+
+- Prefix (top-level): Set the Prefix field directly on the Rule. Applies to 
all objects matching the prefix.
+- Filter: Specified via the Filter element, supporting:
+  - `Prefix`: Filter by prefix.
+  - `Tag`: Filter by a single tag (Key/Value pair).
+  - `And`: Combine Prefix with multiple Tag conditions.
+
+General validation rules:
+
+- Prefix and Filter cannot be used simultaneously, nor can both be omitted.
+- Setting Prefix to an empty string `""` means the rule applies to all objects 
in the bucket.
+- Prefix length cannot exceed 1024 bytes.
+- Prefix cannot point to trash directories (paths starting with `.Trash` or 
`.Trash/`).
+- Only one of Prefix, Tag, or And can be specified inside a Filter.
+- Tag Key length must be between 1 and 128 bytes. Tag Value length must be 
between 0 and 256 bytes.
+- Tag Keys within an And operator must be unique.
+- The And operator must contain at least one Tag. Specifying only a Prefix 
without Tags is not allowed. If there is no Prefix, at least two Tags are 
required.
+
+Additional validation rules for FSO buckets:
+
+For FILE_SYSTEM_OPTIMIZED (FSO) buckets, the Prefix must be a normalized and 
valid path. The requirements are:
+
+- Cannot start with `/`. FSO bucket prefixes are relative to the bucket root 
and do not need a leading slash.
+- Cannot contain consecutive slashes `//`.
+- Path components cannot contain `.` (current directory), `..` (parent 
directory), or `:`.
+
+The following table shows examples of valid and invalid prefixes:
+
+| Prefix | Valid for FSO Bucket | Reason |
+|--------|----------------------|--------|
+| `logs/` | Valid | Normalized directory prefix |
+| `data/2024/` | Valid | Multi-level directory prefix |
+| `archive` | Valid | Simple prefix without slash |
+| `/logs/` | Invalid | Cannot start with `/`, use `logs/` instead |
+| `data//backup/` | Invalid | Contains consecutive slashes `//`, use 
`data/backup/` instead |
+| `data/../secret/` | Invalid | Contains `..`, parent directory references are 
not allowed |
+| `data/./logs/` | Invalid | Contains `.`, current directory references are 
not allowed |
+| `.Trash/` | Invalid | Cannot point to trash directories |
+
+## S3 Gateway API
+
+Lifecycle configurations are managed through standard S3 API operations using 
the `?lifecycle` query parameter.
+
+### Set Lifecycle Configuration
+
+Example using the `Days` mode:
+
+```json
+{
+  "Rules": [
+    {
+      "ID": "expire-logs-after-30-days",
+      "Status": "Enabled",
+      "Filter": {
+        "Prefix": "logs/"
+      },
+      "Expiration": {
+        "Days": 30
+      }
+    }
+  ]
+}
+```
+
+Example using the `Date` mode:
+
+```json
+{
+  "Rules": [
+    {
+      "ID": "expire-temp-data",
+      "Status": "Enabled",
+      "Filter": {
+        "Prefix": "temp/"
+      },
+      "Expiration": {
+        "Date": "2042-04-02T00:00:00Z"
+      }
+    }
+  ]
+}
+```
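+
+Example using a top-level `Prefix` instead of `Filter`; the empty string applies the rule to every object in the bucket. This is a sketch that assumes the same AWS CLI JSON shape as the examples above, and the rule ID is illustrative:
+
+```json
+{
+  "Rules": [
+    {
+      "ID": "expire-everything-after-one-year",
+      "Status": "Enabled",
+      "Prefix": "",
+      "Expiration": {
+        "Days": 365
+      }
+    }
+  ]
+}
+```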
+
+Example using `And` to combine Prefix and Tag filtering:
+
+```json
+{
+  "Rules": [
+    {
+      "ID": "expire-tagged-objects",
+      "Status": "Enabled",
+      "Filter": {
+        "And": {
+          "Prefix": "data/",
+          "Tags": [
+            {
+              "Key": "environment",
+              "Value": "dev"
+            }
+          ]
+        }
+      },
+      "Expiration": {
+        "Days": 7
+      }
+    }
+  ]
+}
+```
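+
+Example filtering by a single `Tag`. This is a sketch that assumes the same AWS CLI JSON shape as the examples above; the rule ID and the tag key and value are illustrative:
+
+```json
+{
+  "Rules": [
+    {
+      "ID": "expire-scratch-objects",
+      "Status": "Enabled",
+      "Filter": {
+        "Tag": {
+          "Key": "purpose",
+          "Value": "scratch"
+        }
+      },
+      "Expiration": {
+        "Days": 14
+      }
+    }
+  ]
+}
+```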
+
+Set lifecycle configuration using AWS CLI:
+
+```shell
+aws s3api put-bucket-lifecycle-configuration \
+  --bucket mybucket \
+  --endpoint-url http://localhost:9878 \
+  --lifecycle-configuration file://lifecycle.json
+```
+
+### Get Lifecycle Configuration
+
+GET `/{bucket}?lifecycle`
+
+```shell
+aws s3api get-bucket-lifecycle-configuration \
+  --bucket mybucket \
+  --endpoint-url http://localhost:9878
+```
+
+### Delete Lifecycle Configuration
+
+DELETE `/{bucket}?lifecycle`
+
+```shell
+aws s3api delete-bucket-lifecycle \
+  --bucket mybucket \
+  --endpoint-url http://localhost:9878
+```
+
+## Bucket Layout Support
+
+Ozone supports three bucket layouts: OBJECT_STORE (OBS), LEGACY, and 
FILE_SYSTEM_OPTIMIZED (FSO). The lifecycle management behavior varies across 
different layouts.
+
+### OBS and LEGACY Buckets
+
+For OBS and LEGACY buckets, the lifecycle service directly iterates through 
the Key Table and performs prefix matching on key names. Objects that match and 
meet the expiration criteria are either deleted directly (OBS buckets do not 
support trash) or moved to trash (LEGACY buckets).
+
+### FSO Buckets
+
+For FSO buckets, the lifecycle service performs recursive evaluation based on 
the directory tree:
+
+1. Parses the directory path from the prefix and locates the corresponding 
directory in the directory table.
+2. Traverses the directory tree in depth-first order, evaluating files and 
subdirectories level by level.
+3. If all files and subdirectories under a directory have expired, the 
directory itself is also marked as expired.
+
+Prefix semantic differences:
+
+| Prefix | OBS/LEGACY Behavior | FSO Behavior |
+|--------|---------------------|--------------|
+| `""` (empty) | Matches all objects | Matches all objects and directories |
+| `key` | Matches all keys starting with `key` | Matches files and directories 
starting with `key` |
+| `dir/` | Matches all keys starting with `dir/` | Matches files and 
subdirectories under `dir`, excluding `dir` itself |
+| `dir1/dir2` | Matches all keys starting with `dir1/dir2` | Matches files and 
directories under `dir1` starting with `dir2` |
+
+<div class="alert alert-warning" role="alert">
+
+For FSO buckets, a directory's ModificationTime is not updated when its child 
files or subdirectories change. Therefore, when configuring prefixes, if you 
only want to expire the contents under a directory without accidentally 
deleting the directory itself, append `/` to the end of the prefix. For 
example, use `data/` instead of `data`.
+
+</div>
+
+## Trash Integration
+
+By default, the lifecycle service moves expired objects to trash (the `.Trash` 
directory) instead of deleting them directly. This provides a layer of 
protection against accidental operations.
+
+- When `ozone.lifecycle.service.move.to.trash.enabled` is set to `true` (the 
default), expired objects are moved to the `.Trash/<owner>/Current/` path.
+- When it is set to `false`, all expired objects are deleted directly.
+- OBS buckets do not support trash, so their expired objects are always 
deleted directly, regardless of this setting.
+
+Objects moved to trash still follow Ozone's trash cleanup policy and will be 
permanently deleted after the retention period.
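+
+As a sketch, trash integration could be turned off in `ozone-site.xml` so that expired objects are deleted directly in every bucket layout. The property name is the one described above; the description text is illustrative:
+
+```xml
+<property>
+   <name>ozone.lifecycle.service.move.to.trash.enabled</name>
+   <value>false</value>
+   <description>Delete expired objects directly instead of moving them to trash.</description>
+</property>
+```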
+
+## Configuration
+
+The lifecycle service is disabled by default and must be explicitly enabled in 
`ozone-site.xml`.
+
+```xml
+<property>
+   <name>ozone.lifecycle.service.enabled</name>
+   <value>true</value>
+   <description>Enable the object lifecycle management service.</description>
+</property>
+```
+
+The following table lists all related configuration properties:
+
+| Property | Description |
+|----------|-------------|
+| `ozone.lifecycle.service.enabled` | Whether to enable the lifecycle 
management service. |

Review Comment:
   Let's show the default value of each property too. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]