greenwich commented on code in PR #6989:
URL: https://github.com/apache/ozone/pull/6989#discussion_r3007348520

##########
hadoop-hdds/docs/content/design/storage-policy.md:
##########
@@ -0,0 +1,607 @@
+---
+title: Ozone Storage Policy Support
+summary: Support storage policy in Ozone to write key data into specified types of storage media.
+date: 2026-03-23
+jira: HDDS-11233
+status: draft
+---
+
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+  http://www.apache.org/licenses/LICENSE-2.0
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Terminology
+
+## Definitions
+
+- Storage Policy: Defines where key data replicas should be stored in specific storage tiers.
+- Storage Type: The type of each Datanode volume or container replica. Each Datanode volume can be configured with a
+  storage type, including SSD, DISK, and ARCHIVE.
+- Storage Tier: A specific storage tier is composed of all replicas of a container based on their storage type. For
+  example, a 3-replica SSD tier consists of 3 replicas of SSD type.
+- Volume: In this document, unless otherwise specified, a volume refers to the volume of a Datanode.
+- Key: In this document, a key refers to an object in Ozone, including entries in both the KeyTable and FileTable.
+
+## Storage Policy vs Storage Type vs Storage Tier
+
+
+
+The relationship between Storage Policy, Storage Type, and Storage Tier:
+
+- The storage policy is the property of key/bucket (managed by OM).
+- The storage tier is the property of Pipeline and Container (managed by SCM).
+- The storage type is the property of volume and container replica (managed by DN).
+- Only the storage policy can be modified by the user directly via the ozone command.
+
+Example:
+
+For a keyA, its storage policy is Hot, Its Container tier is SSD tier, the Container has three replicas, all of which
+are of the SSD storage type.
+
+# User Scenarios
+
+- User A needs a bucket that supports high-performance IO, so they create a bucket with the storage policy set to Hot.
+  Data written by User A to the bucket will automatically be distributed across SSD disks in the cluster.
+- User B needs higher IO performance for a specific key. They write a key with the storage policy set to Hot. The
+  key's data will be distributed across SSD disks in the cluster.
+- User C uses the command `aws s3 cp myfile.txt s3://my-bucket/myfile.txt --storage-class STANDARD` to upload a file
+  to the Ozone SSD tier. The key's data will be distributed across SSD disks in the cluster.
+
+# Goals
+
+- Storage Policy: Introduce storage policy and related concepts. Define multiple storage policies and support S3
+  storage class.
+- Storage Policy Writing: Allow writing keys/files to specified storage tiers based on storage policy. Support S3,
+  API, and shell command interfaces.
+- Storage Policy Update: Enable setting and unsetting storage policies for buckets, and setting storage tiers for
+  containers.
+- Storage Policy Display: Support displaying the storage policy attribute of buckets and keys. Support displaying the
+  storage tier of SCM containers and pipelines. Support displaying Datanode storage type usage information. Support
+  checking whether the key storage policy is satisfied.
+- Container Balancer: Support migrating container replicas between Datanodes to volumes of the matching storage type.
+  For example, SSD type container replicas will be migrated to SSD type volumes, and will not be migrated to DISK
+  type volumes.
+- ReplicationManager: Support managing the storage type of container replicas to ensure that container replicas on
+  Datanodes reside on the correct volumes. Ensure that the storage types of container replicas forming a storage
+  tier are correct. For example, a 3-replica SSD storage tier container in SCM should consist of 3 SSD type container
+  replicas, and each container replica should reside on an SSD type volume.
+- DiskBalancerService: Support migrating container replicas within a Datanode to volumes of the matching storage type.
+  For example, SSD type container replicas will be migrated to SSD type volumes, and will not be migrated to DISK
+  type volumes.
+
+# Design
+
+## Supported Storage Policies
+
+- Supported storage policies: Hot / Warm / Cold
+- Supported storage tiers: SSD / DISK / ARCHIVE / EMPTY
+- Supported storage types: SSD / DISK / ARCHIVE
+- Supported bucket layouts: FILE_SYSTEM_OPTIMIZED, OBJECT_STORE, LEGACY
+- S3 storage classes: STANDARD / STANDARD_IA / GLACIER
+
+### Storage Policy Map to Storage Tier
+
+| Storage Policy | Storage Tier for Write | Fallback Tier for Write |
+|----------------|------------------------|-------------------------|
+| Hot            | SSD                    | DISK                    |
+| Warm           | DISK                   | EMPTY                   |
+| Cold           | ARCHIVE                | EMPTY                   |
+
+- Storage Tier for Write: The primary storage tier where data is written when a storage policy is specified.
+- Fallback Tier for Write: If the specified storage policy cannot be satisfied with the primary storage tier, SCM
+  will attempt to use this fallback tier to meet the policy requirements. EMPTY means no fallback is available.
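
The policy-to-tier table above, including the single fallback step, can be sketched as a small lookup. This is only an illustration in Python under stated assumptions: the names `StorageTier`, `POLICY_TIERS`, and `resolve_write_tier` are hypothetical and do not come from the PR, and the `available` set stands in for whatever check SCM actually performs to decide that a tier can serve a write, which the document does not specify.

```python
from enum import Enum
from typing import Optional, Set


class StorageTier(Enum):
    SSD = "SSD"
    DISK = "DISK"
    ARCHIVE = "ARCHIVE"
    EMPTY = "EMPTY"  # marker for "no fallback", as in the table


# (storage tier for write, fallback tier for write), copied from the table.
POLICY_TIERS = {
    "Hot": (StorageTier.SSD, StorageTier.DISK),
    "Warm": (StorageTier.DISK, StorageTier.EMPTY),
    "Cold": (StorageTier.ARCHIVE, StorageTier.EMPTY),
}


def resolve_write_tier(policy: str, available: Set[StorageTier]) -> Optional[StorageTier]:
    """Pick the tier for a write, or None if the policy cannot be satisfied.

    `available` is a hypothetical stand-in for the tiers that can currently
    accept the write.
    """
    primary, fallback = POLICY_TIERS[policy]
    if primary in available:
        return primary
    # EMPTY means the policy has no fallback tier at all.
    if fallback is not StorageTier.EMPTY and fallback in available:
        return fallback
    return None
```

With this sketch, a Hot write falls back to DISK when no SSD tier is available, while a Warm or Cold write has no fallback and returns `None` when its primary tier is unavailable.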
+
+### Storage Tier Map to Storage Type
+
+| Tier    | Storage Type of Pipeline | One Replica Container Storage Type | Three Replica Container Storage Type | EC Container Replicas Storage Type |
+|---------|--------------------------|------------------------------------|--------------------------------------|------------------------------------|
+| SSD     | SSD                      | SSD                                | 3 SSD                                | n SSD                              |
+| DISK    | DISK                     | DISK                               | 3 DISK                               | n DISK                             |
+| ARCHIVE | ARCHIVE                  | ARCHIVE                            | 3 ARCHIVE                            | n ARCHIVE                          |
+| EMPTY   | -                        | -                                  | -                                    | -                                  |
+
+### Fallback Storage Type for Container Replica Replication/Migration
+
+| Container Replica Storage Type | Fallback Storage Types (ordered) [1] |
+|--------------------------------|--------------------------------------|
+| SSD                            | DISK, ARCHIVE                        |
+| DISK                           | ARCHIVE                              |
+| ARCHIVE                        | none                                 |
+
+- Fallback Storage Type: During the container replica replication or migration process, if SCM cannot find a suitable
+  volume type that matches the original container replica's storage type, it will attempt to use the fallback storage
+  types in order.
+
+[1] A container replica does not know the storage policy of the key or the storage tier of the SCM container it belongs
+to. The container replica only knows its own expected storage type, which is why the column name is "Fallback Storage
+Types" rather than "Fallback Storage Tier".
+
+### AWS S3 StorageClass
+
+| AWS S3 StorageClass | Ozone Storage Policy |
+|---------------------|----------------------|
+| STANDARD [1]        | Hot                  |
+| STANDARD_IA         | Warm                 |
+| GLACIER             | Cold                 |
+| DEEP_ARCHIVE        | Warm                 |
+
+> AWS StorageClass Valid Values: STANDARD | REDUCED_REDUNDANCY | STANDARD_IA | ONEZONE_IA | INTELLIGENT_TIERING |
+> GLACIER | DEEP_ARCHIVE | OUTPOSTS | GLACIER_IR | SNOW | EXPRESS_ONEZONE
+> According to AWS S3 documentation, STANDARD is the highest performance S3 StorageClass, but its name is STANDARD,
+> which is not straightforward to map to the Ozone SSD tier.
+
+[1] The field names here reuse the AWS S3 field names, but the actual semantics differ from AWS S3. For example, in
+Ozone, STANDARD represents the Hot storage policy, while in AWS S3, STANDARD has different semantics.
+
+## Component Changes
+
+### Datanode Container Replica
+
+A storage type field is added to container replicas on Datanodes, which is persisted in the container's metadata YAML
+file.
+
+### Bucket, Key
+
+A storage policy attribute is added to buckets and keys on OM.

Review Comment:
Yes, I think it's highly needed.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
