ivandika3 commented on code in PR #6989: URL: https://github.com/apache/ozone/pull/6989#discussion_r1692365363
##########
hadoop-hdds/docs/content/design/storage-policy.md:
##########
@@ -0,0 +1,397 @@
+---
+title: Ozone Storage Policy Support
+summary: Support Ozone storage strategy, and support to write key into the specified type of storage medium.
+date: 2024-07-25
+jira: HDDS-11233
+status: draft
+---
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+   http://www.apache.org/licenses/LICENSE-2.0
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Terminology
+
+## Terminology
+
+- Storage Policy: Defines the storage tiers in which the replicas of a key's data should be stored.
+- Storage Type: The type of a disk or Container replica in a Datanode. Storage types could include RAM_DISK, SSD, HDD, ARCHIVE, etc.
+- Storage Tier: A set of Container replicas in a cluster that satisfy the storage policy.
+- Volume: In this document, unless otherwise specified, a volume refers to the volume of a Datanode.
+- Prefix: In this document, unless otherwise specified, a prefix refers to a storage policy prefix, not an ACL prefix. A storage policy prefix is used to configure the storage policy for keys under the specified prefix.
+
+## Storage Policy vs Storage Type vs Storage Tier
+
+
+
+The relation of Storage Policy, Storage Type and Storage Tier
+
+- The storage policy is the property of key/bucket/prefix (Managed by OM);
+- The storage tier is the property of Pipeline and Container (Managed by SCM);
+- The storage type is the property of volume and Container replicas (Managed by DN);
+- Only the storage policy can be modified by the user directly via ozone command;
+
+Example:
+
+For a keyA, its storage policy is Hot, its Container 1 tier is SSD tier, and Container 1 has three replicas, all of which are of the SSD storage type.
+
+# User Scenarios
+
+- User A needs a bucket that supports high-performance IO, so User A creates a bucket with the storage policy set to Hot. Data written by User A to the bucket will automatically be distributed across the SSD disks in the cluster.
+- User B needs higher IO performance for the directory/prefix /project/metadata, so User B sets the storage policy for the prefix /project/metadata to Hot. Subsequently, data written to /project/metadata will be automatically distributed across the SSD disks in the cluster.
+- User C has already written key1 to the cluster and requires better IO performance. The storage policy for key1 can be set to Hot, and then a migration can be triggered to move key1 to the SSD disks.
+- User D uses the command `aws s3 cp myfile.txt s3://my-bucket/myfile.txt --storage-class XXX` to upload a file to the Ozone SSD tier.
+
+# Current Status
+
+- Ozone currently has some support for tiered storage such as storage type, and some parts of this document may already be implemented.
+- Currently, in Ozone, when a key is created, the key's Block can appear on any volume of a Datanode. When a key is created, SCM first needs to allocate a Block for the key through Pipelines. The Client then writes the Block to the corresponding Datanode based on the Pipeline information. In this process, the smallest element managed by the SCM Pipeline is the Datanode, and when the Datanode creates a Container, the Container may appear on any volume with enough remaining space. Under the current architecture, Ozone does not support writing data to specific disks.
+
+# Goal Requirements Specification
+
+### **Support for Storage Policy Writing and Management**
+
+- **Writing keys**: Allow keys to be written to specified storage tiers based on storage policies.
+- **Policy Management**: Enable setting, unsetting, and inheriting storage policies for keys, prefixes, and buckets. If no specific policy is set, the policy is inherited from the longest matching prefix or from the bucket.
+
+### **Support for Data Migration Across Different Storage Policies**
+
+- **Data Migration**: Support data migration across different storage policies via manual triggers, ensuring data is moved to the appropriate storage tiers.
+
+### **Adaptation of AWS S3 StorageClass**
+
+- **S3 StorageClass Mapping**: Map AWS S3 storage classes to Ozone storage policies, supporting related API operations (PutObject, CopyObject, Multipart Upload, GetObject, HeadObject, ListObjects).
+
+### **Management and Monitoring Tools**
+
+- **Storage Policy Commands**: Provide tools to view storage policies of containers, datanode usage, and pipeline information.
+- **Metrics and Monitoring**: Enable visibility into storage policy compliance, container storage types, and space information across different storage policies.
+
+### **Future Enhancements**
+
+- **Intelligent Storage Policies**: Plan to support automatic data migration based on access frequency, similar to S3 Intelligent-Tiering.
+- **Bucket StorageClass Lifecycle Rules**: Support setting storage policy Lifecycle Rules at the bucket level.
+- **Recon Support**: Enhance Recon to display relevant storage tier information.
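
Side note on the Policy Management requirement quoted above: the inheritance rule could be sketched roughly as follows. This is a minimal illustration only, not code from the PR. The class `StoragePolicyResolver`, its methods, and the `HOT`/`WARM`/`COLD` enum are hypothetical names, and the lookup order (key, then longest matching prefix, then bucket) is an assumption read from the wording of the document. What happens when nothing is set anywhere is not specified in the quoted text, so the sketch returns an empty `Optional` in that case.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper, not part of the PR: resolves a key's effective storage policy. */
final class StoragePolicyResolver {

  /** The three policies named in the design document. */
  enum StoragePolicy { HOT, WARM, COLD }

  private final StoragePolicy bucketPolicy; // null means "not set on the bucket"
  private final Map<String, StoragePolicy> prefixPolicies = new HashMap<>();
  private final Map<String, StoragePolicy> keyPolicies = new HashMap<>();

  StoragePolicyResolver(StoragePolicy bucketPolicy) {
    this.bucketPolicy = bucketPolicy;
  }

  void setPrefixPolicy(String prefix, StoragePolicy policy) {
    prefixPolicies.put(prefix, policy);
  }

  void setKeyPolicy(String key, StoragePolicy policy) {
    keyPolicies.put(key, policy);
  }

  Optional<StoragePolicy> resolve(String key) {
    // 1. A policy set directly on the key wins (assumed precedence).
    StoragePolicy own = keyPolicies.get(key);
    if (own != null) {
      return Optional.of(own);
    }
    // 2. Otherwise use the longest configured prefix that the key starts with.
    String best = null;
    for (String prefix : prefixPolicies.keySet()) {
      if (key.startsWith(prefix) && (best == null || prefix.length() > best.length())) {
        best = prefix;
      }
    }
    if (best != null) {
      return Optional.of(prefixPolicies.get(best));
    }
    // 3. Otherwise inherit from the bucket; empty if the bucket has no policy either.
    return Optional.ofNullable(bucketPolicy);
  }
}
```

With a bucket set to `WARM`, a prefix `project/` set to `COLD`, and a prefix `project/metadata/` set to `HOT`, a key such as `project/metadata/index` would resolve to `HOT` under this sketch, because the longer prefix takes precedence over the shorter one and over the bucket.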
+
+# Detailed Requirements Specification
+
+## Storage Policy and Storage Types
+
+### Supported Storage Types
+
+- Specify the Storage Type for each volume through configuration. If no Storage Type is specified, the default value will be DISK.
+- Supported Storage Types: SSD / DISK / ARCHIVE / RAM_DISK
+
+### Supported Storage Policies
+
+Supported storage policies: Hot, Warm, Cold
+
+### Storage Policies Map To Storage Tiers
+
+| Storage Policy | Storage Tier for Write | Fallback Tier for Write |
+| --- | --- | --- |
+| Hot | SSD | DISK |
+| Warm | DISK | none |
+| Cold | ARCHIVE | none |
+- **Storage Tier For Write**: The priority storage tier where data is written when storage policy is specified.
+- **Fallback Tier for Write**: If the specified storage policy cannot be satisfied with the priority storage tier, the SCM will attempt to use this fallback tier to meet the policy requirements.
+
+### Storage Tier Map To Storage Type
+
+| Tier | StorageType of Pipeline | One Replication
+Container Replicas Storage Type | Three replication
+Container Replicas Storage Type | EC
+Container Replicas Storage Type |
+| --- | --- | --- | --- | --- |
+| SSD | SSD | SSD | 3 SSD | n SSD |
+| DISK | DISK | DISK | 3 DISK | n DISK |
+| ARCHIVE | ARCHIVE | ARCHIVE | 3 ARCHIVE | n ARCHIVE |

Review Comment:
Nit: The table does not seem to be rendered properly.


--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
