[ https://issues.apache.org/jira/browse/HDDS-8342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sammi Chen updated HDDS-8342:
-----------------------------
    Summary: S3 Lifecycle Configurations - Object Expire  (was: AWS S3 Lifecycle Configurations - Object Expire)

> S3 Lifecycle Configurations - Object Expire
> -------------------------------------------
>
>                 Key: HDDS-8342
>                 URL: https://issues.apache.org/jira/browse/HDDS-8342
>             Project: Apache Ozone
>          Issue Type: New Feature
>          Components: OM, S3
>            Reporter: Mohanad Elsafty
>            Assignee: Mohanad Elsafty
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: RetentionManager.png, image-2023-03-31-12-42-46-971.png
>
>
> I needed a retention solution in my cluster (delete keys under
> specific paths after some time). The idea is very similar to the
> Expiration part of AWS S3 Lifecycle configurations.
> [https://docs.aws.amazon.com/AmazonS3/latest/userguide/lifecycle-configuration-examples.html]
> I made a design and have already implemented most of it, and would like to
> contribute it back to the Apache Ozone community.
> h2. Here is what is included
> # Users should be able to create/remove/fetch lifecycle configurations for a
> specific S3 bucket.
> # The lifecycle configurations will be executed periodically.
> # Depending on the rules of the lifecycle configuration, there could be
> different actions or even multiple actions.
> # At the moment only expiration is supported (keys get deleted).
> # Lifecycle configurations support all buckets, not only S3 buckets.
>
> h1. Design
> !image-2023-03-31-12-42-46-971.png!
>
> h2. Components
> # A lifecycle configuration (stored in the DB) consists of volumeName,
> bucketName and a list of rules.
> ** A rule contains a prefix (string), an Expiration and an optional Filter.
> ** Expiration contains either days (integer) or Date (long).
> ** Filter contains a prefix (string).
> # The S3G bucket endpoint needs a few updates to accept ?/lifecycle
> # ClientProtocol and all its implementers provide get, list, delete and
> create operations for lifecycle configurations.
> # RetentionManager will run periodically.
> ** Fetches the list of lifecycle configurations with the help of OM.
> ** Executes each lifecycle configuration on its specific bucket.
> ** Lifecycle configurations run in parallel (each one against a
> different bucket).
> h2. Flow
> # Users PUT/GET/DELETE lifecycle configurations via S3Gateway.
> # The lifecycle configuration details are sent to a handler to be
> processed.
> # The lifecycle configurations are saved to/fetched from the DB.
> # RetentionManager runs periodically in the Leader OM to execute
> these lifecycle configurations.
> # RetentionManager issues deletions for eligible keys.
>
> h2. Not a complete solution
> The solution lacks some interesting features, for example:
> * The filter doesn't support `AND` yet.
> * Only expiration is supported.
> * A CLI to manage lifecycle configurations for all the buckets (at the
> moment S3G is the only supported entry point).
> But these kinds of features can be added in the future.
>
>
> *I made some decisions that must be discussed before contributing (current
> design)*
> Lifecycle configurations will be stored in their own column family in the DB
> instead of being a field in {*}OmBucketInfo{*}.
> I preferred the lifecycle configuration to have its own table for two reasons:
> # No need to modify the OmBucketInfo table.
> # With a separate table, the RetentionManager queries only the buckets
> that have an attached lifecycle configuration. If the lifecycle
> configuration were a field in OmBucketInfo, it would have to query all the
> buckets and filter the ones that have a LifecycleConfiguration.
> If the other way is preferred, then I will get rid of
> LifecycleConfigurationsManager & the new codec.
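
The building blocks listed under Components, and the eligibility check the RetentionManager would apply to each key, could look roughly like the Java sketch below. This is only an illustration under stated assumptions, not the code from the patch: the class, record and method names (`LifecycleSketch`, `Rule`, `Expiration`, `Filter`, `isExpired`, `shouldDelete`) are hypothetical. It also assumes that a Filter prefix, when present, takes precedence over the rule prefix, and that "days" is counted from the key's creation time with no rounding to a day boundary.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;

/** Hypothetical sketch of lifecycle rules and the expiration check; not Ozone's actual classes. */
public class LifecycleSketch {

  /** Expiration holds either a number of days or an absolute date (epoch millis), never both. */
  public record Expiration(Integer days, Long dateMillis) {
    public Expiration {
      if ((days == null) == (dateMillis == null)) {
        throw new IllegalArgumentException("Set exactly one of days or dateMillis");
      }
    }
  }

  /** Optional filter; only a prefix is modeled, matching the description. */
  public record Filter(String prefix) {}

  /** A rule: prefix, expiration, and an optional (nullable) filter. */
  public record Rule(String prefix, Expiration expiration, Filter filter) {
    /** Assumption: the filter prefix wins over the rule prefix when a filter is set. */
    String effectivePrefix() {
      return filter != null ? filter.prefix() : prefix;
    }
  }

  /** One configuration per bucket, identified by volume and bucket name. */
  public record LifecycleConfiguration(String volumeName, String bucketName, List<Rule> rules) {}

  /** True if the key matches the rule's prefix and its expiration has passed at 'now'. */
  public static boolean isExpired(Rule rule, String keyName, Instant created, Instant now) {
    if (!keyName.startsWith(rule.effectivePrefix())) {
      return false;
    }
    Expiration e = rule.expiration();
    if (e.days() != null) {
      // Age-based: expired once creation time + days is not after 'now'.
      return !created.plus(Duration.ofDays(e.days())).isAfter(now);
    }
    // Date-based: expired once 'now' has reached the configured date.
    return !now.isBefore(Instant.ofEpochMilli(e.dateMillis()));
  }

  /** A key is eligible for deletion if any rule of the configuration expires it. */
  public static boolean shouldDelete(LifecycleConfiguration config, String keyName,
                                     Instant created, Instant now) {
    return config.rules().stream().anyMatch(r -> isExpired(r, keyName, created, now));
  }

  public static void main(String[] args) {
    LifecycleConfiguration config = new LifecycleConfiguration("vol1", "bucket1",
        List.of(new Rule("logs/", new Expiration(30, null), null)));
    Instant created = Instant.parse("2023-01-01T00:00:00Z");
    Instant now = Instant.parse("2023-03-01T00:00:00Z");
    System.out.println("logs/app.log eligible: "
        + shouldDelete(config, "logs/app.log", created, now));
    System.out.println("data/part-0 eligible: "
        + shouldDelete(config, "data/part-0", created, now));
  }
}
```

In the design described above, a check like `shouldDelete` would run inside the periodic RetentionManager pass on the leader OM, once per key of each bucket that has a lifecycle configuration attached.
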
>
> To summarize the trade-off between the two storage options:
>
> ||A new table for lifecycle configurations||A new field in OmBucketInfo||
> |A new table|Existing table|
> |Efficient query|Less efficient|
> |A new manager (lifecycle manager)|No need|
> |A new codec|No need|
> |No need to alter existing design|Need to update the existing design|
> |Need to update Bucket Deletion: delete the linked lifecycle configuration when the bucket is deleted.|No need for updates|
> | |Needs updates to create, get, list and delete lifecycle configuration in the BucketManager.|
>
>
> h2. Plan for contribution
> The implementation is too large to review in one piece. I believe it needs
> to be split into a few merge requests for better review. Here is my
> suggested breakdown.
> # Basic building blocks (lifecycle configuration, rule, expiration, ...) and
> the related table (if needed).
> # New ClientProtocol & OzoneManager operations (create, get, list, delete)
> for lifecycle configurations (protobuf messages as well).
> # S3G endpoint updates.
> # The retention manager.
> # All of them to be merged into a new branch (let's call it X).
> # Then merge branch X into master.
>
> Please feel free to review the design and ask for clarification if
> needed.
> A high-level design document:
> https://docs.google.com/document/d/1LDE7jnhPJ_fc--zEob48RmqDxcqrDgOv3E3NBInoC08/edit?usp=sharing



--
This message was sent by Atlassian Jira
(v8.20.10#820010)