[
https://issues.apache.org/jira/browse/HDDS-8342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17841383#comment-17841383
]
Mohanad Elsafty commented on HDDS-8342:
---------------------------------------
Thanks [~ivanandika] for answering this question; your answer is correct. I
will update the PR with more details about the *RetentionManager* shortly.
> AWS S3 Lifecycle Configurations
> -------------------------------
>
> Key: HDDS-8342
> URL: https://issues.apache.org/jira/browse/HDDS-8342
> Project: Apache Ozone
> Issue Type: New Feature
> Components: OM, S3
> Reporter: Mohanad Elsafty
> Assignee: Mohanad Elsafty
> Priority: Major
> Labels: pull-request-available
> Attachments: image-2023-03-31-12-42-46-971.png
>
>
> I needed a retention solution for my cluster (deleting keys under specific
> paths after some time). The idea is very similar to AWS S3 Lifecycle
> configurations (the Expiration part).
> [https://docs.aws.amazon.com/AmazonS3/latest/userguide/lifecycle-configuration-examples.html]
> I made a design and have already implemented most of it, and I would like to
> contribute it back to the Apache Ozone community.
> h2. What is included
> # Users should be able to create/remove/fetch lifecycle configurations for a
> specific S3 bucket.
> # The lifecycle configurations will be executed periodically.
> # Depending on the rules of a lifecycle configuration, there could be
> different actions or even multiple actions.
> # At the moment only expiration is supported (keys get deleted).
> # Lifecycle configurations support all buckets, not only S3 buckets.
>
> h1. Design
> !image-2023-03-31-12-42-46-971.png!
>
> h2. Components
> # A lifecycle configuration (stored in the DB) consists of volumeName,
> bucketName and a list of rules.
> ** A rule contains a prefix (string), an Expiration and an optional Filter.
> ** An Expiration contains either days (integer) or Date (long).
> ** A Filter contains a prefix (string).
> # The S3G bucket endpoint needs a few updates to accept ?/lifecycle.
> # ClientProtocol and all its implementers provide get, list, delete and
> create operations for lifecycle configurations.
> # RetentionManager will run periodically.
> ** Fetches the list of lifecycle configurations with the help of OM.
> ** Executes each lifecycle configuration on a specific bucket.
> ** Lifecycle configurations will run in parallel (each one against a
> different bucket).
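The component list above can be pictured as a small data model. What follows
is a minimal Python sketch for illustration only (Ozone itself is written in
Java, and these are not the classes from the PR). All class, field and method
names are hypothetical. The sketch assumes that days are counted from the
key's creation time, that Date is an epoch timestamp in milliseconds, and that
a Filter prefix, when present, takes precedence over the rule prefix; the
description does not state any of these.

```python
from dataclasses import dataclass, field
from typing import List, Optional

MILLIS_PER_DAY = 24 * 60 * 60 * 1000


@dataclass(frozen=True)
class Expiration:
    """Holds either days or date_millis, never both (hypothetical names)."""
    days: Optional[int] = None
    date_millis: Optional[int] = None

    def __post_init__(self):
        # Reject "neither set" and "both set".
        if (self.days is None) == (self.date_millis is None):
            raise ValueError("set exactly one of days or date_millis")

    def is_expired(self, creation_millis: int, now_millis: int) -> bool:
        if self.days is not None:
            # Assumption: age is measured from the key's creation time.
            return now_millis - creation_millis >= self.days * MILLIS_PER_DAY
        return now_millis >= self.date_millis


@dataclass(frozen=True)
class Filter:
    prefix: str


@dataclass(frozen=True)
class Rule:
    expiration: Expiration
    prefix: str = ""
    filter: Optional[Filter] = None

    def effective_prefix(self) -> str:
        # Assumption: the filter prefix wins when a filter is present.
        return self.filter.prefix if self.filter is not None else self.prefix

    def applies_to(self, key_name: str, creation_millis: int,
                   now_millis: int) -> bool:
        return (key_name.startswith(self.effective_prefix())
                and self.expiration.is_expired(creation_millis, now_millis))


@dataclass(frozen=True)
class LifecycleConfiguration:
    volume_name: str
    bucket_name: str
    rules: List[Rule] = field(default_factory=list)
```

For example, `Rule(Expiration(days=7), prefix="logs/")` matches a key named
`logs/a` once it is at least seven days old, and never matches `data/a`.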
> h2. Flow
> # Users PUT/GET/DELETE lifecycle configurations via S3Gateway.
> # The lifecycle configuration details are sent to a handler to be
> processed.
> # The lifecycle configurations are saved to/fetched from the DB.
> # RetentionManager runs periodically on the leader OM to execute
> these lifecycle configurations.
> # RetentionManager issues deletions for eligible keys.
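The last two steps of the flow can be sketched as a single retention pass.
This is a rough Python illustration under stated assumptions, not the
RetentionManager from the PR: `run_retention_once`, `expired_keys` and the
callbacks `list_keys`, `delete_key` and `is_leader` are hypothetical names,
rules are simplified to (prefix, days) pairs, and a thread pool stands in for
whatever mechanism the real service uses to process buckets in parallel.

```python
from concurrent.futures import ThreadPoolExecutor

MILLIS_PER_DAY = 24 * 60 * 60 * 1000


def expired_keys(rules, keys, now_millis):
    """Return sorted names of keys matched by at least one rule.

    rules: list of (prefix, days) pairs (simplified, hypothetical shape).
    keys:  dict mapping key name -> creation time in epoch millis.
    """
    return sorted(
        name for name, created in keys.items()
        if any(name.startswith(prefix)
               and now_millis - created >= days * MILLIS_PER_DAY
               for prefix, days in rules)
    )


def run_retention_once(lifecycle_table, list_keys, delete_key, now_millis,
                       is_leader=lambda: True, max_workers=4):
    """One periodic pass over all buckets that have a configuration.

    lifecycle_table: dict mapping bucket -> list of (prefix, days) rules.
    Returns a dict mapping bucket -> names of keys for which a deletion
    was issued. Does nothing when this node is not the leader.
    """
    if not is_leader():
        return {}

    def process(item):
        bucket, rules = item
        doomed = expired_keys(rules, list_keys(bucket), now_millis)
        for name in doomed:
            delete_key(bucket, name)
        return bucket, doomed

    # Each bucket's configuration is executed by its own worker task.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(process, lifecycle_table.items()))
```

A scheduler would call `run_retention_once` at a fixed interval; buckets
without an entry in `lifecycle_table` are never listed or touched.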
>
> h2. Not a complete solution
> The solution lacks some useful features, for example:
> * The filter doesn't support `AND` yet.
> * Only expiration is supported.
> * A CLI to manage lifecycle configurations for all the buckets (at the
> moment S3G is the only supported entry point).
> Features like these can be added in the future.
>
>
> *I made some decisions that must be discussed before contributing (current
> design)*
> Lifecycle configurations will be stored in their own column family in the DB
> instead of being a field in the {*}OmBucketInfo{*}.
> I preferred the lifecycle configuration to have its own table for two reasons:
> # No need to modify the OmBucketInfo table.
> # Given the way the RetentionManager works, it will query only the
> buckets that have an attached lifecycle configuration. If the lifecycle
> configuration were a field in OmBucketInfo, it would have to query all the
> buckets and filter the ones that have a LifecycleConfiguration.
> If the other way is preferred, then I will get rid of
> LifecycleConfigurationsManager & the new codec.
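The query-efficiency argument in the second reason can be made concrete with
a toy comparison. The Python sketch below uses plain dictionaries in place of
the real tables; the function names and the `lifecycle` field are hypothetical
and exist only to count how many rows each design has to read.

```python
def configs_from_dedicated_table(lifecycle_table):
    """Dedicated table: every row read belongs to a configured bucket."""
    rows_read = len(lifecycle_table)
    return dict(lifecycle_table), rows_read


def configs_from_bucket_table(bucket_table):
    """Field in bucket info: read every bucket row, then filter."""
    rows_read = len(bucket_table)
    found = {name: info["lifecycle"]
             for name, info in bucket_table.items()
             if info.get("lifecycle") is not None}
    return found, rows_read


# 1000 buckets, only two of which carry a lifecycle configuration.
buckets = {f"vol/b{i}": {"lifecycle": None} for i in range(1000)}
buckets["vol/b3"]["lifecycle"] = ["rule-a"]
buckets["vol/b7"]["lifecycle"] = ["rule-b"]
dedicated = {"vol/b3": ["rule-a"], "vol/b7": ["rule-b"]}

found_a, read_a = configs_from_dedicated_table(dedicated)
found_b, read_b = configs_from_bucket_table(buckets)
```

Both calls return the same two configurations, but the dedicated table reads
2 rows while the bucket-table scan reads all 1000.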
>
> To summarize this:
>
> ||A new table for lifecycle configurations||A new field in OmBucketInfo||
> |A new table|Existing table|
> |Efficient query|Less efficient|
> |A new manager (lifecycle manager)|No need|
> |A new codec |No need|
> |No need to alter existing design|Need to update the existing design|
> |Need to update Bucket Deletion. Delete the linked lifecycle configuration when the bucket is deleted.|No need for updates|
> | |Needs updates to create, get, list and delete lifecycle configuration in the BucketManager.|
>
>
> h2. Plan for contribution
> The implementation is too large to review in one piece. I believe it needs to
> be split into a few merge requests for better review. Here is my suggested
> breakdown.
> # Basic building blocks (lifecycle configuration, rule, expiration, ...) and
> the related table (if needed).
> # New ClientProtocol & OzoneManager operations (create, get, list, delete)
> for lifecycle configurations (protobuf messages as well).
> # S3G endpoints updates.
> # The retention manager.
> # All of them to be merged into a new branch (Let's call it X)
> # Then merge branch X into master.
>
> Please feel free to review the design and ask for more clarifications if
> needed.
>