Mohanad Elsafty created HDDS-8342:
-------------------------------------

             Summary: AWS S3 Lifecycle Configurations
                 Key: HDDS-8342
                 URL: https://issues.apache.org/jira/browse/HDDS-8342
             Project: Apache Ozone
          Issue Type: New Feature
          Components: OM, S3
            Reporter: Mohanad Elsafty
            Assignee: Mohanad Elsafty
         Attachments: image-2023-03-31-12-42-46-971.png

I had the need for a retention solution in my cluster (delete keys in specific 
paths after some time). The idea was very similar to AWS S3 Lifecycle 
configurations (Expiration part). 
[https://docs.aws.amazon.com/AmazonS3/latest/userguide/lifecycle-configuration-examples.html]


I made a design and have already implemented most of it, and I would like to 
contribute it back to the Apache Ozone community.
h2. What is included
 # User should be able to create/remove/fetch lifecycle configurations for a 
specific S3 bucket.
 # The lifecycle configurations will be executed periodically.
 # Depending on the rules of the lifecycle configuration, there could be 
different actions or even multiple actions. 
 # At the moment only expiration is supported (keys get deleted).
 # Lifecycle configurations support all buckets, not only S3 buckets.

 
h1. Design

!image-2023-03-31-12-42-46-971.png!

 
h2. Components
 # Lifecycle configurations (stored in the DB) consist of volumeName, 
bucketName and a list of rules.
 ** A rule contains a prefix (string), an Expiration and an optional Filter.
 ** An Expiration contains either days (integer) or a Date (long).
 ** A Filter contains a prefix (string).
 # The S3G bucket endpoint needs a few updates to accept the ?lifecycle 
query parameter.
 # ClientProtocol and all its implementers provide get, list, delete and create 
operations for lifecycle configurations.
 # RetentionManager runs periodically.
 ** Fetches the list of lifecycle configurations with the help of OM.
 ** Executes each lifecycle configuration on its specific bucket.
 ** Lifecycle configurations run in parallel (each one against a 
different bucket).
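
The component list above can be sketched as a small set of model classes. This is only an illustration of the described structure, not the actual implementation: the class names, the factory methods and the assumption that a Filter prefix overrides the rule prefix are all hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;

/** Hypothetical model classes for illustration; not the actual Ozone types. */
final class LifecycleModelSketch {

  /** Holds either a relative number of days or an absolute date (epoch millis). */
  static final class Expiration {
    final Integer days;     // set for "expire N days after creation"
    final Long dateMillis;  // set for "expire at this instant"

    private Expiration(Integer days, Long dateMillis) {
      this.days = days;
      this.dateMillis = dateMillis;
    }

    static Expiration afterDays(int days) {
      if (days <= 0) {
        throw new IllegalArgumentException("days must be positive");
      }
      return new Expiration(days, null);
    }

    static Expiration onDate(long dateMillis) {
      return new Expiration(null, dateMillis);
    }

    /** True when a key created at creationMillis counts as expired at nowMillis. */
    boolean isExpired(long creationMillis, long nowMillis) {
      if (days != null) {
        return nowMillis >= creationMillis + TimeUnit.DAYS.toMillis(days);
      }
      return nowMillis >= dateMillis;
    }
  }

  static final class Filter {
    final String prefix;

    Filter(String prefix) {
      this.prefix = prefix;
    }
  }

  static final class Rule {
    final String prefix;
    final Expiration expiration;
    final Filter filter; // optional, may be null

    Rule(String prefix, Expiration expiration, Filter filter) {
      this.prefix = prefix;
      this.expiration = expiration;
      this.filter = filter;
    }

    /** Assumption: a filter prefix, when present, overrides the rule prefix. */
    boolean appliesTo(String keyName) {
      String effective = filter != null ? filter.prefix : prefix;
      return keyName.startsWith(effective);
    }
  }

  static final class LifecycleConfiguration {
    final String volumeName;
    final String bucketName;
    final List<Rule> rules;

    LifecycleConfiguration(String volumeName, String bucketName, List<Rule> rules) {
      this.volumeName = volumeName;
      this.bucketName = bucketName;
      this.rules = rules;
    }
  }

  public static void main(String[] args) {
    Rule rule = new Rule("logs/", Expiration.afterDays(30), null);
    LifecycleConfiguration conf =
        new LifecycleConfiguration("vol1", "bucket1", Arrays.asList(rule));
    long now = TimeUnit.DAYS.toMillis(31); // key created at time 0
    boolean expired = rule.appliesTo("logs/a.txt") && rule.expiration.isExpired(0L, now);
    System.out.println(conf.bucketName + " logs/a.txt expired: " + expired);
  }
}
```

Keeping days and date as two mutually exclusive fields mirrors the "either days or Date" wording; the private constructor makes it impossible to set both.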

h2. Flow
 # Users PUT/GET/DELETE lifecycle configurations via S3Gateway.
 # The lifecycle configuration details are sent to a handler to be 
processed.
 # The lifecycle configurations are saved to/fetched from the DB.
 # RetentionManager runs periodically on the leader OM to execute 
these lifecycle configurations.
 # RetentionManager issues deletions for eligible keys.
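
As a rough sketch of the last two steps, one retention pass could look like the following. Everything here is hypothetical (class names, the fixed pool size of 4, the in-memory maps standing in for OM lookups); it only illustrates a leader check, one task per bucket running in parallel, and selection of keys whose prefix matches and whose age exceeds the configured days.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of a single retention pass; not the actual RetentionManager. */
final class RetentionPassSketch {

  static final class KeyInfo {
    final String name;
    final long creationMillis;

    KeyInfo(String name, long creationMillis) {
      this.name = name;
      this.creationMillis = creationMillis;
    }
  }

  static final class ExpirationRule {
    final String prefix;
    final int days;

    ExpirationRule(String prefix, int days) {
      this.prefix = prefix;
      this.days = days;
    }
  }

  /** Returns the names of keys matched by at least one rule and past its expiration. */
  static List<String> selectExpired(List<KeyInfo> keys, List<ExpirationRule> rules, long nowMillis) {
    List<String> expired = new ArrayList<>();
    for (KeyInfo key : keys) {
      for (ExpirationRule rule : rules) {
        long expiresAt = key.creationMillis + TimeUnit.DAYS.toMillis(rule.days);
        if (key.name.startsWith(rule.prefix) && nowMillis >= expiresAt) {
          expired.add(key.name);
          break; // one matching rule is enough to delete the key
        }
      }
    }
    return expired;
  }

  /** Runs one pass; buckets are processed in parallel, and only on the leader. */
  static Map<String, List<String>> runOnce(boolean isLeader,
      Map<String, List<ExpirationRule>> rulesByBucket,
      Map<String, List<KeyInfo>> keysByBucket,
      long nowMillis) {
    Map<String, List<String>> toDelete = new ConcurrentHashMap<>();
    if (!isLeader) {
      return toDelete; // followers do nothing
    }
    ExecutorService pool = Executors.newFixedThreadPool(4);
    try {
      List<Future<?>> pending = new ArrayList<>();
      for (Map.Entry<String, List<ExpirationRule>> entry : rulesByBucket.entrySet()) {
        String bucket = entry.getKey();
        List<ExpirationRule> rules = entry.getValue();
        List<KeyInfo> keys =
            keysByBucket.getOrDefault(bucket, Collections.<KeyInfo>emptyList());
        pending.add(pool.submit(() -> {
          toDelete.put(bucket, selectExpired(keys, rules, nowMillis));
        }));
      }
      for (Future<?> f : pending) {
        try {
          f.get(); // wait for every bucket task to finish
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          throw new IllegalStateException(e);
        } catch (ExecutionException e) {
          throw new IllegalStateException(e.getCause());
        }
      }
    } finally {
      pool.shutdown();
    }
    return toDelete;
  }

  public static void main(String[] args) {
    Map<String, List<ExpirationRule>> rules = new HashMap<>();
    rules.put("bucket1", Arrays.asList(new ExpirationRule("logs/", 7)));
    Map<String, List<KeyInfo>> keys = new HashMap<>();
    keys.put("bucket1", Arrays.asList(new KeyInfo("logs/old", 0L)));
    System.out.println(runOnce(true, rules, keys, TimeUnit.DAYS.toMillis(8)));
  }
}
```

In a real service the pass would presumably be triggered on a timer, for example with a `ScheduledExecutorService`, and the returned key names would be turned into delete requests rather than collected in a map.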

 
h2. Not a complete solution

The solution lacks some useful features, for example:
 * The filter doesn't support `AND` yet.
 * Only expiration is supported.
 * A CLI to manage lifecycle configurations for all buckets (at the moment 
S3G is the only supported entry point).

These kinds of features can be added in the future.

 

 

*I made some decisions that must be discussed before contributing (Current 
design)*

Lifecycle configurations will be stored in their own column family in the DB 
instead of being a field in the {*}OmBucketInfo{*}.

I preferred the lifecycle configuration to have its own table for two reasons:
 # No need to modify the OmBucketInfo table.
 # Given the way the retention manager works, it will query only the buckets 
that have an attached lifecycle configuration. If the lifecycle were a field in 
OmBucketInfo, it would have to query all the buckets and filter the ones that have 
a LifecycleConfiguration.
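
To make the second reason concrete, here is a toy comparison of the two layouts. It is a sketch under loud assumptions: sorted in-memory maps stand in for DB tables, configurations are plain strings, and an empty string means "no configuration". None of these names come from the actual code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy stand-in for the two storage layouts; TreeMaps play the role of DB tables. */
final class LifecycleStorageSketch {
  // Dedicated table: one row per bucket that has a lifecycle configuration.
  final Map<String, String> lifecycleTable = new TreeMap<>();
  // Bucket table: one row per bucket; the value models an embedded
  // configuration field, with "" meaning the bucket has none.
  final Map<String, String> bucketTable = new TreeMap<>();
  // Number of rows touched by the most recent scan.
  int rowsRead = 0;

  static String dbKey(String volume, String bucket) {
    return "/" + volume + "/" + bucket;
  }

  void createBucket(String volume, String bucket) {
    bucketTable.put(dbKey(volume, bucket), "");
  }

  /** Writes the configuration in both layouts so the scans can be compared. */
  void putLifecycle(String volume, String bucket, String config) {
    lifecycleTable.put(dbKey(volume, bucket), config);
    bucketTable.put(dbKey(volume, bucket), config);
  }

  /** Dedicated table: every row read is a bucket that needs processing. */
  List<String> scanDedicatedTable() {
    rowsRead = 0;
    List<String> buckets = new ArrayList<>();
    for (String key : lifecycleTable.keySet()) {
      rowsRead++;
      buckets.add(key);
    }
    return buckets;
  }

  /** Embedded field: every bucket row is read, then filtered. */
  List<String> scanBucketTable() {
    rowsRead = 0;
    List<String> buckets = new ArrayList<>();
    for (Map.Entry<String, String> row : bucketTable.entrySet()) {
      rowsRead++;
      if (!row.getValue().isEmpty()) {
        buckets.add(row.getKey());
      }
    }
    return buckets;
  }

  /** With a dedicated table, bucket deletion must also drop the linked row. */
  void deleteBucket(String volume, String bucket) {
    bucketTable.remove(dbKey(volume, bucket));
    lifecycleTable.remove(dbKey(volume, bucket));
  }

  public static void main(String[] args) {
    LifecycleStorageSketch db = new LifecycleStorageSketch();
    for (int i = 0; i < 5; i++) {
      db.createBucket("vol1", "bucket" + i);
    }
    db.putLifecycle("vol1", "bucket3", "expire logs/ after 30 days");
    db.scanDedicatedTable();
    System.out.println("dedicated table rows read: " + db.rowsRead);
    db.scanBucketTable();
    System.out.println("bucket table rows read: " + db.rowsRead);
  }
}
```

Both scans return the same buckets; the difference is only in how many rows are read to find them, which is the trade-off weighed against the extra cleanup step in `deleteBucket`.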

If the other way is preferred, then I will get rid of 
LifecycleConfigurationsManager & the new codec.

 

To summarize this:

 
||A new table for lifecycle configurations||A new field in OmBucketInfo||
|A new table|Existing table|
|Efficient query|Less efficient|
|A new manager (lifecycle manager)|No need|
|A new codec |No need|
|No need to alter existing design|Need to update the existing design|
|Need to update Bucket Deletion. Delete the linked lifecycle configuration when the bucket is deleted.|No need for updates|
| |Needs updates to create, get, list and delete lifecycle configuration in the BucketManager.|

 

 
h2. Plan for contribution

The implementation is too large to review in a single change. I believe it needs to be 
split into a few merge requests for better review. Here is my suggested breakdown.
 # Basic building blocks (lifecycle configuration, rule, expiration, ...) and 
the related table (if needed).
 # New ClientProtocol & OzoneManager operations (create, get, list, delete) 
for lifecycle configurations (protobuf messages as well).
 # S3G endpoints updates.
 # The retention manager.
 # All of them to be merged into a new branch (Let's call it X)
 # Then merge branch X into master.

 

Please feel free to review the design and ask for clarification if needed.

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
