[ https://issues.apache.org/jira/browse/HDDS-8342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sammi Chen updated HDDS-8342:
-----------------------------
    Summary: S3 Lifecycle Configurations - Object Expire  (was: AWS S3 Lifecycle Configurations - Object Expire)

> S3 Lifecycle Configurations - Object Expire
> -------------------------------------------
>
>                 Key: HDDS-8342
>                 URL: https://issues.apache.org/jira/browse/HDDS-8342
>             Project: Apache Ozone
>          Issue Type: New Feature
>          Components: OM, S3
>            Reporter: Mohanad Elsafty
>            Assignee: Mohanad Elsafty
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: RetentionManager.png, image-2023-03-31-12-42-46-971.png
>
>
> I needed a retention solution in my cluster (deleting keys under specific paths after some time). The idea is very similar to the Expiration part of AWS S3 Lifecycle configurations.
> [https://docs.aws.amazon.com/AmazonS3/latest/userguide/lifecycle-configuration-examples.html]
> I made a design, have already implemented most of it, and would like to contribute it back to the Apache Ozone community.
> h2. What is included
>  # Users should be able to create, remove and fetch lifecycle configurations for a specific S3 bucket.
>  # Lifecycle configurations will be executed periodically.
>  # Depending on its rules, a lifecycle configuration can trigger different actions, or even multiple actions.
>  # At the moment, only expiration (key deletion) is supported.
>  # Lifecycle configurations support all buckets, not only S3 buckets.
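Because the feature follows AWS S3 Lifecycle configurations, creating a configuration means a PUT to the bucket's lifecycle subresource with an XML body in the AWS format (see the AWS examples linked above). The following is a minimal sketch, not the actual S3G code, of how a handler could pull the rules out of such a body using only the JDK's DOM parser. The `ParsedRule` record and `LifecycleXmlParser` class are hypothetical names; `Date` is kept as the raw string rather than converted to the long used in the design, and `And` filters are not handled.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical parsed form of one rule (not an Ozone class).
// An Expiration is expected to carry either days or date.
record ParsedRule(String id, String prefix, boolean enabled, Integer days, String date) { }

final class LifecycleXmlParser {
  /** Parses an AWS-style LifecycleConfiguration XML body into rules. */
  static List<ParsedRule> parse(String xml) {
    try {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      // The body comes from the client, so refuse DOCTYPEs (XXE hardening).
      factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
      Document doc = factory.newDocumentBuilder().parse(
          new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

      List<ParsedRule> rules = new ArrayList<>();
      NodeList ruleNodes = doc.getElementsByTagName("Rule");
      for (int i = 0; i < ruleNodes.getLength(); i++) {
        Element rule = (Element) ruleNodes.item(i);
        // First Prefix under the rule: Filter/Prefix or a legacy top-level Prefix.
        String prefix = text(rule, "Prefix");
        String days = text(rule, "Days");
        rules.add(new ParsedRule(
            text(rule, "ID"),
            prefix == null ? "" : prefix,
            "Enabled".equals(text(rule, "Status")),
            days == null ? null : Integer.valueOf(days),
            text(rule, "Date")));
      }
      return rules;
    } catch (ParserConfigurationException | SAXException | IOException e) {
      throw new IllegalArgumentException("Malformed lifecycle configuration", e);
    }
  }

  /** Trimmed text of the first descendant element with this tag, or null. */
  private static String text(Element parent, String tag) {
    NodeList nodes = parent.getElementsByTagName(tag);
    return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
  }
}
```

Disabled rules are parsed but flagged rather than dropped, so a later GET can return the configuration exactly as stored.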
>  
> h1. Design
> !image-2023-03-31-12-42-46-971.png!
>  
> h2. Components
>  # A lifecycle configuration (stored in the DB) consists of volumeName, bucketName and a list of rules.
>  ** A rule contains a prefix (string), an Expiration and an optional Filter.
>  ** An Expiration contains either days (integer) or a date (long).
>  ** A Filter contains a prefix (string).
>  # The S3G bucket endpoint needs a few updates to accept the ?lifecycle subresource.
>  # ClientProtocol and all its implementers provide operations to create, get, list and delete lifecycle configurations.
>  # RetentionManager will run periodically:
>  ** It fetches the list of lifecycle configurations with the help of OM.
>  ** It executes each lifecycle configuration against its specific bucket.
>  ** Lifecycle configurations run in parallel, each against a different bucket.
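The configuration, rule, expiration and filter components can be sketched as Java records with a key-eligibility check. This is a hypothetical illustration, not the Ozone classes: the record names, representing the date as epoch milliseconds, letting a Filter prefix (when present) take precedence over the rule prefix, and measuring days from the key's creation time without AWS-style rounding to midnight are all assumptions.

```java
import java.time.Duration;
import java.util.List;
import java.util.Objects;

// Hypothetical model classes; not the actual Ozone implementation.
record Expiration(Integer days, Long dateMillis) {
  Expiration {
    // Exactly one of days or date (epoch millis, assumed) must be set.
    if ((days == null) == (dateMillis == null)) {
      throw new IllegalArgumentException("Expiration needs exactly one of days or date");
    }
    if (days != null && days <= 0) {
      throw new IllegalArgumentException("days must be positive");
    }
  }

  /** True if a key created at creationMillis is expired at nowMillis. */
  boolean isExpired(long creationMillis, long nowMillis) {
    if (days != null) {
      return nowMillis - creationMillis >= Duration.ofDays(days.longValue()).toMillis();
    }
    return nowMillis >= dateMillis;
  }
}

record Filter(String prefix) { }

record Rule(String prefix, Expiration expiration, Filter filter) {
  Rule {
    Objects.requireNonNull(expiration, "expiration");
  }

  /** Assumption: a Filter prefix, when present, overrides the rule prefix. */
  String effectivePrefix() {
    if (filter != null && filter.prefix() != null) {
      return filter.prefix();
    }
    return prefix == null ? "" : prefix;
  }

  boolean matches(String keyName, long creationMillis, long nowMillis) {
    return keyName.startsWith(effectivePrefix())
        && expiration.isExpired(creationMillis, nowMillis);
  }
}

record LifecycleConfiguration(String volumeName, String bucketName, List<Rule> rules) {
  /** A key is eligible for deletion if any rule matches it. */
  boolean isEligibleForDeletion(String keyName, long creationMillis, long nowMillis) {
    return rules.stream().anyMatch(r -> r.matches(keyName, creationMillis, nowMillis));
  }
}
```

Validating "days or date, not both" in the constructor means an invalid configuration is rejected when it is created rather than when the RetentionManager later runs it.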
> h2. Flow
>  # Users PUT/GET/DELETE lifecycle configurations via S3Gateway.
>  # The lifecycle configuration details are sent to a handler for processing.
>  # Lifecycle configurations are saved to, or fetched from, the DB.
>  # RetentionManager runs periodically in the leader OM to execute these lifecycle configurations.
>  # RetentionManager issues deletions for eligible keys.
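One RetentionManager pass, as described in the flow, can be sketched as follows. This is a minimal sketch under stated assumptions: `MetadataView`, `KeyInfo` and `BucketLifecycle` are hypothetical stand-ins for the real OM metadata APIs, and each configuration is simplified to a single prefix plus a maximum key age. It shows the leader-only check, one parallel task per bucket, and a batched deletion request for the eligible keys of each bucket.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical, simplified types; not Ozone's real OM interfaces.
record KeyInfo(String name, long creationMillis) { }

/** One bucket's lifecycle configuration, reduced to a single prefix + max age. */
record BucketLifecycle(String bucketPath, String prefix, long maxAgeMillis) { }

interface MetadataView {
  List<BucketLifecycle> listLifecycleConfigurations();
  List<KeyInfo> listKeys(String bucketPath);
  void deleteKeys(String bucketPath, List<String> keyNames);
}

final class RetentionManager {
  private final MetadataView om;
  private final BooleanSupplier isLeader;
  private final ExecutorService workers;

  RetentionManager(MetadataView om, BooleanSupplier isLeader, int threads) {
    this.om = om;
    this.isLeader = isLeader;
    this.workers = Executors.newFixedThreadPool(threads);
  }

  /** One pass: every configuration runs in parallel against its own bucket. */
  void runOnce(long nowMillis) {
    if (!isLeader.getAsBoolean()) {
      return; // followers skip; only the leader OM issues deletions
    }
    List<Future<?>> tasks = new ArrayList<>();
    for (BucketLifecycle lc : om.listLifecycleConfigurations()) {
      tasks.add(workers.submit(() -> processBucket(lc, nowMillis)));
    }
    // Wait for every bucket so that consecutive passes never overlap.
    RuntimeException failure = null;
    for (Future<?> task : tasks) {
      try {
        task.get();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return;
      } catch (ExecutionException e) {
        if (failure == null) {
          failure = new IllegalStateException("Lifecycle task failed", e.getCause());
        }
      }
    }
    if (failure != null) {
      throw failure;
    }
  }

  private void processBucket(BucketLifecycle lc, long nowMillis) {
    List<String> expired = new ArrayList<>();
    for (KeyInfo key : om.listKeys(lc.bucketPath())) {
      boolean oldEnough = nowMillis - key.creationMillis() >= lc.maxAgeMillis();
      if (key.name().startsWith(lc.prefix()) && oldEnough) {
        expired.add(key.name());
      }
    }
    if (!expired.isEmpty()) {
      om.deleteKeys(lc.bucketPath(), expired); // one batched delete per bucket
    }
  }

  /** Runs a pass every intervalSeconds on a single scheduler thread. */
  ScheduledExecutorService start(long intervalSeconds) {
    ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    scheduler.scheduleWithFixedDelay(() -> {
      try {
        runOnce(System.currentTimeMillis());
      } catch (RuntimeException e) {
        // A periodic task that throws is never rescheduled, so swallow and retry next time.
      }
    }, 0, intervalSeconds, TimeUnit.SECONDS);
    return scheduler;
  }

  void shutdown() {
    workers.shutdown();
  }
}
```

`scheduleWithFixedDelay` suppresses all later executions once a run throws, which is why `start()` catches failures and leaves the retry to the next scheduled pass.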
>  
> h2. Not a complete solution
> The solution still lacks some useful features, for example:
>  * The filter doesn't support {{AND}} yet.
>  * Only expiration is supported.
>  * There is no CLI to manage lifecycle configurations for all buckets (at the moment, S3G is the only supported entry point).
> These features can be added in the future.
>  
>  
> *I made some decisions that should be discussed before contributing (current design)*
> Lifecycle configurations will be stored in their own column family in the DB instead of being a field in {*}OmBucketInfo{*}.
> I preferred the lifecycle configuration to have its own table for two reasons:
>  # There is no need to modify the OmBucketInfo table.
>  # Given how the RetentionManager works, it will query only the buckets that have an attached lifecycle configuration. If the lifecycle configuration were a field in OmBucketInfo, it would have to query all buckets and filter the ones that have a LifecycleConfiguration.
> If the other approach is preferred, I will get rid of LifecycleConfigurationsManager and the new codec.
>  
> To summarize this:
>  
> ||A new table for lifecycle configurations||A new field in OmBucketInfo||
> |A new table|Existing table|
> |Efficient query|Less efficient query|
> |A new manager (lifecycle manager)|No new manager needed|
> |A new codec|No new codec needed|
> |No need to alter the existing design|Needs updates to the existing design|
> |Needs updates to bucket deletion: the linked lifecycle configuration must be deleted when the bucket is deleted.|No updates needed|
> | |Needs updates in BucketManager to create, get, list and delete lifecycle configurations.|
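The "Efficient query" and bucket-deletion rows of the table can be made concrete with a toy comparison. This sketch uses in-memory maps as hypothetical stand-ins for the OM tables (not Ozone's Table API); `LifecycleLookup`, `BucketInfo` and the `entriesVisited` counter are illustrative names, and the counter is only a rough cost measure.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical bucket entry; lifecycleConfig models the "new field" option
// and is null when the bucket has no configuration.
record BucketInfo(String bucketKey, String lifecycleConfig) { }

// Holds both layouts side by side, keyed by "/volume/bucket", for comparison.
final class LifecycleLookup {
  final Map<String, BucketInfo> bucketTable = new TreeMap<>();
  final Map<String, String> lifecycleTable = new TreeMap<>(); // "new table" option
  int entriesVisited;

  /** New table: iterate only buckets that have a lifecycle configuration. */
  List<String> configuredBucketsFromLifecycleTable() {
    List<String> result = new ArrayList<>();
    for (String bucketKey : lifecycleTable.keySet()) {
      entriesVisited++;
      result.add(bucketKey);
    }
    return result;
  }

  /** New field: every bucket must be scanned and filtered. */
  List<String> configuredBucketsFromBucketTable() {
    List<String> result = new ArrayList<>();
    for (BucketInfo info : bucketTable.values()) {
      entriesVisited++;
      if (info.lifecycleConfig() != null) {
        result.add(info.bucketKey());
      }
    }
    return result;
  }

  /** New table only: deleting a bucket must also drop its linked configuration. */
  void deleteBucket(String bucketKey) {
    bucketTable.remove(bucketKey);
    lifecycleTable.remove(bucketKey);
  }
}
```

With 100 buckets of which one is configured, the dedicated table visits one entry per pass while the field-based layout visits all 100; the cost is the extra cleanup in `deleteBucket`.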
>  
>  
> h2. Plan for contribution
> The implementation is too large to review in one piece, so I believe it should be split into a few merge requests. Here is my suggested breakdown:
>  # Basic building blocks (lifecycle configuration, rule, expiration, ...) and the related table (if needed).
>  # New ClientProtocol and OzoneManager operations to create, get, list and delete lifecycle configurations (including the protobuf messages).
>  # S3G endpoint updates.
>  # The retention manager.
>  # All of the above to be merged into a new branch (let's call it X).
>  # Then merge branch X into master.
>  
> Please feel free to review the design and ask for more clarifications if 
> needed.
> A high-level design document: https://docs.google.com/document/d/1LDE7jnhPJ_fc--zEob48RmqDxcqrDgOv3E3NBInoC08/edit?usp=sharing
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
