Mohanad Elsafty created HDDS-8342:
-------------------------------------
Summary: AWS S3 Lifecycle Configurations
Key: HDDS-8342
URL: https://issues.apache.org/jira/browse/HDDS-8342
Project: Apache Ozone
Issue Type: New Feature
Components: OM, S3
Reporter: Mohanad Elsafty
Assignee: Mohanad Elsafty
Attachments: image-2023-03-31-12-42-46-971.png
I needed a retention solution for my cluster (deleting keys under specific
paths after some time). The idea is very similar to the Expiration part of
AWS S3 Lifecycle configurations.
[https://docs.aws.amazon.com/AmazonS3/latest/userguide/lifecycle-configuration-examples.html]
I made a design and have already implemented most of it, and I would like to
contribute it back to the Apache Ozone community.
h2. What is included
# Users can create, remove and fetch lifecycle configurations for a
specific S3 bucket.
# Lifecycle configurations are executed periodically.
# Depending on its rules, a lifecycle configuration can trigger
different actions, or even multiple actions.
# At the moment only expiration is supported (keys get deleted).
# Lifecycle configurations work on all buckets, not only S3 buckets.
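As an illustration of the first point, here is a minimal sketch of reading an expiration rule from an S3-style lifecycle document. The element names follow the AWS S3 lifecycle XML; whether the Ozone S3 Gateway accepts exactly this body is an assumption, the XML namespace is omitted for brevity, and {{parse_rules}} is a hypothetical helper, not part of the implementation.

```python
import xml.etree.ElementTree as ET

# Sample body in the AWS S3 lifecycle format (namespace left out for brevity).
SAMPLE = """<LifecycleConfiguration>
  <Rule>
    <ID>expire-old-logs</ID>
    <Filter><Prefix>logs/</Prefix></Filter>
    <Status>Enabled</Status>
    <Expiration><Days>30</Days></Expiration>
  </Rule>
</LifecycleConfiguration>"""


def parse_rules(xml_text):
    """Hypothetical helper: turn the XML body into a list of plain dicts."""
    root = ET.fromstring(xml_text)
    rules = []
    for rule in root.findall("Rule"):
        days_text = rule.findtext("Expiration/Days")
        rules.append({
            "id": rule.findtext("ID"),
            # A missing Filter is treated as "match every key" (assumption).
            "prefix": rule.findtext("Filter/Prefix", default=""),
            "enabled": rule.findtext("Status") == "Enabled",
            "days": int(days_text) if days_text is not None else None,
        })
    return rules


rules = parse_rules(SAMPLE)
```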
h1. Design
!image-2023-03-31-12-42-46-971.png!
h2. Components
# A lifecycle configuration (stored in the DB) consists of volumeName,
bucketName and a list of rules.
** A rule contains a prefix (string), an Expiration and an optional Filter.
** An Expiration contains either days (integer) or a Date (long).
** A Filter contains a prefix (string).
# The S3G bucket endpoint needs a few updates to accept the ?lifecycle query parameter (as in the AWS S3 API).
# ClientProtocol and all its implementers provide get, list, delete and create
operations for lifecycle configurations.
# RetentionManager runs periodically.
** Fetches the list of lifecycle configurations with the help of OM.
** Executes each lifecycle configuration against its bucket.
** Lifecycle configurations run in parallel (each one against a
different bucket).
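The data model described above could be sketched roughly as follows. This is only an illustration in Python, not the actual classes: the field names, the millisecond unit for the date, and the rule that a Filter prefix takes precedence over the rule prefix are all assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class Expiration:
    days: Optional[int] = None         # age in days since key creation
    date_millis: Optional[int] = None  # absolute date, epoch millis (assumed unit)

    def __post_init__(self):
        # Exactly one of the two fields must be set.
        if (self.days is None) == (self.date_millis is None):
            raise ValueError("set exactly one of days or date_millis")


@dataclass
class Filter:
    prefix: str


@dataclass
class Rule:
    expiration: Expiration
    prefix: str = ""
    filter: Optional[Filter] = None

    def effective_prefix(self) -> str:
        # Assumption: a Filter prefix, when present, wins over the rule prefix.
        return self.filter.prefix if self.filter is not None else self.prefix

    def is_expired(self, key_name: str, created: datetime, now: datetime) -> bool:
        if not key_name.startswith(self.effective_prefix()):
            return False
        if self.expiration.days is not None:
            return now >= created + timedelta(days=self.expiration.days)
        cutoff = datetime.fromtimestamp(
            self.expiration.date_millis / 1000, tz=timezone.utc)
        return now >= cutoff


@dataclass
class LifecycleConfiguration:
    volume_name: str
    bucket_name: str
    rules: List[Rule]
```

For example, a rule built as {{Rule(expiration=Expiration(days=30), filter=Filter(prefix="logs/"))}} would mark a key under {{logs/}} as expired once it is at least 30 days old.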
h2. Flow
# Users PUT/GET/DELETE lifecycle configurations via S3Gateway.
# The lifecycle configuration details are sent to a handler to be
processed.
# The lifecycle configurations are saved to/fetched from the DB.
# RetentionManager runs periodically on the leader OM to execute
these lifecycle configurations.
# RetentionManager issues deletions for eligible keys.
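The last two steps of the flow can be sketched as a single pass of the retention job. This is a simplified illustration using in-memory dictionaries in place of the OM tables; {{run_once}}, {{run_bucket}} and the table shapes are hypothetical, and a real scheduler would call {{run_once}} at a fixed interval.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, timedelta, timezone


def run_bucket(keys, rules, now):
    """Return the names of keys in one bucket that match an expiration rule."""
    expired = []
    for name, created in keys.items():
        for rule in rules:
            old_enough = now >= created + timedelta(days=rule["days"])
            if name.startswith(rule["prefix"]) and old_enough:
                expired.append(name)
                break  # one matching rule is enough to expire the key
    return expired


def run_once(lifecycle_table, key_table, now, is_leader=True, max_workers=4):
    """One retention pass: evaluate every configuration, then delete."""
    if not is_leader:
        return {}  # only the leader OM executes lifecycle configurations
    deleted = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # One task per bucket, so configurations are evaluated in parallel.
        futures = {
            bucket: pool.submit(run_bucket, key_table.get(bucket, {}), rules, now)
            for bucket, rules in lifecycle_table.items()
        }
        for bucket, future in futures.items():
            expired = future.result()
            for name in expired:
                del key_table[bucket][name]  # stand-in for issuing a key deletion
            deleted[bucket] = sorted(expired)
    return deleted
```

Evaluation is parallel per bucket while the deletions are applied by the calling thread, which keeps the sketch free of shared mutable state between workers.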
h2. Not a complete solution
The solution still lacks some useful features, for example:
* The filter doesn't support `AND` yet.
* Only expiration is supported.
* A CLI to manage lifecycle configurations for all buckets (at the moment
S3G is the only supported entry point).
These kinds of features can be added in the future.
*I made some decisions that must be discussed before contributing (current design)*
Lifecycle configurations will be stored in their own column family in the DB
instead of being a field in {*}OmBucketInfo{*}.
I preferred giving the lifecycle configuration its own table for two reasons:
# No need to modify the OmBucketInfo table.
# The Retention manager then queries only the buckets that have an attached
lifecycle configuration. If the lifecycle configuration were a field in
OmBucketInfo, it would have to query all the buckets and filter for the ones
that have a LifecycleConfiguration.
If the other way is preferred, then I will get rid of
LifecycleConfigurationsManager & the new codec.
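To make the second reason concrete, here is a small sketch comparing the two lookups. Plain dictionaries stand in for the DB tables; the {{/volume/bucket}} key format and both function names are assumptions made for illustration only.

```python
def from_dedicated_table(lifecycle_table):
    """Option 1: every row in the dedicated table is a bucket of interest."""
    buckets = sorted(lifecycle_table)
    return buckets, len(lifecycle_table)  # (result, rows read)


def from_bucket_table(bucket_table):
    """Option 2: scan all buckets and keep those with a lifecycle field."""
    rows_read = 0
    buckets = []
    for key, info in bucket_table.items():
        rows_read += 1
        if info.get("lifecycle") is not None:
            buckets.append(key)
    return sorted(buckets), rows_read


# Hypothetical data: 1000 buckets, only two of them with a lifecycle attached.
bucket_table = {f"/vol1/bucket-{i}": {"lifecycle": None} for i in range(1000)}
bucket_table["/vol1/bucket-3"]["lifecycle"] = {"rules": []}
bucket_table["/vol1/bucket-42"]["lifecycle"] = {"rules": []}
lifecycle_table = {
    "/vol1/bucket-3": {"rules": []},
    "/vol1/bucket-42": {"rules": []},
}

dedicated, dedicated_rows = from_dedicated_table(lifecycle_table)
embedded, embedded_rows = from_bucket_table(bucket_table)
```

Both options return the same buckets; the difference is that the dedicated table reads one row per configured bucket, while the embedded field requires reading every bucket row.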
To summarize this:
||A new table for lifecycle configurations||A new field in OmBucketInfo||
|A new table|Existing table|
|Efficient query|Less efficient|
|A new manager (lifecycle manager)|No need|
|A new codec |No need|
|No need to alter existing design|Need to update the existing design|
|Need to update bucket deletion: delete the linked lifecycle configuration when the bucket is deleted.|No need for updates|
| |Needs updates to create, get, list and delete lifecycle configurations in the BucketManager.|
h2. Plan for contribution
The implementation is too large to review in one piece. I believe it needs to
be split into a few merge requests for easier review. Here is my suggested
breakdown:
# Basic building blocks (lifecycle configuration, rule, expiration, ...) and
the related table (if needed).
# New ClientProtocol & OzoneManager operations (create, get, list, delete)
for lifecycle configurations (protobuf messages as well).
# S3G endpoint updates.
# The retention manager.
# All of them to be merged into a new branch (let's call it X).
# Then merge branch X into master.
Please feel free to review the design and ask for clarification if needed.