[
https://issues.apache.org/jira/browse/HDDS-9095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17749111#comment-17749111
]
Ivan Andika edited comment on HDDS-9095 at 8/1/23 1:17 AM:
-----------------------------------------------------------
> Can we just rely on this ticket to fix the problem afterwards?
Do you mean that, for now, we add the check to exclude open keys whose
isMultipartKey=false in the OpenKeyCleanupService? And then, for the MPU-related
open keys that have isMultipartKey=false and have already been, or will be,
deleted by the OpenKeyCleanupService, we let them be deleted and handle it in
the MPU Cleanup request/response (i.e. skip the error for non-existent keys)?
For the regex solution, I was trying to add it as an additional guard check
when isMultipartKey=false. Since we haven't yet backported and deployed the
OpenKeyCleanupService in our cluster, we can avoid the orphan keys issue
altogether. However, I understand that it's a "hacky" way to deal with the
issue. If the MPU Cleanup process can handle the non-existent MPU-related
open keys without sacrificing correctness, I think it should be fine to not
include the regex solution.
Personally, I'm not a fan of the manual DB repair. Repairing the OM DB
requires us to stop the Ozone Manager, which reduces availability, and the
repair might take a long time, which increases the risk of slow follower
issues / huge OM installSnapshot (described in HDDS-8131). Also, it needs to be
carried out carefully by the admins. If possible, I would prefer to handle it
in the OM background service directly.
> is it OK to only keep the "without failing when open MPU keys do not exist"
> logic?
I tried implementing it using S3MultipartUploadAbortRequest, but it doesn't work
due to the existing ACL checks, and it can be slow since it's not batched and
not grouped by bucket (it might reacquire the same bucket lock many times).
So currently, I will implement it similarly to how it's implemented in
OpenKeyCleanupService (no ACL checks), also using OpenKeyBucket (which groups
by (volume,bucket) for locking efficiency). This will not throw an error when
open MPU keys do not exist. I'm not sure about the correctness if the
errors are not thrown, but we can also add the updateId check on the
multipartInfoTable.
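To make the idea concrete, here is a minimal sketch of the approach described above. It is not Ozone code: the class MpuCleanupSketch, the ExpiredMpu type and the two in-memory maps are hypothetical stand-ins for the real request/response classes, multipartInfoTable and the open key table. It also assumes, for illustration only, that both tables are keyed by the same DB key. It shows grouping by (volume,bucket), skipping entries whose updateId changed, and not failing when the open key is already gone.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class MpuCleanupSketch {

    /** Hypothetical stand-in for an expired entry found in multipartInfoTable. */
    static final class ExpiredMpu {
        final String volume;
        final String bucket;
        final String dbKey;
        final long updateId;

        ExpiredMpu(String volume, String bucket, String dbKey, long updateId) {
            this.volume = volume;
            this.bucket = bucket;
            this.dbKey = dbKey;
            this.updateId = updateId;
        }
    }

    /** Groups expired uploads by (volume, bucket) so a bucket lock is needed once per group. */
    static Map<String, List<ExpiredMpu>> groupByBucket(List<ExpiredMpu> expired) {
        Map<String, List<ExpiredMpu>> groups = new LinkedHashMap<>();
        for (ExpiredMpu mpu : expired) {
            String bucketId = mpu.volume + "/" + mpu.bucket;
            groups.computeIfAbsent(bucketId, k -> new ArrayList<>()).add(mpu);
        }
        return groups;
    }

    /**
     * Removes expired uploads from the in-memory stand-ins for the two tables.
     * multipartInfo maps dbKey to its current updateId; openKeys maps dbKey to
     * an opaque value. Returns the number of multipart entries removed.
     */
    static int cleanup(List<ExpiredMpu> expired,
                       Map<String, Long> multipartInfo,
                       Map<String, String> openKeys) {
        int removed = 0;
        for (Map.Entry<String, List<ExpiredMpu>> group : groupByBucket(expired).entrySet()) {
            // A real service would acquire the bucket write lock here, once per group.
            for (ExpiredMpu mpu : group.getValue()) {
                Long current = multipartInfo.get(mpu.dbKey);
                // Skip entries that are gone or were modified after they were selected.
                if (current == null || current.longValue() != mpu.updateId) {
                    continue;
                }
                multipartInfo.remove(mpu.dbKey);
                // Map.remove is a no-op for a missing key, so an open key that
                // was already deleted does not cause a failure.
                openKeys.remove(mpu.dbKey);
                removed++;
            }
            // ... and release the bucket lock here.
        }
        return removed;
    }
}
```

The updateId comparison is what keeps the "do not fail on missing open keys" behavior from silently deleting an upload that changed between selection and cleanup. Whether that is sufficient for correctness in the real OM is the open question raised above.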
Please let me know what you think. I will try to come up with related PRs in
the coming days.
> Implement cleanup service for MultipartInfoTable
> ------------------------------------------------
>
> Key: HDDS-9095
> URL: https://issues.apache.org/jira/browse/HDDS-9095
> Project: Apache Ozone
> Issue Type: Improvement
> Components: OM
> Reporter: Ivan Andika
> Assignee: Ivan Andika
> Priority: Major
> Fix For: 1.4.0
>
>
> In our cluster, there are around a few thousand MPU keys in
> multipartInfoTable, with some of them being a few months old.
> The reason is that these MPU keys were initiated, and possibly had a few
> parts committed, but were not completed / aborted by the user. This
> space can be freed.
> Similar to the cleanup service for the OM open key table (HDDS-4120), we can
> implement cleanup on MultipartInfoTable (and related open keys in
> OpenKeyTable) using a background job. -However, instead of using a new OM
> request / response, we can reuse the OM MPU abort request / response instead,
> which already handles the cleanup (i.e. expired inflight MPU can be aborted
> after a defined expiry threshold).-