[
https://issues.apache.org/jira/browse/HDDS-11784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Shawn updated HDDS-11784:
-------------------------
Description:
We observed many open keys (files) in our FSO-enabled Ozone cluster, all of
them incomplete MPU (multipart upload) keys.
When I tried to abort an MPU with the S3 CLI as shown below, the abort failed,
and the underlying exception complains that the parent directory was not found.
{code:bash}
aws s3api abort-multipart-upload --endpoint 'xxxx' \
  --bucket '2e76bd0f-9682-42c6-a5ce-3e32c5aa37b2' \
  --key 'CACHE.06e656c0-6622-48bb-89c2-39470764b1d0/abc.blob' \
  --upload-id '4103c881-24fa-4992-b7b2-5474f8a7fbaf-113204926929050074'

An error occurred (NoSuchUpload) when calling the AbortMultipartUpload operation: The specified multipart upload does not exist. The upload ID might be invalid, or the multipart upload might have been aborted or completed.
{code}
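To see how many pending uploads are affected, the same abort can be attempted for every upload the bucket still lists. The following is only a minimal sketch in Python, assuming a boto3-style S3 client object pointed at the S3 gateway. The helper name `abort_incomplete_uploads` is hypothetical, and whether the gateway paginates the listing is an assumption; the loop simply follows the markers if it does.

```python
from typing import Any, Dict, List, Tuple


def abort_incomplete_uploads(
    client: Any, bucket: str
) -> Tuple[List[Dict[str, str]], List[Dict[str, str]]]:
    """Try to abort every in-progress multipart upload in `bucket`.

    `client` is assumed to expose the boto3 S3 client methods
    `list_multipart_uploads` and `abort_multipart_upload`.
    Returns (aborted, failed); each failed entry also carries the error code.
    """
    aborted: List[Dict[str, str]] = []
    failed: List[Dict[str, str]] = []
    kwargs: Dict[str, str] = {"Bucket": bucket}
    while True:
        page = client.list_multipart_uploads(**kwargs)
        for upload in page.get("Uploads", []):
            entry = {"Key": upload["Key"], "UploadId": upload["UploadId"]}
            try:
                client.abort_multipart_upload(Bucket=bucket, **entry)
                aborted.append(entry)
            except Exception as exc:
                # boto3 errors carry the S3 error code in exc.response;
                # fall back to the exception class name otherwise.
                response = getattr(exc, "response", None) or {}
                code = response.get("Error", {}).get("Code", type(exc).__name__)
                failed.append({**entry, "Code": code})
        if not page.get("IsTruncated"):
            break
        # Continue the listing from where the previous page ended.
        kwargs["KeyMarker"] = page["NextKeyMarker"]
        kwargs["UploadIdMarker"] = page["NextUploadIdMarker"]
    return aborted, failed
```

With boto3 installed, the client could be created with `boto3.client("s3", endpoint_url=...)` using the same endpoint as in the CLI call. Uploads hit by this issue would be expected to show up in the `failed` list with the code `NoSuchUpload`.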
Exceptions in the log
{code:java}
NO_SUCH_MULTIPART_UPLOAD_ERROR org.apache.hadoop.ozone.om.exceptions.OMException: Abort Multipart Upload Failed: volume: s3v, bucket: 2e76bd0f-9682-42c6-a5ce-3e32c5aa37b2, key: CACHE.06e656c0-6622-48bb-89c2-39470764b1d0/abc.blob
    at org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadAbortRequest.validateAndUpdateCache(S3MultipartUploadAbortRequest.java:148)
    at org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.lambda$0(OzoneManagerRequestHandler.java:402)
    at org.apache.hadoop.util.MetricUtil.captureLatencyNs(MetricUtil.java:39)
    at org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.handleWriteRequest(OzoneManagerRequestHandler.java:398)
    at org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.runCommand(OzoneManagerStateMachine.java:587)
    at org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.lambda$1(OzoneManagerStateMachine.java:375)
    at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1768)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: DIRECTORY_NOT_FOUND org.apache.hadoop.ozone.om.exceptions.OMException: Failed to find parent directory of CACHE.06e656c0-6622-48bb-89c2-39470764b1d0/abc.blob
    at org.apache.hadoop.ozone.om.request.file.OMFileRequest.getParentID(OMFileRequest.java:1038)
    at org.apache.hadoop.ozone.om.request.file.OMFileRequest.getParentID(OMFileRequest.java:988)
    at org.apache.hadoop.ozone.om.request.util.OMMultipartUploadUtils.getMultipartOpenKeyFSO(OMMultipartUploadUtils.java:122)
    at org.apache.hadoop.ozone.om.request.util.OMMultipartUploadUtils.getMultipartOpenKey(OMMultipartUploadUtils.java:99)
    at org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadAbortRequest.getMultipartOpenKey(S3MultipartUploadAbortRequest.java:256)
    at org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadAbortRequest.validateAndUpdateCache(S3MultipartUploadAbortRequest.java:145)
    ... 9 more
{code}
This issue is similar to HDDS-10630, and we should bring the same fix here.
Without it, these dangling MPUs cannot be cleaned up, either manually or by
the background cleanup service.
We are also not sure what the root cause of these missing parent directories
is; this needs further investigation.
was:
We observed many open keys (files) in our FSO-enabled Ozone cluster, all of
them incomplete MPU (multipart upload) keys.
When I tried to abort an MPU with the S3 CLI as shown below, the abort failed,
and the underlying exception complains that the parent directory was not found.
{code:bash}
aws s3api abort-multipart-upload --endpoint 'xxxx' \
  --bucket '2e76bd0f-9682-42c6-a5ce-3e32c5aa37b2' \
  --key 'CACHE.06e656c0-6622-48bb-89c2-39470764b1d0/ALULA_DICKINSON_ORIGINAL_101_EPISODE_DVSAUDIO_EN8CH_DOWNLOAD_FINAL_VDKSN0560101.mov' \
  --upload-id '4103c881-24fa-4992-b7b2-5474f8a7fbaf-113204926929050074'

An error occurred (NoSuchUpload) when calling the AbortMultipartUpload operation: The specified multipart upload does not exist. The upload ID might be invalid, or the multipart upload might have been aborted or completed.
{code}
Exceptions in the log
{code}
NO_SUCH_MULTIPART_UPLOAD_ERROR org.apache.hadoop.ozone.om.exceptions.OMException: Abort Multipart Upload Failed: volume: s3v, bucket: 2e76bd0f-9682-42c6-a5ce-3e32c5aa37b2, key: CACHE.06e656c0-6622-48bb-89c2-39470764b1d0/ALULA_DICKINSON_ORIGINAL_101_EPISODE_DVSAUDIO_EN8CH_DOWNLOAD_FINAL_VDKSN0560101.mov
    at org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadAbortRequest.validateAndUpdateCache(S3MultipartUploadAbortRequest.java:148)
    at org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.lambda$0(OzoneManagerRequestHandler.java:402)
    at org.apache.hadoop.util.MetricUtil.captureLatencyNs(MetricUtil.java:39)
    at org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.handleWriteRequest(OzoneManagerRequestHandler.java:398)
    at org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.runCommand(OzoneManagerStateMachine.java:587)
    at org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.lambda$1(OzoneManagerStateMachine.java:375)
    at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1768)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: DIRECTORY_NOT_FOUND org.apache.hadoop.ozone.om.exceptions.OMException: Failed to find parent directory of CACHE.06e656c0-6622-48bb-89c2-39470764b1d0/ALULA_DICKINSON_ORIGINAL_101_EPISODE_DVSAUDIO_EN8CH_DOWNLOAD_FINAL_VDKSN0560101.mov
    at org.apache.hadoop.ozone.om.request.file.OMFileRequest.getParentID(OMFileRequest.java:1038)
    at org.apache.hadoop.ozone.om.request.file.OMFileRequest.getParentID(OMFileRequest.java:988)
    at org.apache.hadoop.ozone.om.request.util.OMMultipartUploadUtils.getMultipartOpenKeyFSO(OMMultipartUploadUtils.java:122)
    at org.apache.hadoop.ozone.om.request.util.OMMultipartUploadUtils.getMultipartOpenKey(OMMultipartUploadUtils.java:99)
    at org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadAbortRequest.getMultipartOpenKey(S3MultipartUploadAbortRequest.java:256)
    at org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadAbortRequest.validateAndUpdateCache(S3MultipartUploadAbortRequest.java:145)
    ... 9 more
{code}
This issue is similar to
[HDDS-10630|https://issues.apache.org/jira/browse/HDDS-10630], and we should
bring the same fix here. Without it, these dangling MPUs cannot be cleaned up,
either manually or by the background cleanup service.
We are also not sure what the root cause of these missing parent directories
is; this needs further investigation.
> parent directory not found when abort multi-part upload
> -------------------------------------------------------
>
> Key: HDDS-11784
> URL: https://issues.apache.org/jira/browse/HDDS-11784
> Project: Apache Ozone
> Issue Type: Improvement
> Components: S3
> Affects Versions: 1.4.0
> Reporter: Shawn
> Priority: Major