[ 
https://issues.apache.org/jira/browse/HDDS-11784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shawn updated HDDS-11784:
-------------------------
    Description: 
We observed a large number of open keys (files) in our FSO-enabled Ozone 
cluster. All of them are incomplete multipart upload (MPU) keys.

When I tried to abort one of these MPUs with the AWS S3 CLI as shown below, the 
request failed with an exception complaining that the parent directory was not 
found.
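
For anyone who wants to check their own bucket for the same symptom, here is a minimal sketch of enumerating in-progress multipart uploads through the S3 API. It assumes a boto3-style client is passed in and that the S3 gateway answers `ListMultipartUploads` with pagination; the helper name `list_incomplete_uploads` and the `PendingUpload` type are illustrative and not part of Ozone or boto3.

```python
from typing import Any, Iterator, NamedTuple


class PendingUpload(NamedTuple):
    """One in-progress multipart upload (hypothetical helper type)."""
    key: str
    upload_id: str
    initiated: Any  # a datetime in real boto3 responses; may be absent


def list_incomplete_uploads(s3_client: Any, bucket: str) -> Iterator[PendingUpload]:
    """Yield every in-progress multipart upload reported for `bucket`.

    `s3_client` is assumed to behave like boto3's S3 client, e.g.
    boto3.client("s3", endpoint_url=...). It is passed in so this
    function does not depend on boto3 directly.
    """
    paginator = s3_client.get_paginator("list_multipart_uploads")
    for page in paginator.paginate(Bucket=bucket):
        # Pages without any uploads carry no "Uploads" entry.
        for upload in page.get("Uploads", []):
            yield PendingUpload(
                key=upload["Key"],
                upload_id=upload["UploadId"],
                initiated=upload.get("Initiated"),
            )
```

Each yielded `key` and `upload_id` pair corresponds to the `--key` and `--upload-id` arguments of the abort command.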

{code:bash}
aws s3api abort-multipart-upload --endpoint 'xxxx' \
  --bucket '2e76bd0f-9682-42c6-a5ce-3e32c5aa37b2' \
  --key 'CACHE.06e656c0-6622-48bb-89c2-39470764b1d0/ALULA_DICKINSON_ORIGINAL_101_EPISODE_DVSAUDIO_EN8CH_DOWNLOAD_FINAL_VDKSN0560101.mov' \
  --upload-id '4103c881-24fa-4992-b7b2-5474f8a7fbaf-113204926929050074'

An error occurred (NoSuchUpload) when calling the AbortMultipartUpload operation: The specified multipart upload does not exist. The upload ID might be invalid, or the multipart upload might have been aborted or completed.
{code}

Exception in the log:

{code}
NO_SUCH_MULTIPART_UPLOAD_ERROR org.apache.hadoop.ozone.om.exceptions.OMException: Abort Multipart Upload Failed: volume: s3v, bucket: 2e76bd0f-9682-42c6-a5ce-3e32c5aa37b2, key: CACHE.06e656c0-6622-48bb-89c2-39470764b1d0/ALULA_DICKINSON_ORIGINAL_101_EPISODE_DVSAUDIO_EN8CH_DOWNLOAD_FINAL_VDKSN0560101.mov
    at org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadAbortRequest.validateAndUpdateCache(S3MultipartUploadAbortRequest.java:148)
    at org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.lambda$0(OzoneManagerRequestHandler.java:402)
    at org.apache.hadoop.util.MetricUtil.captureLatencyNs(MetricUtil.java:39)
    at org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.handleWriteRequest(OzoneManagerRequestHandler.java:398)
    at org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.runCommand(OzoneManagerStateMachine.java:587)
    at org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.lambda$1(OzoneManagerStateMachine.java:375)
    at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1768)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: DIRECTORY_NOT_FOUND org.apache.hadoop.ozone.om.exceptions.OMException: Failed to find parent directory of CACHE.06e656c0-6622-48bb-89c2-39470764b1d0/ALULA_DICKINSON_ORIGINAL_101_EPISODE_DVSAUDIO_EN8CH_DOWNLOAD_FINAL_VDKSN0560101.mov
    at org.apache.hadoop.ozone.om.request.file.OMFileRequest.getParentID(OMFileRequest.java:1038)
    at org.apache.hadoop.ozone.om.request.file.OMFileRequest.getParentID(OMFileRequest.java:988)
    at org.apache.hadoop.ozone.om.request.util.OMMultipartUploadUtils.getMultipartOpenKeyFSO(OMMultipartUploadUtils.java:122)
    at org.apache.hadoop.ozone.om.request.util.OMMultipartUploadUtils.getMultipartOpenKey(OMMultipartUploadUtils.java:99)
    at org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadAbortRequest.getMultipartOpenKey(S3MultipartUploadAbortRequest.java:256)
    at org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadAbortRequest.validateAndUpdateCache(S3MultipartUploadAbortRequest.java:145)
    ... 9 more
{code}
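
The trace suggests that the abort path resolves the key's parent directory before it can locate the open MPU entry, and that a failed directory lookup is reported to the client as a missing upload. The following is a toy model of that behavior only, not Ozone's actual code: the table layout, the `OMError` class, and both function names are hypothetical stand-ins chosen to mirror the frames above.

```python
class OMError(Exception):
    """Hypothetical stand-in for an OM exception carrying a result code."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code


def get_parent_id(directory_table, bucket_id, key_path):
    """Resolve the id of the directory that should contain key_path.

    directory_table maps (parent_id, dir_name) -> dir_id, a simplified
    picture of a prefix-style (FSO) directory layout.
    """
    parent_id = bucket_id
    for name in key_path.split("/")[:-1]:  # every component except the file name
        child_id = directory_table.get((parent_id, name))
        if child_id is None:
            raise OMError("DIRECTORY_NOT_FOUND",
                          f"Failed to find parent directory of {key_path}")
        parent_id = child_id
    return parent_id


def abort_multipart_upload(directory_table, open_keys, bucket_id, key_path, upload_id):
    """Remove the open MPU entry, which is keyed by the parent directory id."""
    try:
        parent_id = get_parent_id(directory_table, bucket_id, key_path)
    except OMError as cause:
        # The directory failure is wrapped, so the client only sees "no such upload".
        raise OMError("NO_SUCH_MULTIPART_UPLOAD_ERROR",
                      f"Abort Multipart Upload Failed: key: {key_path}") from cause
    entry = (parent_id, key_path.rsplit("/", 1)[-1], upload_id)
    if entry not in open_keys:
        raise OMError("NO_SUCH_MULTIPART_UPLOAD_ERROR",
                      f"Abort Multipart Upload Failed: key: {key_path}")
    open_keys.remove(entry)
```

In this model an open MPU entry whose parent directory row is gone can never be reached by the abort call, even though the entry itself still exists, which matches the dangling keys described here.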

This issue is similar to 
[HDDS-10630](https://issues.apache.org/jira/browse/HDDS-10630). We should apply 
a similar fix here. Without it, these dangling MPUs cannot be cleaned up, 
either manually or by the background cleanup service.

We are also not sure of the root cause of these missing parent directories. 
This needs further investigation. 


> parent directory not found when abort multi-part upload
> -------------------------------------------------------
>
>                 Key: HDDS-11784
>                 URL: https://issues.apache.org/jira/browse/HDDS-11784
>             Project: Apache Ozone
>          Issue Type: Improvement
>          Components: S3
>    Affects Versions: 1.4.0
>            Reporter: Shawn
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
