sokui opened a new pull request, #7700: URL: https://github.com/apache/ozone/pull/7700
## What changes were proposed in this pull request?

HDDS-11784: add missing parent directories for MPU abort and expired MPU abort requests.

We observed a large number of open keys (files) in our FSO-enabled Ozone cluster, all of them incomplete MPU keys. When I tried to abort an MPU with the S3 CLI as shown below, the request failed, and the log showed an exception complaining that the parent directory could not be found.

```
aws s3api abort-multipart-upload --endpoint 'xxxx' --bucket '2e76bd0f-9682-42c6-a5ce-3e32c5aa37b2' --key 'CACHE.06e656c0-6622-48bb-89c2-39470764b1d0/abc.blob' --upload-id '4103c881-24fa-4992-b7b2-5474f8a7fbaf-113204926929050074'

An error occurred (NoSuchUpload) when calling the AbortMultipartUpload operation: The specified multipart upload does not exist. The upload ID might be invalid, or the multipart upload might have been aborted or completed.
```

Exceptions in the log:

```
NO_SUCH_MULTIPART_UPLOAD_ERROR org.apache.hadoop.ozone.om.exceptions.OMException: Abort Multipart Upload Failed: volume: s3v, bucket: 2e76bd0f-9682-42c6-a5ce-3e32c5aa37b2, key: CACHE.06e656c0-6622-48bb-89c2-39470764b1d0/abc.blob
    at org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadAbortRequest.validateAndUpdateCache(S3MultipartUploadAbortRequest.java:148)
    at org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.lambda$0(OzoneManagerRequestHandler.java:402)
    at org.apache.hadoop.util.MetricUtil.captureLatencyNs(MetricUtil.java:39)
    at org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.handleWriteRequest(OzoneManagerRequestHandler.java:398)
    at org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.runCommand(OzoneManagerStateMachine.java:587)
    at org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.lambda$1(OzoneManagerStateMachine.java:375)
    at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1768)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: DIRECTORY_NOT_FOUND org.apache.hadoop.ozone.om.exceptions.OMException: Failed to find parent directory of CACHE.06e656c0-6622-48bb-89c2-39470764b1d0/abc.blob
    at org.apache.hadoop.ozone.om.request.file.OMFileRequest.getParentID(OMFileRequest.java:1038)
    at org.apache.hadoop.ozone.om.request.file.OMFileRequest.getParentID(OMFileRequest.java:988)
    at org.apache.hadoop.ozone.om.request.util.OMMultipartUploadUtils.getMultipartOpenKeyFSO(OMMultipartUploadUtils.java:122)
    at org.apache.hadoop.ozone.om.request.util.OMMultipartUploadUtils.getMultipartOpenKey(OMMultipartUploadUtils.java:99)
    at org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadAbortRequest.getMultipartOpenKey(S3MultipartUploadAbortRequest.java:256)
    at org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadAbortRequest.validateAndUpdateCache(S3MultipartUploadAbortRequest.java:145)
    ... 9 more
```

This issue is similar to HDDS-10630. This PR applies the same mechanism, which adds the missing parent directories to the table cache before the MPU is aborted. The PR adds this mechanism to both `S3MultipartUploadAbortRequest` and `S3ExpiredMultipartUploadsAbortRequest`. The code is also refactored so that multiple places reuse the same mechanism.

## What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-11784

## How was this patch tested?

It was tested by the CI. We are also validating it in our cluster (ongoing).

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
