SaketaChalamchala opened a new pull request, #6496: URL: https://github.com/apache/ozone/pull/6496
## What changes were proposed in this pull request?

S3A provides multiple MapReduce [committers](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/committers.html). When the directory staging committer (`fs.s3a.committer.name=directory`) is used with the replace conflict mode (`fs.s3a.committer.staging.conflict-mode=replace`) to write to FSO buckets, the job fails with the errors shown below. The Initiate MPU request contains logic to add back missing parent directories and missing output prefix directories ([S3InitiateMultipartUploadRequestWithFSO.java](https://github.com/apache/ozone/blob/83d75861b0266160b219acde72d769eb0f9d5ac4/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/request/s3/multipart/S3InitiateMultipartUploadRequestWithFSO.java#L192-L207), [S3InitiateMultipartUploadResponseWithFSO.java](https://github.com/apache/ozone/blob/83d75861b0266160b219acde72d769eb0f9d5ac4/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/response/s3/multipart/S3InitiateMultipartUploadResponseWithFSO.java#L83-L107)), but the equivalent logic is missing from the Complete MPU request, so completion fails when the committer's replace mode removes the destination directories between initiate and complete. The proposed solution adds the missing parent and output prefix directories back to the DB before completing the MPU; a rough sketch of the idea follows.
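For illustration only, here is a standalone sketch of the approach, not the actual patch: on complete, walk the key's parent path under the bucket and re-insert any directory entry that has gone missing, mirroring what `S3InitiateMultipartUploadRequestWithFSO` does on initiate. A plain `HashMap` stands in for the OM directory table, and all names here (`MissingParentSketch`, `ensureParentsExist`, the objectId scheme) are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Standalone sketch, not the actual patch: a HashMap stands in for the
 * OM directory table ("parentObjectId/dirName" -> directory objectId).
 */
public class MissingParentSketch {

  /**
   * Walks the parent components of keyName under the bucket and re-creates
   * any directory entry that is missing, returning the objectId of the
   * file's immediate parent. Mirrors the idea used on MPU initiate.
   */
  static long ensureParentsExist(Map<String, Long> dirTable, long bucketObjectId,
      String keyName, AtomicLong nextObjectId) {
    long parentId = bucketObjectId;
    String[] components = keyName.split("/");
    // Every component except the last (the file itself) is a directory.
    for (int i = 0; i < components.length - 1; i++) {
      String dbKey = parentId + "/" + components[i];
      Long dirId = dirTable.get(dbKey);
      if (dirId == null) {
        // Parent was deleted between initiate and complete (e.g. by the
        // staging committer's replace conflict mode): add it back.
        dirId = nextObjectId.getAndIncrement();
        dirTable.put(dbKey, dirId);
      }
      parentId = dirId;
    }
    return parentId;
  }

  public static void main(String[] args) {
    Map<String, Long> dirTable = new HashMap<>();
    long parent = ensureParentsExist(dirTable, 1L,
        "qetest/terasort/output-1710494153/part-r-00000", new AtomicLong(100));
    System.out.println("parentId=" + parent + " dirTable=" + dirTable);
  }
}
```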
Errors:
```
## When creating getDBOzoneKey
StateMachine ApplyTransaction Thread - 0]-org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCompleteRequest: MultipartUpload Complete request failed for Key: st-data-con-jqmmwt/qetest/terasort/output-1710494153/part-r-00000 in Volume/Bucket s3v/qe-dataconn-bucket
DIRECTORY_NOT_FOUND org.apache.hadoop.ozone.om.exceptions.OMException: Failed to find parent directory of st-data-con-jqmmwt/qetest/terasort/output-1710494153/part-r-00000
	at org.apache.hadoop.ozone.om.request.file.OMFileRequest.getParentID(OMFileRequest.java:1008)
	at org.apache.hadoop.ozone.om.request.file.OMFileRequest.getParentID(OMFileRequest.java:958)
	at org.apache.hadoop.ozone.om.request.file.OMFileRequest.getParentId(OMFileRequest.java:1038)
	at org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCompleteRequestWithFSO.getDBOzoneKey(S3MultipartUploadCompleteRequestWithFSO.java:114)
	at org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCompleteRequest.validateAndUpdateCache(S3MultipartUploadCompleteRequest.java:157)
	at org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.handleWriteRequest(OzoneManagerRequestHandler.java:378)
	at org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.runCommand(OzoneManagerStateMachine.java:568)
	at org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.lambda$1(OzoneManagerStateMachine.java:363)
	at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:834)
```
```
## When getting omKeyInfo from DB
StateMachine ApplyTransaction Thread - 0]-org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine: Terminating with exit status 1: Request cmdType: CompleteMultiPartUpload traceID: "" clientId: "client-5168AA460706"
userInfo { userName: "xxx" remoteAddress: "10.140.142.67" hostName: "ccycloud-5.quasar-zycyup.root.comops.site" }
version: 3
completeMultiPartUploadRequest {
  keyArgs {
    volumeName: "s3v"
    bucketName: "qe-dataconn-bucket"
    keyName: "st-data-con-jqmmwt/qetest/terasort/output-1710494153/part-r-00000"
    multipartUploadID: "e1c08d2c-5798-4c81-ab2c-7a0bd0fea4c9-112101950169613319"
    acls { type: USER name: "[email protected]" rights: "\200" aclScope: ACCESS }
    acls { type: GROUP name: "hrt_qa" rights: "\000\001" aclScope: ACCESS }
    acls { type: GROUP name: "users" rights: "\000\001" aclScope: ACCESS }
    acls { type: GROUP name: "hivetest" rights: "\000\001" aclScope: ACCESS }
    modificationTime: 1710540186541
  }
  partsList { partNumber: 1 partName: "/s3v/qe-dataconn-bucket/st-data-con-jqmmwt/qetest/terasort/output-1710494153/part-r-00000-e1c08d2c-5798-4c81-ab2c-7a0bd0fea4c9-112101950169613319-1" }
}
s3Authentication { stringToSign: "AWS4-HMAC-SHA256\n20240315T220306Z\n20240315/us-east-1/s3/aws4_request\nb99fbcaca83fd2e3e67e9d5ff8f83fe1d882107fba9398e24414391522fbd926" signature: "724343f896d39b9e644f354ac4bf19648f2b2bc5ceeb514570abc288ed38d6c8" accessId: "[email protected]" }
failed with exception java.lang.NullPointerException
	at org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCompleteRequest.getOmKeyInfo(S3MultipartUploadCompleteRequest.java:378)
	at org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCompleteRequest.validateAndUpdateCache(S3MultipartUploadCompleteRequest.java:202)
	at org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.handleWriteRequest(OzoneManagerRequestHandler.java:378)
	at org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.runCommand(OzoneManagerStateMachine.java:568)
	at org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.lambda$1(OzoneManagerStateMachine.java:363)
	at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:834)
```

## What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-10630

## How was this patch tested?

Ran the s3a Hadoop contract test `ITestS3ACommitterMRJob` to verify, using the steps below. There is another [PR](https://github.com/apache/ozone/pull/6458/files) out to add the s3a contract tests to acceptance testing.
```
## Start up an unsecure Ozone cluster using docker-compose and create an FSO bucket
cd hadoop-ozone/dist/target/ozone-*-SNAPSHOT/compose/ozone
docker-compose up -d --scale datanode=3
ozone sh bucket create /s3v/fso-bucket -l FILE_SYSTEM_OPTIMIZED

## Download the hadoop-aws source
curl -LSs -o "hadoop-src.tar.gz" https://archive.apache.org/dist/hadoop/common/hadoop-3.3.6/hadoop-3.3.6-src.tar.gz
mkdir -p "hadoop-src"
tar -x -z -C "hadoop-src" --strip-components=3 -f "hadoop-src.tar.gz" 'hadoop-*-src/hadoop-tools/hadoop-aws'

## Create auth-keys.xml
vi hadoop-src/src/test/resources/auth-keys.xml

<configuration>
  <property>
    <name>fs.s3a.endpoint</name>
    <value>http://localhost:9878</value>
  </property>
  <property>
    <name>fs.s3a.access.key</name>
    <value>s3a-contract</value>
  </property>
  <property>
    <name>fs.s3a.secret.key</name>
    <value>unsecure</value>
  </property>
  <property>
    <name>fs.s3a.committer.staging.conflict-mode</name>
    <value>replace</value>
  </property>
  <property>
    <name>fs.contract.test.fs.s3a</name>
    <value>s3a://fso-bucket/</value>
  </property>
  <property>
    <name>test.fs.s3a.name</name>
    <value>s3a://fso-bucket/</value>
  </property>
  <property>
    <name>test.fs.s3a.sts.enabled</name>
    <value>false</value>
  </property>
  <property>
    <name>fs.s3a.path.style.access</name>
    <value>true</value>
  </property>
  <property>
    <name>fs.s3a.directory.marker.retention</name>
    <value>keep</value>
  </property>
</configuration>

## Run the ITestS3ACommitterMRJob contract test
cd hadoop-src
mvn clean test -B -V --no-transfer-progress -Dtest='ITestS3ACommitterMRJob'
```
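As a quicker sanity check than the full MR job, a multipart upload can also be driven directly against the S3 gateway started above. The snippet below is a sketch using the AWS SDK for Java v1 with the same endpoint and credentials as the compose setup (the `fso-bucket` name, the nested key, and the single small part are assumptions, not part of this patch):

```java
import com.amazonaws.auth.AWSStaticCredentialsProvider;
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.client.builder.AwsClientBuilder;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.*;

import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Collections;

/** Minimal MPU against the Ozone S3 gateway from the compose setup above. */
public class MpuSanityCheck {
  public static void main(String[] args) {
    AmazonS3 s3 = AmazonS3ClientBuilder.standard()
        .withEndpointConfiguration(new AwsClientBuilder.EndpointConfiguration(
            "http://localhost:9878", "us-east-1"))
        .withPathStyleAccessEnabled(true)
        .withCredentials(new AWSStaticCredentialsProvider(
            new BasicAWSCredentials("s3a-contract", "unsecure")))
        .build();

    String bucket = "fso-bucket";
    String key = "output/nested/dirs/part-00000"; // nested key exercises FSO parent dirs

    String uploadId = s3.initiateMultipartUpload(
        new InitiateMultipartUploadRequest(bucket, key)).getUploadId();

    byte[] data = "hello".getBytes(StandardCharsets.UTF_8);
    UploadPartResult part = s3.uploadPart(new UploadPartRequest()
        .withBucketName(bucket).withKey(key).withUploadId(uploadId)
        .withPartNumber(1)
        .withInputStream(new ByteArrayInputStream(data))
        .withPartSize(data.length)); // a single (last) part may be < 5 MB

    s3.completeMultipartUpload(new CompleteMultipartUploadRequest(
        bucket, key, uploadId, Collections.singletonList(part.getPartETag())));
    System.out.println("MPU completed for " + key);
  }
}
```

Note this snippet only covers the happy path; reproducing the original failure additionally requires the output prefix directories to be deleted between the initiate and complete calls, which is what the staging committer's replace conflict mode does.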
