[ https://issues.apache.org/jira/browse/HADOOP-18298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Steve Loughran updated HADOOP-18298:
------------------------------------
Environment: minio
Summary: Hadoop AWS | Staging committer Multipartupload not completing
on minio (was: Hadoop AWS | Staging committer Multipartupload not implemented
properly)
> Hadoop AWS | Staging committer Multipartupload not completing on minio
> ----------------------------------------------------------------------
>
> Key: HADOOP-18298
> URL: https://issues.apache.org/jira/browse/HADOOP-18298
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs/s3
> Affects Versions: 3.3.1
> Environment: minio
> Reporter: Ayush Goyal
> Priority: Major
>
> In the Hadoop AWS staging committer
> (org.apache.hadoop.fs.s3a.commit.staging.StagingCommitter), the committer
> uploads files from the local filesystem to S3 (method commitTaskInternal),
> which calls uploadFileToPendingCommit of CommitOperation to upload each file
> using a multipart upload.
>
> A multipart upload consists of three steps:
> 1) Initiate the multipart upload.
> 2) Split the file into parts and upload the parts.
> 3) Merge all the parts of the file and complete the multipart upload.
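A minimal sketch of the three steps, written as a hypothetical in-memory model: the class `MultipartModel` and its methods are invented for illustration and are not the S3A, AWS SDK, or MinIO API. It assumes only the visibility rule of S3-style multipart uploads, namely that uploaded parts do not make an object appear under its key until the upload is completed.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.UUID;

/** Hypothetical in-memory model of an S3-style multipart upload (not a real client API). */
public class MultipartModel {
    // Completed, visible objects: key -> bytes.
    private final Map<String, byte[]> objects = new HashMap<>();
    // In-progress uploads: uploadId -> (partNumber -> bytes), ordered by part number.
    private final Map<String, TreeMap<Integer, byte[]>> pending = new HashMap<>();
    // uploadId -> destination key.
    private final Map<String, String> uploadKeys = new HashMap<>();

    // Step 1: initiate the upload and return its upload ID.
    public String initiate(String key) {
        String uploadId = UUID.randomUUID().toString();
        pending.put(uploadId, new TreeMap<>());
        uploadKeys.put(uploadId, key);
        return uploadId;
    }

    // Step 2: upload one part; part numbers start at 1.
    public void uploadPart(String uploadId, int partNumber, byte[] data) {
        TreeMap<Integer, byte[]> parts = pending.get(uploadId);
        if (parts == null) {
            throw new IllegalStateException("unknown upload " + uploadId);
        }
        parts.put(partNumber, data.clone());
    }

    // Step 3: concatenate the parts in part-number order and publish the object.
    public void complete(String uploadId) {
        TreeMap<Integer, byte[]> parts = pending.remove(uploadId);
        if (parts == null) {
            throw new IllegalStateException("unknown upload " + uploadId);
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] p : parts.values()) {
            out.write(p, 0, p.length);
        }
        objects.put(uploadKeys.remove(uploadId), out.toByteArray());
    }

    public boolean exists(String key) { return objects.containsKey(key); }
    public byte[] get(String key) { return objects.get(key); }
    public int pendingUploads() { return pending.size(); }

    /** Runs steps 1 and 2 only: splits the file into partSize chunks and uploads them. */
    public String uploadWithoutCompleting(String key, byte[] file, int partSize) {
        String uploadId = initiate(key);
        int partNumber = 1;
        for (int off = 0; off < file.length; off += partSize) {
            int end = Math.min(off + partSize, file.length);
            uploadPart(uploadId, partNumber++, Arrays.copyOfRange(file, off, end));
        }
        return uploadId;
    }

    public static void main(String[] args) {
        MultipartModel store = new MultipartModel();
        byte[] file = "hello multipart world".getBytes();
        String id = store.uploadWithoutCompleting("out/part-00000.parquet", file, 8);
        System.out.println("visible before complete: " + store.exists("out/part-00000.parquet"));
        store.complete(id);
        System.out.println("visible after complete: " + store.exists("out/part-00000.parquet"));
    }
}
```

Running `main` prints `false` and then `true`, which mirrors the symptom described here: stopping after step 2 leaves the destination empty while the upload stays pending.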
>
> In the implementation of uploadFileToPendingCommit, the first two steps are
> implemented, but the third is missing. The parts are uploaded, but because
> they are never merged at the end of the job, no files appear in the
> destination directory.
>
> S3 logs before implementing the third step:
>
> {code:java}
> 2022-05-30T13:49:31:000 [200 OK] s3.NewMultipartUpload
> localhost:9000/minio-feature-testing/spark-job/processed/output-parquet-staging-7/part-00000-ce0a965f-622a-4950-bb4b-550470883134-c000-b552fb34-6156-4aa8-9085-679ad14fab6e.snappy.parquet?uploads
> 240b:c1d1:123:664f:c5d2:2:: 8.677ms ↑ 137 B ↓ 724 B
> 2022-05-30T13:49:31:000 [200 OK] s3.PutObjectPart
> localhost:9000/minio-feature-testing/spark-job/processed/output-parquet-staging-7/part-00000-ce0a965f-622a-4950-bb4b-550470883134-c000-b552fb34-6156-4aa8-9085-679ad14fab6e.snappy.parquet?uploadId=f3beae8e-3001-48be-9bc4-306b71940e50&partNumber=1
> 240b:c1d1:123:664f:c5d2:2:: 443.156ms ↑ 51 KiB ↓ 325 B
> 2022-05-30T13:49:32:000 [200 OK] s3.ListObjectsV2
> localhost:9000/minio-feature-testing/?list-type=2&delimiter=%2F&max-keys=2&prefix=spark-job%2Fprocessed%2Foutput-parquet-staging-7%2F_SUCCESS%2F&fetch-owner=false
> 240b:c1d1:123:664f:c5d2:2:: 3.414ms ↑ 137 B ↓ 646 B
> 2022-05-30T13:49:32:000 [200 OK] s3.PutObject
> localhost:9000/minio-feature-testing/spark-job/processed/output-parquet-staging-7/_SUCCESS
> 240b:c1d1:123:664f:c5d2:2:: 52.734ms ↑ 8.7 KiB ↓ 380 B
> 2022-05-30T13:49:32:000 [200 OK] s3.DeleteMultipleObjects
> localhost:9000/minio-feature-testing/?delete 240b:c1d1:123:664f:c5d2:2::
> 73.954ms ↑ 350 B ↓ 432 B
> 2022-05-30T13:49:32:000 [404 Not Found] s3.HeadObject
> localhost:9000/minio-feature-testing/spark-job/processed/output-parquet-staging-7/_temporary
> 240b:c1d1:123:664f:c5d2:2:: 2.658ms ↑ 137 B ↓ 291 B
> 2022-05-30T13:49:32:000 [200 OK] s3.ListObjectsV2
> localhost:9000/minio-feature-testing/?list-type=2&delimiter=%2F&max-keys=2&prefix=spark-job%2Fprocessed%2Foutput-parquet-staging-7%2F_temporary%2F&fetch-owner=false
> 240b:c1d1:123:664f:c5d2:2:: 4.807ms ↑ 137 B ↓ 648 B
> 2022-05-30T13:49:32:000 [200 OK] s3.ListMultipartUploads
> localhost:9000/minio-feature-testing/?uploads&prefix=spark-job%2Fprocessed%2Foutput-parquet-staging-7%2F
> 240b:c0e0:102:553e:b4c2:2:: 1.081ms ↑ 137 B ↓ 776 B
> 2022-05-30T13:49:32:000 [404 Not Found] s3.HeadObject
> localhost:9000/minio-feature-testing/spark-job/processed/output-parquet-staging-7/.spark-staging-ce0a965f-622a-4950-bb4b-550470883134
> 240b:c1d1:123:664f:c5d2:2:: 5.68ms ↑ 137 B ↓ 291 B
> 2022-05-30T13:49:32:000 [200 OK] s3.ListObjectsV2
> localhost:9000/minio-feature-testing/?list-type=2&delimiter=%2F&max-keys=2&prefix=spark-job%2Fprocessed%2Foutput-parquet-staging-7%2F.spark-staging-ce0a965f-622a-4950-bb4b-550470883134%2F&fetch-owner=false
> 240b:c1d1:123:664f:c5d2:2:: 2.452ms ↑ 137 B ↓ 689 B
> {code}
> Here, after s3.PutObjectPart there is no CompleteMultipartUpload call for
> the third step.
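The missing step can also be checked mechanically. A rough sketch, assuming the MinIO trace layout shown in these logs (`s3.<Operation>` followed by the request URL, possibly on the next quoted line): it collects upload IDs that received `PutObjectPart` calls and reports those with no matching `CompleteMultipartUpload`. `IncompleteUploadFinder` is a hypothetical helper, and its regex would need adjusting for other log formats.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: finds multipart uploads that received parts but were never completed. */
public class IncompleteUploadFinder {
    // "s3.<Op>", then whitespace and any "> " quote markers, then the URL up to "?uploadId=" or "&uploadId=".
    private static final Pattern OP_WITH_UPLOAD_ID = Pattern.compile(
            "s3\\.(PutObjectPart|CompleteMultipartUpload)[\\s>]+\\S*?[?&]uploadId=([0-9a-fA-F-]+)");

    public static Set<String> findIncomplete(String log) {
        Set<String> withParts = new LinkedHashSet<>();
        Set<String> completed = new LinkedHashSet<>();
        Matcher m = OP_WITH_UPLOAD_ID.matcher(log);
        while (m.find()) {
            if (m.group(1).equals("PutObjectPart")) {
                withParts.add(m.group(2));
            } else {
                completed.add(m.group(2));
            }
        }
        withParts.removeAll(completed); // keep only uploads that were never completed
        return withParts;
    }

    public static void main(String[] args) throws IOException {
        // Reads the whole trace from standard input.
        String log = new String(System.in.readAllBytes(), StandardCharsets.UTF_8);
        for (String id : findIncomplete(log)) {
            System.out.println("no CompleteMultipartUpload for uploadId=" + id);
        }
    }
}
```

After compiling, it can be run as `java IncompleteUploadFinder < trace.log` (the file name is only an example). On the first log it would report `f3beae8e-3001-48be-9bc4-306b71940e50`. On a truncated excerpt such as the second log, uploads whose completion falls outside the excerpt are reported as well.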
>
> S3 logs after implementing the third step:
>
> {code:java}
> 2022-06-17T10:56:12:000 [200 OK] s3.NewMultipartUpload
> localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D16/quarter%3D0/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploads
> 240b:c1d1:123:664f:c5d2:2:: 9.116ms ↑ 137 B ↓ 750 B
> 2022-06-17T10:56:12:000 [200 OK] s3.NewMultipartUpload
> localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D15/quarter%3D45/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploads
> 240b:c1d1:123:664f:c5d2:2:: 9.416ms ↑ 137 B ↓ 751 B
> 2022-06-17T10:56:12:000 [200 OK] s3.NewMultipartUpload
> localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D16/quarter%3D45/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploads
> 240b:c1d1:123:664f:c5d2:2:: 8.506ms ↑ 137 B ↓ 751 B
> 2022-06-17T10:56:12:000 [200 OK] s3.NewMultipartUpload
> localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D15/quarter%3D0/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploads
> 240b:c1d1:123:664f:c5d2:2:: 9.815ms ↑ 137 B ↓ 750 B
> 2022-06-17T10:56:12:000 [200 OK] s3.NewMultipartUpload
> localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D16/quarter%3D30/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploads
> 240b:c1d1:123:664f:c5d2:2:: 10.09ms ↑ 137 B ↓ 751 B
> 2022-06-17T10:56:12:000 [200 OK] s3.NewMultipartUpload
> localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D16/quarter%3D15/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploads
> 240b:c1d1:123:664f:c5d2:2:: 9.851ms ↑ 137 B ↓ 751 B
> 2022-06-17T10:56:12:000 [200 OK] s3.NewMultipartUpload
> localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D17/quarter%3D0/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploads
> 240b:c1d1:123:664f:c5d2:2:: 9.006ms ↑ 137 B ↓ 750 B
> 2022-06-17T10:56:12:000 [200 OK] s3.NewMultipartUpload
> localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D15/quarter%3D15/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploads
> 240b:c1d1:123:664f:c5d2:2:: 9.217ms ↑ 137 B ↓ 751 B
> 2022-06-17T10:56:12:000 [200 OK] s3.PutObjectPart
> localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D15/quarter%3D45/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploadId=7da87f0a-f8ff-4f9c-b877-b2fdd18d3c5f&partNumber=1
> 240b:c1d1:123:664f:c5d2:2:: 817.474ms ↑ 52 KiB ↓ 325 B
> 2022-06-17T10:56:12:000 [200 OK] s3.PutObjectPart
> localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D15/quarter%3D15/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploadId=782769d0-43f1-43b8-aae0-54ac4c8c6603&partNumber=1
> 240b:c1d1:123:664f:c5d2:2:: 818.363ms ↑ 85 KiB ↓ 325 B
> 2022-06-17T10:56:12:000 [200 OK] s3.PutObjectPart
> localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D17/quarter%3D0/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploadId=2c509073-e2b6-4d0a-a65a-bb4f154a432c&partNumber=1
> 240b:c1d1:123:664f:c5d2:2:: 819.765ms ↑ 54 KiB ↓ 325 B
> 2022-06-17T10:56:12:000 [200 OK] s3.PutObjectPart
> localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D16/quarter%3D0/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploadId=c7e09609-6193-4d41-bc05-4020291725e4&partNumber=1
> 240b:c1d1:123:664f:c5d2:2:: 818.782ms ↑ 55 KiB ↓ 325 B
> 2022-06-17T10:56:12:000 [200 OK] s3.PutObjectPart
> localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D16/quarter%3D15/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploadId=3bb4278e-455a-4dc4-af01-ed3227430590&partNumber=1
> 240b:c1d1:123:664f:c5d2:2:: 817.97ms ↑ 51 KiB ↓ 325 B
> 2022-06-17T10:56:12:000 [200 OK] s3.PutObjectPart
> localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D15/quarter%3D0/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploadId=8fe799e3-c712-43b7-a074-a2359232de07&partNumber=1
> 240b:c1d1:123:664f:c5d2:2:: 819.183ms ↑ 80 KiB ↓ 325 B
> 2022-06-17T10:56:12:000 [200 OK] s3.PutObjectPart
> localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D16/quarter%3D45/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploadId=c2e1477b-5457-4cbe-8fdb-4e80eaca63fe&partNumber=1
> 240b:c1d1:123:664f:c5d2:2:: 818.126ms ↑ 53 KiB ↓ 325 B
> 2022-06-17T10:56:12:000 [200 OK] s3.PutObjectPart
> localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D16/quarter%3D30/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploadId=992167c8-fbde-4a0d-bd4d-5ce7ddd51a87&partNumber=1
> 240b:c1d1:123:664f:c5d2:2:: 818.176ms ↑ 56 KiB ↓ 325 B
> 2022-06-17T10:56:12:000 [200 OK] s3.CompleteMultipartUpload
> localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D15/quarter%3D45/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploadId=7da87f0a-f8ff-4f9c-b877-b2fdd18d3c5f
> 240b:c1d1:123:664f:c5d2:2:: 632.761ms ↑ 272 B ↓ 1.1 KiB
> 2022-06-17T10:56:13:000 [200 OK] s3.NewMultipartUpload
> localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D17/quarter%3D15/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploads
> 240b:c1d1:123:664f:c5d2:2:: 6.231ms ↑ 137 B ↓ 751 B
> 2022-06-17T10:56:12:000 [200 OK] s3.CompleteMultipartUpload
> localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D16/quarter%3D15/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploadId=3bb4278e-455a-4dc4-af01-ed3227430590
> 240b:c1d1:123:664f:c5d2:2:: 697.946ms ↑ 272 B ↓ 1.1 KiB
> 2022-06-17T10:56:12:000 [200 OK] s3.CompleteMultipartUpload
> localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D17/quarter%3D0/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploadId=2c509073-e2b6-4d0a-a65a-bb4f154a432c
> 240b:c1d1:123:664f:c5d2:2:: 714.377ms ↑ 272 B ↓ 1.1 KiB
> {code}
>
>
> What needs to be implemented:
>
> After the uploadPart calls, once all upload IDs have been added to
> commitData, innerCommit should be called.
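A minimal sketch of the proposed order of calls, under stated assumptions: `MultipartStore`, `PendingUpload`, `uploadFile` and `RecordingStore` are hypothetical names invented here, and `innerCommit`/`commitData` only borrow the names used in the proposal; none of this is the actual S3A code. Each file's upload ID and part ETags are recorded in `commitData` after steps 1 and 2, and `innerCommit` then performs step 3 for every recorded upload. S3's CompleteMultipartUpload needs the part numbers and their ETags, which is why the sketch records ETags and not just upload IDs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CompleteAfterUploadSketch {

    /** Hypothetical abstraction over the three multipart calls (not the S3A or AWS SDK API). */
    interface MultipartStore {
        String initiate(String key);
        String uploadPart(String key, String uploadId, int partNumber, byte[] data); // returns an ETag
        void complete(String key, String uploadId, List<String> etags);
    }

    /** Hypothetical record of a file whose parts are uploaded but not yet completed. */
    static final class PendingUpload {
        final String key;
        final String uploadId;
        final List<String> etags;

        PendingUpload(String key, String uploadId, List<String> etags) {
            this.key = key;
            this.uploadId = uploadId;
            this.etags = etags;
        }
    }

    /** Steps 1 and 2: initiate, upload every part, and return what step 3 needs. */
    static PendingUpload uploadFile(MultipartStore store, String key, byte[] file, int partSize) {
        String uploadId = store.initiate(key);
        List<String> etags = new ArrayList<>();
        int partNumber = 1;
        for (int off = 0; off < file.length; off += partSize) {
            int end = Math.min(off + partSize, file.length);
            etags.add(store.uploadPart(key, uploadId, partNumber++, Arrays.copyOfRange(file, off, end)));
        }
        return new PendingUpload(key, uploadId, etags);
    }

    /** Step 3: complete every upload recorded in commitData. */
    static void innerCommit(MultipartStore store, List<PendingUpload> commitData) {
        for (PendingUpload p : commitData) {
            store.complete(p.key, p.uploadId, p.etags);
        }
    }

    /** Store that only records the calls it receives, loosely like the MinIO trace above. */
    static final class RecordingStore implements MultipartStore {
        final List<String> calls = new ArrayList<>();
        private int next = 0;

        public String initiate(String key) {
            calls.add("NewMultipartUpload " + key);
            return "upload-" + (next++);
        }

        public String uploadPart(String key, String uploadId, int partNumber, byte[] data) {
            calls.add("PutObjectPart " + key + " " + uploadId + " #" + partNumber);
            return "etag-" + uploadId + "-" + partNumber;
        }

        public void complete(String key, String uploadId, List<String> etags) {
            calls.add("CompleteMultipartUpload " + key + " " + uploadId + " parts=" + etags.size());
        }
    }

    public static void main(String[] args) {
        RecordingStore store = new RecordingStore();
        List<PendingUpload> commitData = new ArrayList<>();
        commitData.add(uploadFile(store, "out/a.parquet", new byte[20], 8));
        commitData.add(uploadFile(store, "out/b.parquet", new byte[5], 8));
        innerCommit(store, commitData);
        store.calls.forEach(System.out::println);
    }
}
```

The sketch only models the order of calls the proposal describes: every NewMultipartUpload/PutObjectPart sequence is eventually followed by a CompleteMultipartUpload for the same upload ID.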
--
This message was sent by Atlassian Jira
(v8.20.7#820007)