Ayush Goyal created HADOOP-18298:
------------------------------------
Summary: Hadoop AWS | Staging committer Multipartupload not
implemented properly
Key: HADOOP-18298
URL: https://issues.apache.org/jira/browse/HADOOP-18298
Project: Hadoop Common
Issue Type: Bug
Components: fs/s3
Affects Versions: 3.3.1
Reporter: Ayush Goyal
In the Hadoop AWS staging committer
(org.apache.hadoop.fs.s3a.commit.staging.StagingCommitter), the committer
uploads files from the local filesystem to S3 (method commitTaskInternal),
which calls uploadFileToPendingCommit of CommitOperations to upload each file
using a multipart upload.
A multipart upload consists of three steps:
1) Initialise the multipart upload.
2) Break the file into parts and upload the parts.
3) Merge all the parts of the file and finalize the multipart upload.
The implementation of uploadFileToPendingCommit covers only the first two
steps. The third step is missing, so the parts are uploaded but never merged,
and at the end of the job there are no files in the destination directory.
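To illustrate why skipping the last step leaves the destination empty, here is a minimal in-memory sketch of multipart upload semantics. It is a toy model only: the class ToyMultipartStore and all of its methods are hypothetical names invented for this example, and it is not the S3A committer code or the AWS SDK API. It only assumes the general rule that an object becomes visible once its upload is completed.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.UUID;

// Toy in-memory model of multipart upload semantics.
// Hypothetical names throughout; not the S3A or AWS SDK API.
class ToyMultipartStore {
    // Objects visible to readers, keyed by object key.
    private final Map<String, byte[]> objects = new HashMap<>();
    // Pending uploads: upload ID -> (part number -> data), sorted by part number.
    private final Map<String, TreeMap<Integer, byte[]>> pending = new HashMap<>();
    // Destination key of each pending upload.
    private final Map<String, String> uploadKeys = new HashMap<>();

    // Step 1: initialise the upload and return its upload ID.
    String initiate(String key) {
        String uploadId = UUID.randomUUID().toString();
        pending.put(uploadId, new TreeMap<>());
        uploadKeys.put(uploadId, key);
        return uploadId;
    }

    // Step 2: upload one part; nothing becomes visible yet.
    void uploadPart(String uploadId, int partNumber, byte[] data) {
        TreeMap<Integer, byte[]> parts = pending.get(uploadId);
        if (parts == null) {
            throw new IllegalArgumentException("unknown upload: " + uploadId);
        }
        parts.put(partNumber, data.clone());
    }

    // Step 3: merge the parts in part-number order and publish the object.
    void complete(String uploadId) {
        TreeMap<Integer, byte[]> parts = pending.remove(uploadId);
        if (parts == null) {
            throw new IllegalArgumentException("unknown upload: " + uploadId);
        }
        ByteArrayOutputStream merged = new ByteArrayOutputStream();
        for (byte[] part : parts.values()) {
            merged.write(part, 0, part.length);
        }
        objects.put(uploadKeys.remove(uploadId), merged.toByteArray());
    }

    boolean exists(String key) {
        return objects.containsKey(key);
    }

    byte[] read(String key) {
        return objects.get(key);
    }

    int pendingUploads() {
        return pending.size();
    }

    public static void main(String[] args) {
        ToyMultipartStore store = new ToyMultipartStore();
        String key = "output/part-00000.snappy.parquet"; // illustrative key
        String id = store.initiate(key);
        store.uploadPart(id, 1, "data".getBytes(StandardCharsets.UTF_8));
        System.out.println("visible after steps 1 and 2: " + store.exists(key));
        store.complete(id);
        System.out.println("visible after step 3: " + store.exists(key));
    }
}
```

In this model, stopping after uploadPart leaves one pending upload and no readable object, which matches the symptom described above.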
S3 logs before implementing the 3rd step:
{code:java}
2022-05-30T13:49:31:000 [200 OK] s3.NewMultipartUpload
localhost:9000/minio-feature-testing/spark-job/processed/output-parquet-staging-7/part-00000-ce0a965f-622a-4950-bb4b-550470883134-c000-b552fb34-6156-4aa8-9085-679ad14fab6e.snappy.parquet?uploads
240b:c1d1:123:664f:c5d2:2:: 8.677ms ↑ 137 B ↓ 724 B
2022-05-30T13:49:31:000 [200 OK] s3.PutObjectPart
localhost:9000/minio-feature-testing/spark-job/processed/output-parquet-staging-7/part-00000-ce0a965f-622a-4950-bb4b-550470883134-c000-b552fb34-6156-4aa8-9085-679ad14fab6e.snappy.parquet?uploadId=f3beae8e-3001-48be-9bc4-306b71940e50&partNumber=1
240b:c1d1:123:664f:c5d2:2:: 443.156ms ↑ 51 KiB ↓ 325 B
2022-05-30T13:49:32:000 [200 OK] s3.ListObjectsV2
localhost:9000/minio-feature-testing/?list-type=2&delimiter=%2F&max-keys=2&prefix=spark-job%2Fprocessed%2Foutput-parquet-staging-7%2F_SUCCESS%2F&fetch-owner=false
240b:c1d1:123:664f:c5d2:2:: 3.414ms ↑ 137 B ↓ 646 B
2022-05-30T13:49:32:000 [200 OK] s3.PutObject
localhost:9000/minio-feature-testing/spark-job/processed/output-parquet-staging-7/_SUCCESS
240b:c1d1:123:664f:c5d2:2:: 52.734ms ↑ 8.7 KiB ↓ 380 B
2022-05-30T13:49:32:000 [200 OK] s3.DeleteMultipleObjects
localhost:9000/minio-feature-testing/?delete 240b:c1d1:123:664f:c5d2:2::
73.954ms ↑ 350 B ↓ 432 B
2022-05-30T13:49:32:000 [404 Not Found] s3.HeadObject
localhost:9000/minio-feature-testing/spark-job/processed/output-parquet-staging-7/_temporary
240b:c1d1:123:664f:c5d2:2:: 2.658ms ↑ 137 B ↓ 291 B
2022-05-30T13:49:32:000 [200 OK] s3.ListObjectsV2
localhost:9000/minio-feature-testing/?list-type=2&delimiter=%2F&max-keys=2&prefix=spark-job%2Fprocessed%2Foutput-parquet-staging-7%2F_temporary%2F&fetch-owner=false
240b:c1d1:123:664f:c5d2:2:: 4.807ms ↑ 137 B ↓ 648 B
2022-05-30T13:49:32:000 [200 OK] s3.ListMultipartUploads
localhost:9000/minio-feature-testing/?uploads&prefix=spark-job%2Fprocessed%2Foutput-parquet-staging-7%2F
240b:c0e0:102:553e:b4c2:2:: 1.081ms ↑ 137 B ↓ 776 B
2022-05-30T13:49:32:000 [404 Not Found] s3.HeadObject
localhost:9000/minio-feature-testing/spark-job/processed/output-parquet-staging-7/.spark-staging-ce0a965f-622a-4950-bb4b-550470883134
240b:c1d1:123:664f:c5d2:2:: 5.68ms ↑ 137 B ↓ 291 B
2022-05-30T13:49:32:000 [200 OK] s3.ListObjectsV2
localhost:9000/minio-feature-testing/?list-type=2&delimiter=%2F&max-keys=2&prefix=spark-job%2Fprocessed%2Foutput-parquet-staging-7%2F.spark-staging-ce0a965f-622a-4950-bb4b-550470883134%2F&fetch-owner=false
240b:c1d1:123:664f:c5d2:2:: 2.452ms ↑ 137 B ↓ 689 B
{code}
Here, after s3.PutObjectPart there is no CompleteMultipartUpload call for the
3rd step.
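A check like this can also be done mechanically. The following is a rough sketch, assuming the log layout shown in this report (operation name such as s3.PutObjectPart, followed on the same or the next line by a URL carrying an uploadId query parameter). The class IncompleteUploadFinder and its method are hypothetical helpers written for this example, not part of Hadoop or MinIO.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: lists upload IDs that received parts but were never completed.
class IncompleteUploadFinder {
    // Captures the uploadId query parameter from a request URL.
    private static final Pattern UPLOAD_ID = Pattern.compile("[?&]uploadId=([^&\\s]+)");

    static Set<String> findIncomplete(List<String> lines) {
        Set<String> withParts = new LinkedHashSet<>();
        Set<String> completed = new LinkedHashSet<>();
        String operation = ""; // last S3 operation seen; the URL may be on the next line
        for (String line : lines) {
            if (line.contains("s3.PutObjectPart")) {
                operation = "part";
            } else if (line.contains("s3.CompleteMultipartUpload")) {
                operation = "complete";
            } else if (line.contains("] s3.")) {
                operation = ""; // some other operation; ignore its URL
            }
            Matcher m = UPLOAD_ID.matcher(line);
            if (m.find()) {
                if (operation.equals("part")) {
                    withParts.add(m.group(1));
                } else if (operation.equals("complete")) {
                    completed.add(m.group(1));
                }
            }
        }
        // Anything with parts but no completion is still pending.
        withParts.removeAll(completed);
        return withParts;
    }

    public static void main(String[] args) {
        // Shortened lines in the layout of the log above; the bucket path is abbreviated.
        List<String> log = Arrays.asList(
            "2022-05-30T13:49:31:000 [200 OK] s3.PutObjectPart",
            "localhost:9000/bucket/key?uploadId=f3beae8e-3001-48be-9bc4-306b71940e50&partNumber=1",
            "2022-05-30T13:49:32:000 [200 OK] s3.PutObject",
            "localhost:9000/bucket/_SUCCESS");
        System.out.println("uploads never completed: " + findIncomplete(log));
    }
}
```

Run against the log above, the only upload ID with a part (f3beae8e-3001-48be-9bc4-306b71940e50) has no matching completion, so it is reported as pending.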
S3 logs after implementing the 3rd step:
{code:java}
2022-06-17T10:56:12:000 [200 OK] s3.NewMultipartUpload
localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D16/quarter%3D0/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploads
240b:c1d1:123:664f:c5d2:2:: 9.116ms ↑ 137 B ↓ 750 B
2022-06-17T10:56:12:000 [200 OK] s3.NewMultipartUpload
localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D15/quarter%3D45/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploads
240b:c1d1:123:664f:c5d2:2:: 9.416ms ↑ 137 B ↓ 751 B
2022-06-17T10:56:12:000 [200 OK] s3.NewMultipartUpload
localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D16/quarter%3D45/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploads
240b:c1d1:123:664f:c5d2:2:: 8.506ms ↑ 137 B ↓ 751 B
2022-06-17T10:56:12:000 [200 OK] s3.NewMultipartUpload
localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D15/quarter%3D0/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploads
240b:c1d1:123:664f:c5d2:2:: 9.815ms ↑ 137 B ↓ 750 B
2022-06-17T10:56:12:000 [200 OK] s3.NewMultipartUpload
localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D16/quarter%3D30/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploads
240b:c1d1:123:664f:c5d2:2:: 10.09ms ↑ 137 B ↓ 751 B
2022-06-17T10:56:12:000 [200 OK] s3.NewMultipartUpload
localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D16/quarter%3D15/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploads
240b:c1d1:123:664f:c5d2:2:: 9.851ms ↑ 137 B ↓ 751 B
2022-06-17T10:56:12:000 [200 OK] s3.NewMultipartUpload
localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D17/quarter%3D0/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploads
240b:c1d1:123:664f:c5d2:2:: 9.006ms ↑ 137 B ↓ 750 B
2022-06-17T10:56:12:000 [200 OK] s3.NewMultipartUpload
localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D15/quarter%3D15/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploads
240b:c1d1:123:664f:c5d2:2:: 9.217ms ↑ 137 B ↓ 751 B
2022-06-17T10:56:12:000 [200 OK] s3.PutObjectPart
localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D15/quarter%3D45/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploadId=7da87f0a-f8ff-4f9c-b877-b2fdd18d3c5f&partNumber=1
240b:c1d1:123:664f:c5d2:2:: 817.474ms ↑ 52 KiB ↓ 325 B
2022-06-17T10:56:12:000 [200 OK] s3.PutObjectPart
localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D15/quarter%3D15/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploadId=782769d0-43f1-43b8-aae0-54ac4c8c6603&partNumber=1
240b:c1d1:123:664f:c5d2:2:: 818.363ms ↑ 85 KiB ↓ 325 B
2022-06-17T10:56:12:000 [200 OK] s3.PutObjectPart
localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D17/quarter%3D0/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploadId=2c509073-e2b6-4d0a-a65a-bb4f154a432c&partNumber=1
240b:c1d1:123:664f:c5d2:2:: 819.765ms ↑ 54 KiB ↓ 325 B
2022-06-17T10:56:12:000 [200 OK] s3.PutObjectPart
localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D16/quarter%3D0/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploadId=c7e09609-6193-4d41-bc05-4020291725e4&partNumber=1
240b:c1d1:123:664f:c5d2:2:: 818.782ms ↑ 55 KiB ↓ 325 B
2022-06-17T10:56:12:000 [200 OK] s3.PutObjectPart
localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D16/quarter%3D15/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploadId=3bb4278e-455a-4dc4-af01-ed3227430590&partNumber=1
240b:c1d1:123:664f:c5d2:2:: 817.97ms ↑ 51 KiB ↓ 325 B
2022-06-17T10:56:12:000 [200 OK] s3.PutObjectPart
localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D15/quarter%3D0/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploadId=8fe799e3-c712-43b7-a074-a2359232de07&partNumber=1
240b:c1d1:123:664f:c5d2:2:: 819.183ms ↑ 80 KiB ↓ 325 B
2022-06-17T10:56:12:000 [200 OK] s3.PutObjectPart
localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D16/quarter%3D45/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploadId=c2e1477b-5457-4cbe-8fdb-4e80eaca63fe&partNumber=1
240b:c1d1:123:664f:c5d2:2:: 818.126ms ↑ 53 KiB ↓ 325 B
2022-06-17T10:56:12:000 [200 OK] s3.PutObjectPart
localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D16/quarter%3D30/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploadId=992167c8-fbde-4a0d-bd4d-5ce7ddd51a87&partNumber=1
240b:c1d1:123:664f:c5d2:2:: 818.176ms ↑ 56 KiB ↓ 325 B
2022-06-17T10:56:12:000 [200 OK] s3.CompleteMultipartUpload
localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D15/quarter%3D45/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploadId=7da87f0a-f8ff-4f9c-b877-b2fdd18d3c5f
240b:c1d1:123:664f:c5d2:2:: 632.761ms ↑ 272 B ↓ 1.1 KiB
2022-06-17T10:56:13:000 [200 OK] s3.NewMultipartUpload
localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D17/quarter%3D15/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploads
240b:c1d1:123:664f:c5d2:2:: 6.231ms ↑ 137 B ↓ 751 B
2022-06-17T10:56:12:000 [200 OK] s3.CompleteMultipartUpload
localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D16/quarter%3D15/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploadId=3bb4278e-455a-4dc4-af01-ed3227430590
240b:c1d1:123:664f:c5d2:2:: 697.946ms ↑ 272 B ↓ 1.1 KiB
2022-06-17T10:56:12:000 [200 OK] s3.CompleteMultipartUpload
localhost:9000/minio-feature-testing/spark-job/pm-processed/output-parquet-staging-39/day%3D23/hour%3D17/quarter%3D0/part-00004-d0b529ca-112f-43f2-a7dd-44de4db6aa7f-dffa7213-d492-48f9-9e6a-fb08bc81ceeb.c000.snappy.parquet?uploadId=2c509073-e2b6-4d0a-a65a-bb4f154a432c
240b:c1d1:123:664f:c5d2:2:: 714.377ms ↑ 272 B ↓ 1.1 KiB
{code}
What needs to be implemented:
After the uploadPart calls, once all upload IDs have been added to commitData,
innerCommit should be called.