[
https://issues.apache.org/jira/browse/HADOOP-13887?focusedWorklogId=597428&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-597428
]
ASF GitHub Bot logged work on HADOOP-13887:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 16/May/21 21:27
Start Date: 16/May/21 21:27
Worklog Time Spent: 10m
Work Description: mehakmeet commented on pull request #2706:
URL: https://github.com/apache/hadoop/pull/2706#issuecomment-841880071
Hey @bogthe,
> The last part is always an exception with regular multi part uploads too!
You can do parallel uploads and even upload the last part first and it would
still work (for regular multi-part).
Ah, I see. I was also thinking that with CSE it should still be able to determine
which part was last, since we provide part numbers during the part-upload step
and complete the upload in ascending order. However, I ran some tests with CSE
enabled and hit these issues:
AbstractContractMultipartUploaderTest#testMultipartUpload()
T1: partSize: 5242880 bytes (5 MB) + 1 byte = 5242881 bytes
```
2021-05-17 02:43:43,998 [JUnit-testMultipartUpload] INFO
contract.AbstractContractMultipartUploaderTest (DurationInfo.java:<init>(77)) -
Starting: Put part 1 (size 5242881)
s3a://mehakmeet-singh-data/test/testMultipartUpload
2021-05-17 02:43:44,002 [s3a-transfer-shared-pool1-t2] INFO
s3a.WriteOperationHelper (WriteOperationHelper.java:operationRetried(146)) -
upload part #1 upload ID
cFOhefvaRWyUGkB_U6zV2Mhs8RMC3u55_WOASIRCRuv1hVIeGciyQkvs5lA7gvZrdb8W5mCGwSQLsGmg9K9QbsPP1lcBF30vEVaUwbyfq0PjBxehxEeHyMklZE8hhYo_
on test/testMultipartUpload: Retried 0:
org.apache.hadoop.fs.s3a.AWSClientIOException: upload part #1 upload ID
cFOhefvaRWyUGkB_U6zV2Mhs8RMC3u55_WOASIRCRuv1hVIeGciyQkvs5lA7gvZrdb8W5mCGwSQLsGmg9K9QbsPP1lcBF30vEVaUwbyfq0PjBxehxEeHyMklZE8hhYo_
on test/testMultipartUpload: com.amazonaws.SdkClientException: Invalid part
size: part sizes for encrypted multipart uploads must be multiples of the
cipher block size (16) with the exception of the last part.: Invalid part size:
part sizes for encrypted multipart uploads must be multiples of the cipher
block size (16) with the exception of the last part.
2021-05-17 02:43:44,824 [s3a-transfer-shared-pool1-t2] INFO
s3a.WriteOperationHelper (WriteOperationHelper.java:operationRetried(146)) -
upload part #1 upload ID
cFOhefvaRWyUGkB_U6zV2Mhs8RMC3u55_WOASIRCRuv1hVIeGciyQkvs5lA7gvZrdb8W5mCGwSQLsGmg9K9QbsPP1lcBF30vEVaUwbyfq0PjBxehxEeHyMklZE8hhYo_
on test/testMultipartUpload: Retried 1:
org.apache.hadoop.fs.s3a.AWSClientIOException: upload part #1 upload ID
cFOhefvaRWyUGkB_U6zV2Mhs8RMC3u55_WOASIRCRuv1hVIeGciyQkvs5lA7gvZrdb8W5mCGwSQLsGmg9K9QbsPP1lcBF30vEVaUwbyfq0PjBxehxEeHyMklZE8hhYo_
on test/testMultipartUpload: com.amazonaws.SdkClientException: Invalid part
size: part sizes for encrypted multipart uploads must be multiples of the
cipher block size (16) with the exception of the last part.: Invalid part size:
part sizes for encrypted multipart uploads must be multiples of the cipher
block size (16) with the exception of the last part.
2021-05-17 02:43:46,184 [s3a-transfer-shared-pool1-t2] INFO
s3a.WriteOperationHelper (WriteOperationHelper.java:operationRetried(146)) -
upload part #1 upload ID
cFOhefvaRWyUGkB_U6zV2Mhs8RMC3u55_WOASIRCRuv1hVIeGciyQkvs5lA7gvZrdb8W5mCGwSQLsGmg9K9QbsPP1lcBF30vEVaUwbyfq0PjBxehxEeHyMklZE8hhYo_
on test/testMultipartUpload: Retried 2:
org.apache.hadoop.fs.s3a.AWSClientIOException: upload part #1 upload ID
cFOhefvaRWyUGkB_U6zV2Mhs8RMC3u55_WOASIRCRuv1hVIeGciyQkvs5lA7gvZrdb8W5mCGwSQLsGmg9K9QbsPP1lcBF30vEVaUwbyfq0PjBxehxEeHyMklZE8hhYo_
on test/testMultipartUpload: com.amazonaws.SdkClientException: Invalid part
size: part sizes for encrypted multipart uploads must be multiples of the
cipher block size (16) with the exception of the last part.: Invalid part size:
part sizes for encrypted multipart uploads must be multiples of the cipher
block size (16) with the exception of the last part.
2021-05-17 02:43:50,537 [s3a-transfer-shared-pool1-t2] INFO
s3a.WriteOperationHelper (WriteOperationHelper.java:operationRetried(146)) -
upload part #1 upload ID
cFOhefvaRWyUGkB_U6zV2Mhs8RMC3u55_WOASIRCRuv1hVIeGciyQkvs5lA7gvZrdb8W5mCGwSQLsGmg9K9QbsPP1lcBF30vEVaUwbyfq0PjBxehxEeHyMklZE8hhYo_
on test/testMultipartUpload: Retried 3:
org.apache.hadoop.fs.s3a.AWSClientIOException: upload part #1 upload ID
cFOhefvaRWyUGkB_U6zV2Mhs8RMC3u55_WOASIRCRuv1hVIeGciyQkvs5lA7gvZrdb8W5mCGwSQLsGmg9K9QbsPP1lcBF30vEVaUwbyfq0PjBxehxEeHyMklZE8hhYo_
on test/testMultipartUpload: com.amazonaws.SdkClientException: Invalid part
size: part sizes for encrypted multipart uploads must be multiples of the
cipher block size (16) with the exception of the last part.: Invalid part size:
part sizes for encrypted multipart uploads must be multiples of the cipher
block size (16) with the exception of the last part.
2021-05-17 02:44:00,768 [s3a-transfer-shared-pool1-t2] INFO
s3a.WriteOperationHelper (WriteOperationHelper.java:operationRetried(146)) -
upload part #1 upload ID
cFOhefvaRWyUGkB_U6zV2Mhs8RMC3u55_WOASIRCRuv1hVIeGciyQkvs5lA7gvZrdb8W5mCGwSQLsGmg9K9QbsPP1lcBF30vEVaUwbyfq0PjBxehxEeHyMklZE8hhYo_
on test/testMultipartUpload: Retried 4:
org.apache.hadoop.fs.s3a.AWSClientIOException: upload part #1 upload ID
cFOhefvaRWyUGkB_U6zV2Mhs8RMC3u55_WOASIRCRuv1hVIeGciyQkvs5lA7gvZrdb8W5mCGwSQLsGmg9K9QbsPP1lcBF30vEVaUwbyfq0PjBxehxEeHyMklZE8hhYo_
on test/testMultipartUpload: com.amazonaws.SdkClientException: Invalid part
size: part sizes for encrypted multipart uploads must be multiples of the
cipher block size (16) with the exception of the last part.: Invalid part size:
part sizes for encrypted multipart uploads must be multiples of the cipher
block size (16) with the exception of the last part.
```
This retries a few times and then fails with the exception:
```
org.apache.hadoop.fs.s3a.AWSClientIOException: upload part #1 upload ID
cFOhefvaRWyUGkB_U6zV2Mhs8RMC3u55_WOASIRCRuv1hVIeGciyQkvs5lA7gvZrdb8W5mCGwSQLsGmg9K9QbsPP1lcBF30vEVaUwbyfq0PjBxehxEeHyMklZE8hhYo_
on test/testMultipartUpload: com.amazonaws.SdkClientException: Invalid part
size: part sizes for encrypted multipart uploads must be multiples of the
cipher block size (16) with the exception of the last part.: Invalid part size:
part sizes for encrypted multipart uploads must be multiples of the cipher
block size (16) with the exception of the last part.
```
T2: partSize: 5242880 bytes (5 MB)
```
2021-05-17 02:46:22,270 [JUnit-testMultipartUpload] INFO
contract.AbstractContractMultipartUploaderTest (DurationInfo.java:<init>(77)) -
Starting: Put part 1 (size 5242880)
s3a://mehakmeet-singh-data/test/testMultipartUpload
2021-05-17 02:46:22,907 [JUnit-testMultipartUpload] INFO
contract.AbstractContractMultipartUploaderTest (DurationInfo.java:close(98)) -
Put part 1 (size 5242880) s3a://mehakmeet-singh-data/test/testMultipartUpload:
duration 0:00.637s
2021-05-17 02:46:22,910 [JUnit-testMultipartUpload] INFO
contract.ContractTestUtils (ContractTestUtils.java:end(1924)) - Duration of
Uploaded part 1: 637,220,364 nS
2021-05-17 02:46:22,911 [JUnit-testMultipartUpload] INFO
contract.AbstractContractMultipartUploaderTest
(AbstractContractMultipartUploaderTest.java:putPart(352)) - Upload bandwidth
7.846579 MB/s
2021-05-17 02:46:22,934 [JUnit-testMultipartUpload] INFO
contract.AbstractContractMultipartUploaderTest (DurationInfo.java:<init>(77)) -
Starting: Put part 2 (size 5242880)
s3a://mehakmeet-singh-data/test/testMultipartUpload
2021-05-17 02:46:23,254 [JUnit-testMultipartUpload] INFO
contract.AbstractContractMultipartUploaderTest (DurationInfo.java:close(98)) -
Put part 2 (size 5242880) s3a://mehakmeet-singh-data/test/testMultipartUpload:
duration 0:00.320s
2021-05-17 02:46:23,254 [JUnit-testMultipartUpload] INFO
contract.ContractTestUtils (ContractTestUtils.java:end(1924)) - Duration of
Uploaded part 2: 319,980,951 nS
2021-05-17 02:46:23,255 [JUnit-testMultipartUpload] INFO
contract.AbstractContractMultipartUploaderTest
(AbstractContractMultipartUploaderTest.java:putPart(352)) - Upload bandwidth
15.625930 MB/s
2021-05-17 02:46:23,275 [JUnit-testMultipartUpload] INFO
contract.AbstractContractMultipartUploaderTest (DurationInfo.java:<init>(77)) -
Starting: Put part 3 (size 5242880)
s3a://mehakmeet-singh-data/test/testMultipartUpload
2021-05-17 02:46:23,990 [JUnit-testMultipartUpload] INFO
contract.AbstractContractMultipartUploaderTest (DurationInfo.java:close(98)) -
Put part 3 (size 5242880) s3a://mehakmeet-singh-data/test/testMultipartUpload:
duration 0:00.715s
2021-05-17 02:46:23,990 [JUnit-testMultipartUpload] INFO
contract.ContractTestUtils (ContractTestUtils.java:end(1924)) - Duration of
Uploaded part 3: 715,353,661 nS
2021-05-17 02:46:23,990 [JUnit-testMultipartUpload] INFO
contract.AbstractContractMultipartUploaderTest
(AbstractContractMultipartUploaderTest.java:putPart(352)) - Upload bandwidth
6.989550 MB/s
2021-05-17 02:46:23,991 [JUnit-testMultipartUpload] INFO
contract.AbstractContractMultipartUploaderTest (DurationInfo.java:<init>(77)) -
Starting: Complete upload to s3a://mehakmeet-singh-data/test/testMultipartUpload
2021-05-17 02:47:49,055 [JUnit-testMultipartUpload] INFO
contract.AbstractContractMultipartUploaderTest (DurationInfo.java:close(98)) -
Complete upload to s3a://mehakmeet-singh-data/test/testMultipartUpload:
duration 1:25.064s
2021-05-17 02:47:49,056 [JUnit-testMultipartUpload] INFO
contract.AbstractContractMultipartUploaderTest (DurationInfo.java:<init>(77)) -
Starting: Abort upload to s3a://mehakmeet-singh-data/test/testMultipartUpload
2021-05-17 02:47:49,058 [s3a-transfer-shared-pool1-t6] INFO
s3a.S3AFileSystem (S3AFileSystem.java:abortMultipartUpload(4703)) - Aborting
multipart upload
l0UfFfsZXE8ogO8ojviT6D8iJo3oEM052apJu.txB1b5j1KPD4F8LQWWYHmOru4G1mu.uPPtGZhIYoT0P2S3g.k10ROOP7uXOiX7czPpmXzlA.67xB7YoN2_IczirQDL
to test/testMultipartUpload
```
This eventually fails with the exception:
```
org.apache.hadoop.fs.s3a.AWSClientIOException: Completing multipart upload
on test/testMultipartUpload: com.amazonaws.SdkClientException: Unable to
complete an encrypted multipart upload without being told which part was the
last. Without knowing which part was the last, the encrypted data in Amazon S3
is incomplete and corrupt.: Unable to complete an encrypted multipart upload
without being told which part was the last. Without knowing which part was the
last, the encrypted data in Amazon S3 is incomplete and corrupt.
```
Both of these tests pass without CSE.
So, essentially, CSE restricts part sizes to multiples of 16 bytes (the cipher
block size). Since the minimum part size is 5 MB and any whole-MB size is
already a multiple of 16, this usually works out, but with CSE we can't set an
arbitrary byte count (one that isn't a multiple of 16) as the partSize.
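To make the arithmetic concrete (a minimal sketch, not code from this PR; the class and method names are made up for illustration), a helper that checks a requested part size against the 16-byte cipher block and rounds it up to the next valid size could look like:

```java
public class CsePartSize {
    // AES cipher block size used by S3 client-side encryption.
    public static final int CIPHER_BLOCK_SIZE = 16;

    /** Round a requested part size up to the nearest multiple of the cipher block size. */
    public static long roundToBlockMultiple(long requestedSize) {
        long remainder = requestedSize % CIPHER_BLOCK_SIZE;
        return remainder == 0 ? requestedSize : requestedSize + (CIPHER_BLOCK_SIZE - remainder);
    }

    public static void main(String[] args) {
        long t1 = 5242880L + 1;  // T1: 5 MB + 1 byte -> 5242881 % 16 == 1, rejected by CSE
        long t2 = 5242880L;      // T2: 5 MB -> 5242880 % 16 == 0, a valid part size
        System.out.println(t1 % CIPHER_BLOCK_SIZE);    // 1
        System.out.println(t2 % CIPHER_BLOCK_SIZE);    // 0
        System.out.println(roundToBlockMultiple(t1));  // 5242896, the next valid size
    }
}
```

This is why T1 (5242881 bytes) is rejected immediately while T2 (5242880 bytes) uploads each part fine.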
And even after setting the partSize to a multiple of 16, I am still seeing the
exception about the last part. So, does the part-number logic not apply in CSE?
Maybe I am missing something here?
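For what it's worth, the two failures can be modeled as a small self-contained sketch of the rules as I understand them (this is my reading of the SDK's behaviour, not the SDK itself): any part not flagged as the last must be a block multiple, and completion requires that some part was flagged last. In the real AWS SDK v1 the flag would be `UploadPartRequest.setLastPart(true)`, if I'm reading it right.

```java
/** A toy model of the CSE multipart rules seen in the logs above (not the AWS SDK itself). */
public class CseUploadModel {
    private static final int CIPHER_BLOCK_SIZE = 16;
    private boolean sawLastPart = false;

    /** Mimics UploadPart: any part not flagged as last must be a multiple of the block size. */
    public void uploadPart(long size, boolean isLastPart) {
        if (!isLastPart && size % CIPHER_BLOCK_SIZE != 0) {
            throw new IllegalArgumentException("Invalid part size: part sizes for encrypted"
                + " multipart uploads must be multiples of the cipher block size ("
                + CIPHER_BLOCK_SIZE + ") with the exception of the last part.");
        }
        sawLastPart |= isLastPart;
    }

    /** Mimics CompleteMultipartUpload: the client must have been told which part was last. */
    public void complete() {
        if (!sawLastPart) {
            throw new IllegalStateException("Unable to complete an encrypted multipart upload"
                + " without being told which part was the last.");
        }
    }
}
```

Under this model, T1 fails at `uploadPart(5242881, false)` with the "Invalid part size" message, and T2 fails at `complete()` because no part was ever flagged last, which matches the two exceptions above.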
CC: @steveloughran
> Encrypt S3A data client-side with AWS SDK (S3-CSE)
> --------------------------------------------------
>
> Key: HADOOP-13887
> URL: https://issues.apache.org/jira/browse/HADOOP-13887
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 2.8.0
> Reporter: Jeeyoung Kim
> Assignee: Igor Mazur
> Priority: Minor
> Labels: pull-request-available
> Attachments: HADOOP-13887-002.patch, HADOOP-13887-007.patch,
> HADOOP-13887-branch-2-003.patch, HADOOP-13897-branch-2-004.patch,
> HADOOP-13897-branch-2-005.patch, HADOOP-13897-branch-2-006.patch,
> HADOOP-13897-branch-2-008.patch, HADOOP-13897-branch-2-009.patch,
> HADOOP-13897-branch-2-010.patch, HADOOP-13897-branch-2-012.patch,
> HADOOP-13897-branch-2-014.patch, HADOOP-13897-trunk-011.patch,
> HADOOP-13897-trunk-013.patch, HADOOP-14171-001.patch, S3-CSE Proposal.pdf
>
> Time Spent: 2.5h
> Remaining Estimate: 0h
>
> Expose the client-side encryption option documented in Amazon S3
> documentation -
> http://docs.aws.amazon.com/AmazonS3/latest/dev/UsingClientSideEncryption.html
> Currently this is not exposed in Hadoop but it is exposed as an option in AWS
> Java SDK, which Hadoop currently includes. It should be trivial to propagate
> this as a parameter passed to the S3client used in S3AFileSystem.java