[
https://issues.apache.org/jira/browse/HADOOP-13887?focusedWorklogId=597866&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-597866
]
ASF GitHub Bot logged work on HADOOP-13887:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 17/May/21 18:05
Start Date: 17/May/21 18:05
Worklog Time Spent: 10m
Work Description: bogthe commented on pull request #2706:
URL: https://github.com/apache/hadoop/pull/2706#issuecomment-841812391
> Had merge conflicts, so I had to force push.
> Tests:
>
> ```
> [ERROR] Tests run: 1430, Failures: 1, Errors: 34, Skipped: 538
> ```
>
> Scale:
>
> ```
> [ERROR] Tests run: 151, Failures: 3, Errors: 21, Skipped: 29
> ```
>
> Most errors are multipart-upload related:
>
> ```
> com.amazonaws.SdkClientException: Invalid part size: part sizes for
> encrypted multipart uploads must be multiples of the cipher block size (16)
> with the exception of the last part.
> ```
>
> Simply adding 16 (the padding length) to the multipart upload block size
> won't work: for CSE, every part size must be a multiple of 16. One more
> thing to note is that the error message treats the last part as an
> exception, which makes me believe that multipart upload in CSE has to be
> sequential (or can we upload the earlier parts in parallel and then upload
> the last part?). So, potentially, another constraint while uploading could
> have performance impacts here, apart from the HEAD calls required while
> downloading/listing.
> @steveloughran
Hi @mehakmeet , regarding multipart uploads: the last part is always an
exception with regular multipart uploads too! You can upload parts in
parallel, and even upload the last part first, and it would still work (for
regular multipart). My assumption is that the same functionality holds for
multipart uploads with CSE enabled (except for the cipher block size
constraint; but the minimum part size for regular multipart is 5 MB =
5 * 1024 * 1024 bytes, which is itself a multiple of 16 :D ).
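To make the constraint concrete, here is a minimal sketch of the part-size
rule as stated in the exception above; `AES_BLOCK_SIZE` and
`isValidPartSize` are hypothetical names for illustration, not S3A or AWS
SDK APIs:

```java
// A minimal sketch of the part-size rule from the SdkClientException above.
// AES_BLOCK_SIZE and isValidPartSize are hypothetical names for illustration,
// not S3A or AWS SDK APIs.
public final class CsePartSizeCheck {
    /** AES cipher block size in bytes, as cited in the exception message. */
    private static final int AES_BLOCK_SIZE = 16;

    /**
     * For an encrypted multipart upload, every part size must be a multiple
     * of the cipher block size; only the last part is exempt.
     */
    static boolean isValidPartSize(long partSizeBytes, boolean isLastPart) {
        return isLastPart || partSizeBytes % AES_BLOCK_SIZE == 0;
    }

    public static void main(String[] args) {
        long fiveMB = 5L * 1024 * 1024; // regular multipart minimum part size
        System.out.println(isValidPartSize(fiveMB, false));      // true: 5 MB is a multiple of 16
        System.out.println(isValidPartSize(fiveMB + 16, false)); // true: still a multiple of 16
        System.out.println(isValidPartSize(fiveMB + 1, false));  // false: the SDK would reject this
        System.out.println(isValidPartSize(fiveMB + 1, true));   // true: the last part is exempt
    }
}
```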
Issue Time Tracking
-------------------
Worklog Id: (was: 597866)
Time Spent: 2h 50m (was: 2h 40m)
> Encrypt S3A data client-side with AWS SDK (S3-CSE)
> --------------------------------------------------
>
> Key: HADOOP-13887
> URL: https://issues.apache.org/jira/browse/HADOOP-13887
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 2.8.0
> Reporter: Jeeyoung Kim
> Assignee: Igor Mazur
> Priority: Minor
> Labels: pull-request-available
> Attachments: HADOOP-13887-002.patch, HADOOP-13887-007.patch,
> HADOOP-13887-branch-2-003.patch, HADOOP-13897-branch-2-004.patch,
> HADOOP-13897-branch-2-005.patch, HADOOP-13897-branch-2-006.patch,
> HADOOP-13897-branch-2-008.patch, HADOOP-13897-branch-2-009.patch,
> HADOOP-13897-branch-2-010.patch, HADOOP-13897-branch-2-012.patch,
> HADOOP-13897-branch-2-014.patch, HADOOP-13897-trunk-011.patch,
> HADOOP-13897-trunk-013.patch, HADOOP-14171-001.patch, S3-CSE Proposal.pdf
>
> Time Spent: 2h 50m
> Remaining Estimate: 0h
>
> Expose the client-side encryption option documented in the Amazon S3
> documentation:
> http://docs.aws.amazon.com/AmazonS3/latest/dev/UsingClientSideEncryption.html
> Currently this is not exposed in Hadoop, but it is exposed as an option in the
> AWS Java SDK, which Hadoop already includes. It should be trivial to propagate
> this as a parameter passed to the S3 client used in S3AFileSystem.java.
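For reference, a minimal sketch of the client-side encryption support the AWS
SDK for Java (v1) already exposes; key handling is simplified here, and how
S3A would actually wire this option through its configuration is exactly what
this issue is about:

```java
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3EncryptionClientBuilder;
import com.amazonaws.services.s3.model.CryptoConfiguration;
import com.amazonaws.services.s3.model.CryptoMode;
import com.amazonaws.services.s3.model.EncryptionMaterials;
import com.amazonaws.services.s3.model.StaticEncryptionMaterialsProvider;

public class CseClientSketch {
    public static void main(String[] args) throws Exception {
        // A throwaway symmetric key for the sketch; a real deployment would
        // load key material from a keystore or use KMS-managed materials.
        KeyGenerator keyGen = KeyGenerator.getInstance("AES");
        keyGen.init(256);
        SecretKey key = keyGen.generateKey();

        // Build an S3 client that encrypts/decrypts object data client-side.
        AmazonS3 s3 = AmazonS3EncryptionClientBuilder.standard()
            .withEncryptionMaterials(
                new StaticEncryptionMaterialsProvider(new EncryptionMaterials(key)))
            .withCryptoConfiguration(
                new CryptoConfiguration(CryptoMode.AuthenticatedEncryption))
            .build();

        // From here the client is a drop-in AmazonS3: puts are encrypted and
        // gets are decrypted transparently.
    }
}
```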