[
https://issues.apache.org/jira/browse/HADOOP-15267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16380386#comment-16380386
]
Steve Loughran commented on HADOOP-15267:
-----------------------------------------
Thanks for finding this; I'll see if I can replicate it.
Regarding your patch:
# It'd be good to follow our naming convention of HADOOP-1234-001.patch, with
each new patch incrementing the number by 1; Yetus prefers this.
# I'd prefer to see this in {{WriteOperationsHelper.newUploadPartRequest()}},
as that is where the request is built up; you can make
{{generateSSECustomerKey()}} package-private to do this (rough sketch below).
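To make that concrete, here is a minimal sketch of the idea; the class, field
names and signatures below are simplified stand-ins rather than the actual
hadoop-aws code:
{code:java}
// Simplified sketch only, not the real hadoop-aws classes or signatures.
// The point: the request builder attaches the same SSE-C key that was used
// for the initiate-multipart-upload request to every part upload.
import com.amazonaws.services.s3.model.SSECustomerKey;
import com.amazonaws.services.s3.model.UploadPartRequest;
import java.io.InputStream;

class UploadPartRequestSketch {
  private final String bucket;
  private final SSECustomerKey sseCustomerKey; // null when SSE-C is disabled

  UploadPartRequestSketch(String bucket, SSECustomerKey sseCustomerKey) {
    this.bucket = bucket;
    this.sseCustomerKey = sseCustomerKey;
  }

  UploadPartRequest newUploadPartRequest(String destKey, String uploadId,
      int partNumber, long partSize, InputStream uploadStream) {
    UploadPartRequest request = new UploadPartRequest()
        .withBucketName(bucket)
        .withKey(destKey)
        .withUploadId(uploadId)
        .withPartNumber(partNumber)
        .withPartSize(partSize)
        .withInputStream(uploadStream);
    if (sseCustomerKey != null) {
      // the missing piece: parts uploaded without this are rejected when the
      // multipart upload was initiated with SSE-C
      request.setSSECustomerKey(sseCustomerKey);
    }
    return request;
  }
}
{code}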
We need tests to prevent regressions.
I propose:
* a subclass of {{ITestS3AHugeFilesDiskBlocks}}, preferably
{{org.apache.hadoop.fs.s3a.scale.ITestS3AHugeFilesSSECDiskBlocks}},
* whose configuration setup sets SSE-C and the key, as done in
{{ITestS3AEncryptionSSEC}}
* and in {{setup()}}, after calling {{super.setup()}}, call
{{S3ATestUtils.skipIfEncryptionTestsDisabled(getConfiguration());}}
Then, if you run the hadoop-aws test suite with the scale tests turned on, this
should exercise the new test. If you add the test before adding the fix, that
will show the test works; once the fix goes in, we can see that the fix takes.
A rough sketch of the test follows.
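Something along these lines; it's only a sketch, and the configuration hook
shown ({{createScaleConfiguration()}}) plus the placeholder key are assumptions
to check against the actual base classes:
{code:java}
package org.apache.hadoop.fs.s3a.scale;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.s3a.Constants;
import org.apache.hadoop.fs.s3a.S3AEncryptionMethods;
import org.apache.hadoop.fs.s3a.S3ATestUtils;

/**
 * Sketch of the proposed test: the huge-files disk-block scale test,
 * rerun with SSE-C enabled. Hook names may need adjusting to the parents.
 */
public class ITestS3AHugeFilesSSECDiskBlocks
    extends ITestS3AHugeFilesDiskBlocks {

  // placeholder 256-bit key, base64 encoded; use a test key of your own
  private static final String KEY
      = "4niV/jPK5VFRHY+KNb6wtqYd4xXyMgdJ9XQJpcQUVbs=";

  @Override
  protected Configuration createScaleConfiguration() {
    Configuration conf = super.createScaleConfiguration();
    S3ATestUtils.disableFilesystemCaching(conf);
    conf.set(Constants.SERVER_SIDE_ENCRYPTION_ALGORITHM,
        S3AEncryptionMethods.SSE_C.getMethod());
    conf.set(Constants.SERVER_SIDE_ENCRYPTION_KEY, KEY);
    return conf;
  }

  @Override
  public void setup() throws Exception {
    super.setup();
    S3ATestUtils.skipIfEncryptionTestsDisabled(getConfiguration());
  }
}
{code}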
Thanks for starting this... if we can turn this around quickly, it can go
into 3.1.
> S3A fails to store my data when multipart size is set to 5 MB and SSE-C
> encryption is enabled
> ---------------------------------------------------------------------------------------------
>
> Key: HADOOP-15267
> URL: https://issues.apache.org/jira/browse/HADOOP-15267
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs/s3
> Affects Versions: 3.1.0
> Environment: Hadoop 3.1 Snapshot
> Reporter: Anis Elleuch
> Priority: Critical
> Attachments: hadoop-fix.patch
>
>
> When I enable SSE-C encryption in Hadoop 3.1 and set fs.s3a.multipart.size
> to 5 MB, storing data in AWS no longer works. For example, running the
> following code:
> {code}
> >>> df1 = spark.read.json('/home/user/people.json')
> >>> df1.write.mode("overwrite").json("s3a://testbucket/people.json")
> {code}
> shows the following exception:
> {code:java}
> com.amazonaws.services.s3.model.AmazonS3Exception: The multipart upload
> initiate requested encryption. Subsequent part requests must include the
> appropriate encryption parameters.
> {code}
> After some investigation, I discovered that hadoop-aws doesn't send the SSE-C
> headers in the Upload Part request, as required by the AWS documentation:
> [https://docs.aws.amazon.com/AmazonS3/latest/API/mpUploadUploadPart.html]
> {code:java}
> If you requested server-side encryption using a customer-provided encryption
> key in your initiate multipart upload request, you must provide identical
> encryption information in each part upload using the following headers.
> {code}
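> For illustration, this is roughly what that requirement looks like with the
> AWS SDK for Java v1; a minimal sketch with placeholder variables, not the
> hadoop-aws code itself:
> {code:java}
> // Minimal AWS SDK v1 sketch: the same SSE-C key must be supplied on the
> // initiate request AND on every subsequent part upload.
> // s3, bucket, key, base64Key, partSize and partStream are placeholders.
> SSECustomerKey sseKey = new SSECustomerKey(base64Key);
>
> InitiateMultipartUploadRequest init =
>     new InitiateMultipartUploadRequest(bucket, key)
>         .withSSECustomerKey(sseKey);
> String uploadId = s3.initiateMultipartUpload(init).getUploadId();
>
> UploadPartRequest part = new UploadPartRequest()
>     .withBucketName(bucket)
>     .withKey(key)
>     .withUploadId(uploadId)
>     .withPartNumber(1)
>     .withPartSize(partSize)
>     .withInputStream(partStream)
>     .withSSECustomerKey(sseKey); // omitting this triggers the error above
> s3.uploadPart(part);
> {code}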
>
> A patch is attached to this issue to clarify the problem further.