Anis Elleuch created HADOOP-15267:
-------------------------------------

             Summary: S3A fails to store my data when multipart size is set to 5 MB and SSE-C encryption is enabled
                 Key: HADOOP-15267
                 URL: https://issues.apache.org/jira/browse/HADOOP-15267
             Project: Hadoop Common
          Issue Type: Bug
          Components: fs/s3
    Affects Versions: 3.1.0
         Environment: Hadoop 3.1 Snapshot
            Reporter: Anis Elleuch
         Attachments: hadoop-fix.patch
With Spark on Hadoop 3.1.0, when I enable SSE-C encryption and set fs.s3a.multipart.size to 5 MB, storing data in AWS S3 no longer works. For example, running the following code:
{code}
>>> df1 = spark.read.json('/home/user/people.json')
>>> df1.write.mode("overwrite").json("s3a://testbucket/people.json")
{code}
throws the following exception:
{code:java}
com.amazonaws.services.s3.model.AmazonS3Exception: The multipart upload initiate requested encryption. Subsequent part requests must include the appropriate encryption parameters.
{code}
After some investigation, I discovered that hadoop-aws does not send the SSE-C headers in the Upload Part requests, as required by the AWS specification: [https://docs.aws.amazon.com/AmazonS3/latest/API/mpUploadUploadPart.html]
{quote}
If you requested server-side encryption using a customer-provided encryption key in your initiate multipart upload request, you must provide identical encryption information in each part upload using the following headers.
{quote}
A patch clarifying and fixing the problem is attached to this issue.
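To make the requirement concrete, here is a minimal sketch written against the AWS SDK for Java v1 directly rather than the hadoop-aws code (the bucket name, object key, part file, and encryption key below are placeholders, not values from this issue): the same SSECustomerKey handed to the initiate request must also be attached to every UploadPartRequest, otherwise S3 rejects the part with the error shown above.
{code:java}
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.InitiateMultipartUploadRequest;
import com.amazonaws.services.s3.model.SSECustomerKey;
import com.amazonaws.services.s3.model.UploadPartRequest;

import java.io.File;

public class SseCMultipartSketch {
  public static void main(String[] args) {
    AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();

    // Placeholder: in practice this is a base64-encoded 256-bit customer key.
    SSECustomerKey sseKey = new SSECustomerKey("base64-encoded-aes-256-key");

    // Encryption is declared when the multipart upload is initiated...
    InitiateMultipartUploadRequest init =
        new InitiateMultipartUploadRequest("testbucket", "people.json")
            .withSSECustomerKey(sseKey);
    String uploadId = s3.initiateMultipartUpload(init).getUploadId();

    // ...and the identical key must be repeated on every part upload,
    // which is the piece hadoop-aws was missing.
    UploadPartRequest part = new UploadPartRequest()
        .withBucketName("testbucket")
        .withKey("people.json")
        .withUploadId(uploadId)
        .withPartNumber(1)
        .withFile(new File("/tmp/part-0001"))
        .withSSECustomerKey(sseKey);
    s3.uploadPart(part);

    // CompleteMultipartUpload omitted; the sketch only illustrates the
    // header requirement from the AWS documentation quoted above.
  }
}
{code}
The attached patch presumably applies the same idea inside hadoop-aws by propagating the configured SSE-C key to the part upload requests.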