[ https://issues.apache.org/jira/browse/HADOOP-17500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17273516#comment-17273516 ]
Steve Loughran commented on HADOOP-17500:
-----------------------------------------
OK, so it needs to be set on every block we upload in a multipart upload, as well as
on a single-part PUT?
The MD5 sum would need to be calculated in S3ADataBlocks as the data is written to
buffer/disk (so we avoid rereading it later to calculate the header at the
start of the request). That isn't possible for the MultipartUploader API, as that is
just handed a byte array: it'd take a two-pass scan. For that reason we should think
about whether or not to make it optional.
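A minimal sketch of the single-pass idea, assuming a simple in-memory block. `Md5TrackingBlock` is a hypothetical class, not the real S3ADataBlocks API. It wraps the buffer in `java.security.DigestOutputStream`, so the MD5 is updated as bytes are written and the Content-MD5 value is ready when the block is uploaded. The static `contentMd5Of` helper shows the other case: data that arrives already filled has to be scanned once just for the digest before it is sent.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.security.DigestOutputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

/**
 * Hypothetical in-memory upload block (not the real S3ADataBlocks API).
 * The MD5 is updated while bytes are written, so no second read is needed.
 */
final class Md5TrackingBlock {
  private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
  private final MessageDigest md5;
  private final OutputStream out;

  Md5TrackingBlock() throws NoSuchAlgorithmException {
    md5 = MessageDigest.getInstance("MD5");
    // Every write passes through the digest and then into the buffer: one pass.
    out = new DigestOutputStream(buffer, md5);
  }

  void write(byte[] data, int off, int len) throws IOException {
    out.write(data, off, len);
  }

  byte[] contents() {
    return buffer.toByteArray();
  }

  /** Base64 of the MD5 digest, the form the Content-MD5 header takes. Call once: digest() resets. */
  String contentMd5() {
    return Base64.getEncoder().encodeToString(md5.digest());
  }

  /** Two-pass case: the part is handed over already filled, so it is scanned once just for the digest. */
  static String contentMd5Of(byte[] part) throws NoSuchAlgorithmException {
    return Base64.getEncoder().encodeToString(
        MessageDigest.getInstance("MD5").digest(part));
  }

  public static void main(String[] args) throws Exception {
    byte[] data = "abc".getBytes(StandardCharsets.US_ASCII);
    Md5TrackingBlock block = new Md5TrackingBlock();
    block.write(data, 0, 1); // written in two pieces, as a stream would be
    block.write(data, 1, 2);
    System.out.println(block.contentMd5());                 // kAFQmDzST7DWlj99KOF/cg==
    System.out.println(contentMd5Of(block.contents()));     // same value, second scan
  }
}
```

Both lines print the same header value; the difference is only whether the bytes had to be read a second time to get it.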
Mandatory: a nice way to detect corruption of the upload stream, and for normal
writes the checksum can be calculated incrementally. But: two passes are needed for
the MultipartUploader API.
Optional: one more setting to turn on, test, document...
I'd go for mandatory unless there is a good reason ever to turn it off, e.g.
third-party IO.
Now, do you plan to provide a patch here? If so, java.security.MessageDigest
has the incremental API we need; see
org.apache.hadoop.fs.impl.AbstractMultipartUploader, where we use it in testing
today.
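A short illustration of the incremental `java.security.MessageDigest` API: `update` is called once per chunk and `digest` finalizes, giving the same result as a one-shot digest of the whole input. The class and method names are illustrative only. The Content-MD5 header (RFC 1864) carries the Base64 encoding of the raw 16-byte digest, not the hex string, which is an easy thing to get wrong.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

/** Illustrative use of MessageDigest's incremental API for a Content-MD5 value. */
final class IncrementalMd5 {
  /** Feeds the stream to the digest chunk by chunk; the whole input is never held at once. */
  static byte[] md5Of(InputStream in, int chunkSize)
      throws IOException, NoSuchAlgorithmException {
    if (chunkSize <= 0) {
      throw new IllegalArgumentException("chunkSize must be positive");
    }
    MessageDigest md5 = MessageDigest.getInstance("MD5");
    byte[] chunk = new byte[chunkSize];
    int n;
    while ((n = in.read(chunk)) != -1) {
      md5.update(chunk, 0, n); // only the bytes actually read
    }
    return md5.digest(); // finalizes and resets the digest
  }

  /** Content-MD5 is Base64 of the raw 16-byte digest, not the hex string. */
  static String toContentMd5(byte[] digest) {
    return Base64.getEncoder().encodeToString(digest);
  }

  public static void main(String[] args) throws Exception {
    byte[] data = "abc".getBytes(StandardCharsets.US_ASCII);
    byte[] chunked = md5Of(new ByteArrayInputStream(data), 2);
    byte[] oneShot = MessageDigest.getInstance("MD5").digest(data);
    System.out.println(MessageDigest.isEqual(chunked, oneShot)); // true
    System.out.println(toContentMd5(chunked));                    // kAFQmDzST7DWlj99KOF/cg==
  }
}
```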
I can help with test design. Hadoop 3.3+ lets you get at all of the object headers
during upload. Would it be enough to upload a file and then verify the results?
Or maybe add an invalid header and verify that it triggers a failure?
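For the "invalid header triggers a failure" idea, a unit-level sketch is possible without a live bucket, assuming a fake store. `FakeMd5CheckingStore` is hypothetical and only mimics the server-side check; a real test against S3 would rely on S3 itself rejecting the request (it reports a mismatched digest with the `BadDigest` error code). The point is to show the shape of the negative test: a correct header is accepted and a deliberately wrong one fails.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

/**
 * Hypothetical stand-in for the store's Content-MD5 check, for unit-testing
 * header generation only; real S3 performs this check itself.
 */
final class FakeMd5CheckingStore {
  /** Accepts the body only if the supplied Content-MD5 matches it. */
  static void put(byte[] body, String contentMd5) throws NoSuchAlgorithmException {
    byte[] expected = MessageDigest.getInstance("MD5").digest(body);
    byte[] supplied;
    try {
      supplied = Base64.getDecoder().decode(contentMd5);
    } catch (IllegalArgumentException e) {
      throw new IllegalStateException("malformed Content-MD5: " + contentMd5, e);
    }
    if (!MessageDigest.isEqual(expected, supplied)) {
      // Mirrors S3's BadDigest rejection of a mismatched header.
      throw new IllegalStateException("BadDigest: Content-MD5 does not match body");
    }
  }

  public static void main(String[] args) throws Exception {
    byte[] body = "abc".getBytes(StandardCharsets.US_ASCII);
    String good = Base64.getEncoder().encodeToString(
        MessageDigest.getInstance("MD5").digest(body));
    put(body, good); // accepted
    try {
      put(body, "1B2M2Y8AsXnfZPhVw4ZRLw=="); // MD5 of an empty body: deliberately wrong
      System.out.println("unexpected success");
    } catch (IllegalStateException expected) {
      System.out.println("rejected as expected");
    }
  }
}
```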
> S3A doesn't calculate Content-MD5 on uploads
> --------------------------------------------
>
> Key: HADOOP-17500
> URL: https://issues.apache.org/jira/browse/HADOOP-17500
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs/s3
> Reporter: Pedro Tôrres
> Priority: Major
>
> Hadoop doesn't specify the Content-MD5 of an object when uploading it to an
> S3 bucket. This prevents uploads to buckets with Object Lock, which require
> the Content-MD5 to be specified.
>
> {code:java}
> com.amazonaws.services.s3.model.AmazonS3Exception: Content-MD5 HTTP header is
> required for Put Part requests with Object Lock parameters (Service: Amazon
> S3; Status Code: 400; Error Code: InvalidRequest; Request ID:
> ****************; S3 Extended Request ID:
> ****************************************************************************;
> Proxy: null), S3 Extended Request ID:
> ****************************************************************************
> at
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1819)
> at
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleServiceErrorResponse(AmazonHttpClient.java:1403)
> at
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1372)
> at
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1145)
> at
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:802)
> at
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:770)
> at
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:744)
> at
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:704)
> at
> com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:686)
> at
> com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:550)
> at
> com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:530)
> at
> com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5248)
> at
> com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5195)
> at
> com.amazonaws.services.s3.AmazonS3Client.doUploadPart(AmazonS3Client.java:3768)
> at
> com.amazonaws.services.s3.AmazonS3Client.uploadPart(AmazonS3Client.java:3753)
> at
> org.apache.hadoop.fs.s3a.S3AFileSystem.uploadPart(S3AFileSystem.java:2230)
> at
> org.apache.hadoop.fs.s3a.WriteOperationHelper.lambda$uploadPart$8(WriteOperationHelper.java:558)
> at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:110)
> ... 15 more{code}
>
> Similar to https://issues.apache.org/jira/browse/JCLOUDS-1549
> Related to https://issues.apache.org/jira/browse/HADOOP-13076