[ https://issues.apache.org/jira/browse/HADOOP-17500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17273516#comment-17273516 ]

Steve Loughran commented on HADOOP-17500:
-----------------------------------------

OK, so it needs to be set on every block we upload in a multipart upload? As 
well as on a single-part PUT?

The MD5 sum would need to be calculated in S3ADataBlocks as the data is written 
to the buffer/disk (so we can avoid rereading it later to calculate the header 
at the start of the request). That's not possible for the MultipartUploader 
API, as that is just handed a byte array; there it'd take a two-pass scan. For 
that reason we should think about whether or not to make it optional.
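A minimal sketch of the single-pass idea, assuming a block buffer that sees every byte as it is written. `DigestingBlockBuffer` is a hypothetical name, not a class in S3ADataBlocks: it feeds each write into a `java.security.MessageDigest`, so the Base64-encoded MD5 that the `Content-MD5` header expects is ready when the block is closed, with no reread of the data. The static `contentMd5(byte[])` helper shows the extra pass a caller needs when it is only handed a finished byte array.

```java
import java.io.ByteArrayOutputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

/**
 * Hypothetical block buffer (not part of S3ADataBlocks) that updates an MD5
 * digest as bytes are written, so the Content-MD5 header value is available
 * when the block is closed without a second pass over the data.
 */
class DigestingBlockBuffer {
  private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
  private final MessageDigest md5 = newMd5();
  private String headerValue; // Base64 MD5, set once the block is closed

  /** Append bytes to the buffer and feed the same bytes to the digest. */
  void write(byte[] data, int off, int len) {
    if (headerValue != null) {
      throw new IllegalStateException("block already closed");
    }
    buffer.write(data, off, len);
    md5.update(data, off, len);
  }

  /** Finish the digest; returns the Base64 value for the Content-MD5 header. */
  String close() {
    if (headerValue == null) {
      headerValue = Base64.getEncoder().encodeToString(md5.digest());
    }
    return headerValue;
  }

  byte[] toByteArray() {
    return buffer.toByteArray();
  }

  /**
   * Separate pass over an already-filled byte array, as needed when an API
   * is simply handed the data to upload.
   */
  static String contentMd5(byte[] data) {
    return Base64.getEncoder().encodeToString(newMd5().digest(data));
  }

  private static MessageDigest newMd5() {
    try {
      return MessageDigest.getInstance("MD5");
    } catch (NoSuchAlgorithmException e) {
      // MD5 is present in the standard JDK providers; fail loudly if it is not.
      throw new IllegalStateException("MD5 not available", e);
    }
  }
}
```

The design choice is that the digest is a by-product of the write path, so nothing is read back before the request is sent; only callers that receive finished data pay for `contentMd5(byte[])`.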

Mandatory: a nice way to detect corruption of the upload stream, with 
incremental calculation for normal writes. But: two passes are needed for the 
MultipartUploader API.
Optional: one more setting to turn on, test, and document...

I'd go for mandatory unless there was ever a good reason to turn it off, e.g. 
third-party IO.

Now, do you plan to provide a patch here? If so, java.security.MessageDigest 
has the incremental API we need; see 
org.apache.hadoop.fs.impl.AbstractMultipartUploader, where we use it in testing 
today.

I can help with test design. Hadoop 3.3+ lets you get at all of the object 
headers during upload; would it be enough to upload a file and then verify the 
results? Or maybe add an invalid header and verify that it triggers a failure?
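One way to cover the invalid-header case in a unit test, sketched under the assumption that we want to exercise the header calculation without a real bucket. `ContentMd5Check` and `BadDigestException` are hypothetical names for a local stand-in for the store's check (real S3 rejects a mismatched `Content-MD5` with a 400 `BadDigest` error). The test builds a well-formed but wrong header by flipping one bit of the real digest and expects it to be rejected; an integration test against a bucket would still be needed to confirm the store's behaviour.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

/**
 * Hypothetical local stand-in for the store's Content-MD5 check, so a unit
 * test can exercise both the valid-header and invalid-header paths.
 */
class ContentMd5Check {

  /** Thrown when the declared header does not match the body. */
  static class BadDigestException extends Exception {
    BadDigestException(String msg) {
      super(msg);
    }
  }

  /** Base64-encoded MD5 of the body, the format Content-MD5 expects. */
  static String md5Base64(byte[] body) {
    try {
      MessageDigest md5 = MessageDigest.getInstance("MD5");
      return Base64.getEncoder().encodeToString(md5.digest(body));
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("MD5 not available", e);
    }
  }

  /** Accept the upload only if the declared Content-MD5 matches the body. */
  static void verify(byte[] body, String declaredContentMd5)
      throws BadDigestException {
    String actual = md5Base64(body);
    if (!actual.equals(declaredContentMd5)) {
      throw new BadDigestException(
          "declared " + declaredContentMd5 + " but body hashes to " + actual);
    }
  }

  /** Returns true if a deliberately corrupted header is rejected. */
  static boolean corruptedHeaderIsRejected(byte[] body) {
    // Flip one bit of the real digest to get a well-formed but wrong header.
    byte[] digest = Base64.getDecoder().decode(md5Base64(body));
    digest[0] ^= 0x01;
    String wrong = Base64.getEncoder().encodeToString(digest);
    try {
      verify(body, wrong);
      return false;
    } catch (BadDigestException expected) {
      return true;
    }
  }
}
```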



> S3A doesn't calculate Content-MD5 on uploads
> --------------------------------------------
>
>                 Key: HADOOP-17500
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17500
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/s3
>            Reporter: Pedro Tôrres
>            Priority: Major
>
> Hadoop doesn't specify the Content-MD5 of an object when uploading it to an 
> S3 bucket. This prevents uploads to buckets with Object Lock, which require 
> the Content-MD5 to be specified.
>  
> {code:java}
> com.amazonaws.services.s3.model.AmazonS3Exception: Content-MD5 HTTP header is required for Put Part requests with Object Lock parameters (Service: Amazon S3; Status Code: 400; Error Code: InvalidRequest; Request ID: ****************; S3 Extended Request ID: ****************************************************************************; Proxy: null), S3 Extended Request ID: ****************************************************************************
>       at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1819)
>       at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleServiceErrorResponse(AmazonHttpClient.java:1403)
>       at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1372)
>       at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1145)
>       at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:802)
>       at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:770)
>       at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:744)
>       at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:704)
>       at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:686)
>       at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:550)
>       at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:530)
>       at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5248)
>       at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5195)
>       at com.amazonaws.services.s3.AmazonS3Client.doUploadPart(AmazonS3Client.java:3768)
>       at com.amazonaws.services.s3.AmazonS3Client.uploadPart(AmazonS3Client.java:3753)
>       at org.apache.hadoop.fs.s3a.S3AFileSystem.uploadPart(S3AFileSystem.java:2230)
>       at org.apache.hadoop.fs.s3a.WriteOperationHelper.lambda$uploadPart$8(WriteOperationHelper.java:558)
>       at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:110)
>       ... 15 more{code}
>  
> Similar to https://issues.apache.org/jira/browse/JCLOUDS-1549
> Related to https://issues.apache.org/jira/browse/HADOOP-13076



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
