[ 
https://issues.apache.org/jira/browse/HADOOP-13076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15265527#comment-15265527
 ] 

Chris Nauroth commented on HADOOP-13076:
----------------------------------------

bq. Chris: I believe the MD5 checks are in the SDK download automatically, at 
least from my cursory review of the AWS code on github;

[[email protected]], I just checked out that code, and I think you're right.

bq. there's nothing equivalent in upload

I think I found MD5 verification for uploads too.  It's kind of clever about it 
too.  If the caller supplied an MD5, then it will go ahead and send the 
Content-MD5 header.  If not, then it will do the MD5 calculation internally for 
the caller, but it won't actually send the Content-MD5 header.  That would 
imply needing to buffer the entire content and finish the MD5 calculation 
before sending the HTTP request headers.  Instead, it checks the ETag returned 
by the S3 service in the HTTP response, so it doesn't need to use an 
inefficient buffering strategy.  I found this logic for both plain put and 
multi-part upload.

https://github.com/aws/aws-sdk-java/blob/1.10.6/aws-java-sdk-s3/src/main/java/com/amazonaws/services/s3/AmazonS3Client.java#L1385-L1505

https://github.com/aws/aws-sdk-java/blob/1.10.6/aws-java-sdk-s3/src/main/java/com/amazonaws/services/s3/AmazonS3Client.java#L2893-L2944

Those links are for the 1.10.6 code, which is our dependency as of Apache 
Hadoop 2.8.0.  Here are the equivalent links for the 1.7.4 code, which is our 
dependency in Apache Hadoop versions prior to 2.8.0.

https://github.com/aws/aws-sdk-java/blob/1.7.4.x/src/main/java/com/amazonaws/services/s3/AmazonS3Client.java#L1363-L1419

https://github.com/aws/aws-sdk-java/blob/1.7.4.x/src/main/java/com/amazonaws/services/s3/AmazonS3Client.java#L2678-L2723

Based on this, I think we can close this as "Not a Problem".  I'll give it a 
few more days before closing in case anyone else wants to comment.

> S3A does not perform MD5 verification on stored objects.
> --------------------------------------------------------
>
>                 Key: HADOOP-13076
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13076
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/s3
>            Reporter: Chris Nauroth
>
> S3N supports end-to-end checksum verification for stored objects by passing 
> the MD5.  This feature is missing from S3A.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to