[ 
https://issues.apache.org/jira/browse/HADOOP-19654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18036235#comment-18036235
 ] 

ASF GitHub Bot commented on HADOOP-19654:
-----------------------------------------

steveloughran commented on code in PR #7882:
URL: https://github.com/apache/hadoop/pull/7882#discussion_r2503587272


##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/DefaultS3ClientFactory.java:
##########
@@ -202,11 +205,34 @@ private <BuilderT extends S3BaseClientBuilder<BuilderT, ClientT>, ClientT> Build
 
     configureEndpointAndRegion(builder, parameters, conf);
 
+    // add a plugin to add a Content-MD5 header.
+    // this is required when performing some operations with third party stores
+    // (for example: bulk delete), and is somewhat harmless when working with AWS S3.
+    if (parameters.isMd5HeaderEnabled()) {
+      LOG.debug("MD5 header enabled");
+      builder.addPlugin(LegacyMd5Plugin.create());
+    }
+
+    //when to calculate request checksums.
+    final RequestChecksumCalculation checksumCalculation =
+        parameters.isChecksumCalculationEnabled()
+            ? RequestChecksumCalculation.WHEN_SUPPORTED

Review Comment:
   Some operations require checksums (bulk delete?), and every store which implements them has had to expect checksums. This new generation option, "when supported", is what broke things, as it really means "generate checksums on all requests". There are only two values in the enum, so the SDK always has to choose one.
   
   when_supported
   * doesn't work for most third party stores
   * seems to break MPUs (multipart uploads) if you don't set a content checksum for put/posted data.
   
   I think a "true/false" checksum generation option is simpler for people to understand than the nuances of when_supported vs when_required.
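
A minimal sketch of the true/false mapping described above. It is not the code in the PR: the nested enum is a local stand-in for the SDK's `RequestChecksumCalculation` so the example compiles without the AWS SDK on the classpath, and `ChecksumModeSketch` and `select` are hypothetical names. The `false` branch returning `WHEN_REQUIRED` is an assumption, since the quoted hunk is cut off before that line.

```java
/**
 * Sketch only: maps a boolean "generate checksums" option onto the
 * two checksum calculation modes discussed in the review comment.
 */
final class ChecksumModeSketch {

  /** Local stand-in for the SDK enum; only the two values named in the review. */
  enum RequestChecksumCalculation { WHEN_SUPPORTED, WHEN_REQUIRED }

  private ChecksumModeSketch() {
  }

  /**
   * Choose the checksum calculation mode from a single boolean option.
   *
   * @param checksumGenerationEnabled true to ask for checksums wherever the
   *        operation supports them; false to add them only where required.
   * @return the mode to pass to the client builder.
   */
  static RequestChecksumCalculation select(boolean checksumGenerationEnabled) {
    // The enum has only two values, so a boolean is enough to pick one.
    // Assumption: "false" means WHEN_REQUIRED (the hunk is truncated here).
    return checksumGenerationEnabled
        ? RequestChecksumCalculation.WHEN_SUPPORTED
        : RequestChecksumCalculation.WHEN_REQUIRED;
  }
}
```

With the real SDK, the selected value would be handed to the client builder; that wiring is left out here so the sketch stays self-contained.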
   





> Upgrade AWS SDK to 2.35.4
> -------------------------
>
>                 Key: HADOOP-19654
>                 URL: https://issues.apache.org/jira/browse/HADOOP-19654
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: build, fs/s3
>    Affects Versions: 3.5.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Major
>              Labels: pull-request-available
>
> Upgrade to a recent version of 2.33.x or later while off the critical path of 
> things.
> HADOOP-19485 froze the sdk at a version which worked with third party stores. 
> Apparently the new version works; early tests show that Bulk Delete calls 
> with third party stores complain about lack of md5 headers, so some tuning is 
> clearly going to be needed.



