[ https://issues.apache.org/jira/browse/HADOOP-15224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17928112#comment-17928112 ]

ASF GitHub Bot commented on HADOOP-15224:
-----------------------------------------

steveloughran commented on code in PR #7396:
URL: https://github.com/apache/hadoop/pull/7396#discussion_r1959886998


##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/files/Etag.java:
##########
@@ -0,0 +1,135 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.s3a.commit.files;
+
+import java.io.Serializable;
+import java.util.StringJoiner;
+
+import software.amazon.awssdk.services.s3.model.ChecksumAlgorithm;
+import software.amazon.awssdk.services.s3.model.CompletedPart;
+
+/**
+ * Stores ETag and checksum values from {@link CompletedPart} responses from S3.
+ * These values need to be stored to be later passed to the
+ * {@link software.amazon.awssdk.services.s3.model.CompleteMultipartUploadRequest
+ * CompleteMultipartUploadRequest}.
+ */
+public class Etag implements Serializable {

Review Comment:
   the name `Etag` is overused; needs a more specific name like `UploadEtag`
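   
   A minimal sketch of the renamed holder, assuming the fields follow the javadoc above (this is hypothetical, not the PR's actual class):
   
   ```java
   import java.io.Serializable;
   
   // Hypothetical sketch only; the field set is inferred from the javadoc.
   public class UploadEtag implements Serializable {
   
     private static final long serialVersionUID = 1L;
   
     /** ETag returned by S3 for the uploaded part. */
     private String etag;
   
     /** Name of the checksum algorithm used for the part; may be null. */
     private String checksumAlgorithm;
   
     /** Checksum value for the part; may be null. */
     private String checksum;
   }
   ```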



##########
hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/troubleshooting_s3a.md:
##########
@@ -387,6 +387,28 @@ Happens if a multipart upload is being completed, but one of the parts is missin
 * A magic committer job's list of in-progress uploads somehow got corrupted
 * Bug in the S3A codebase (rare, but not impossible...)
 
+### <a name="object_lock_parameters"></a> Status Code 400 "Content-MD5 OR x-amz-checksum- HTTP header is required for Put Object requests with Object Lock parameters"
+```
+software.amazon.awssdk.services.s3.model.S3Exception: Content-MD5 OR x-amz-checksum- HTTP header is required for Put Object requests with Object Lock parameters (Service: S3, Status Code: 400, Request ID: 1122334455, Extended Request ID: ...):

Review Comment:
   always good to get extra troubleshooting sections; appreciated
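   
   For readers who hit this error: a sketch of the remedy this PR enables, assuming `Constants.CHECKSUM_ALGORITHM` is the new option's key and `"CRC32C"` is among the supported values:
   
   ```java
   // Hypothetical snippet: configure a checksum algorithm so PutObject requests
   // carry an x-amz-checksum-* header, which Object Lock-enabled buckets require.
   Configuration conf = new Configuration();
   conf.set(Constants.CHECKSUM_ALGORITHM, "CRC32C"); // "CRC32C" assumed supported
   FileSystem fs = FileSystem.get(URI.create("s3a://example-bucket/"), conf);
   ```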



##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AUtils.java:
##########
@@ -1743,4 +1753,26 @@ static void maybeIsolateClassloader(Configuration conf, ClassLoader classLoader)
     }
   }
 
+  /**
+   * Get the checksum algorithm to be used for data integrity check of the objects in S3.
+   * This operation includes validating if the provided value is a supported checksum algorithm.
+   * @param conf configuration to scan
+   * @return the checksum algorithm to be passed on S3 requests
+   * @throws IllegalArgumentException if the checksum algorithm is not known or not supported
+   */
+  public static ChecksumAlgorithm getChecksumAlgorithm(Configuration conf) {
+    final String checksumAlgorithmString = conf.get(CHECKSUM_ALGORITHM);

Review Comment:
   use `getTrimmed()` so stray whitespace in the configured value is ignored
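   
   i.e. a one-line change (sketch):
   
   ```java
   // getTrimmed() strips leading/trailing whitespace from the configured value.
   final String checksumAlgorithmString = conf.getTrimmed(CHECKSUM_ALGORITHM);
   ```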



##########
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/impl/TestS3AMultipartUploaderSupport.java:
##########
@@ -41,35 +44,52 @@ public class TestS3AMultipartUploaderSupport extends HadoopTestBase {
 
   @Test
   public void testRoundTrip() throws Throwable {
-    PartHandlePayload result = roundTrip(999, "tag", 1);
+    PartHandlePayload result = roundTrip(999, "tag", 1, null, null);
     assertEquals(PATH, result.getPath());
     assertEquals(UPLOAD, result.getUploadId());
     assertEquals(999, result.getPartNumber());
     assertEquals("tag", result.getEtag());
     assertEquals(1, result.getLen());
+    assertNull(result.getChecksumAlgorithm());
+    assertNull(result.getChecksum());
   }
 
   @Test
   public void testRoundTrip2() throws Throwable {
     long len = 1L + Integer.MAX_VALUE;
     PartHandlePayload result =
-        roundTrip(1, "11223344", len);
+        roundTrip(1, "11223344", len, null, null);
     assertEquals(1, result.getPartNumber());
     assertEquals("11223344", result.getEtag());
     assertEquals(len, result.getLen());
+    assertNull(result.getChecksumAlgorithm());
+    assertNull(result.getChecksum());
+  }
+
+  @Test
+  public void testRoundTripWithChecksum() throws Throwable {

Review Comment:
   again, AssertJ. Yes, the class will be a mix of styles, but that's OK
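   
   e.g. a sketch of the new test's checks in AssertJ style (the expected values here are illustrative, not taken from the PR):
   
   ```java
   // Round trip with checksum data present: both fields should survive.
   assertThat(result.getChecksumAlgorithm())
       .describedAs("checksum algorithm restored from the payload")
       .isEqualTo("SHA256");   // illustrative value
   assertThat(result.getChecksum())
       .describedAs("checksum restored from the payload")
       .isEqualTo("checksum"); // illustrative value
   ```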



##########
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/TestS3ADeleteOnExit.java:
##########
@@ -89,6 +90,17 @@ public void testDeleteOnExit() throws Exception {
     assertEquals(0, testFs.getDeleteOnDnExitCount());
   }
 
+  @Test
+  public void testCreateRequestFactoryWithInvalidChecksumAlgorithm() throws Exception {
+    Configuration conf = createConfiguration();
+    conf.set(Constants.CHECKSUM_ALGORITHM, "INVALID");
+    TestS3AFileSystem testFs  = new TestS3AFileSystem();
+    URI uri = URI.create(FS_S3A + "://" + BUCKET);
+    final IllegalArgumentException exception = intercept(IllegalArgumentException.class,
+        () -> testFs.initialize(uri, conf));
+    assertEquals("Checksum algorithm is not supported: INVALID", exception.getMessage());

Review Comment:
   use AssertJ assertions in new tests... better assertions, and it makes the move to JUnit 5 easier
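   
   A sketch of the suggested style; `intercept()` already returns the exception, so only the message check changes:
   
   ```java
   final IllegalArgumentException exception = intercept(IllegalArgumentException.class,
       () -> testFs.initialize(uri, conf));
   assertThat(exception.getMessage())
       .describedAs("message of the exception raised for an invalid checksum algorithm")
       .isEqualTo("Checksum algorithm is not supported: INVALID");
   ```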



##########
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/impl/TestS3AMultipartUploaderSupport.java:
##########
@@ -41,35 +44,52 @@ public class TestS3AMultipartUploaderSupport extends 
HadoopTestBase {
 
   @Test
   public void testRoundTrip() throws Throwable {
-    PartHandlePayload result = roundTrip(999, "tag", 1);
+    PartHandlePayload result = roundTrip(999, "tag", 1, null, null);
     assertEquals(PATH, result.getPath());
     assertEquals(UPLOAD, result.getUploadId());
     assertEquals(999, result.getPartNumber());
     assertEquals("tag", result.getEtag());
     assertEquals(1, result.getLen());
+    assertNull(result.getChecksumAlgorithm());

Review Comment:
   add as AssertJ with descriptions of what is being checked
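   
   e.g. (sketch; needs `import static org.assertj.core.api.Assertions.assertThat;`):
   
   ```java
   assertThat(result.getChecksumAlgorithm())
       .describedAs("checksum algorithm of a payload round-tripped without checksums")
       .isNull();
   assertThat(result.getChecksum())
       .describedAs("checksum of a payload round-tripped without checksums")
       .isNull();
   ```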



##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AUtils.java:
##########
@@ -1743,4 +1753,26 @@ static void maybeIsolateClassloader(Configuration conf, ClassLoader classLoader)
     }
   }
 
+  /**
+   * Get the checksum algorithm to be used for data integrity check of the objects in S3.
+   * This operation includes validating if the provided value is a supported checksum algorithm.
+   * @param conf configuration to scan
+   * @return the checksum algorithm to be passed on S3 requests
+   * @throws IllegalArgumentException if the checksum algorithm is not known or not supported
+   */
+  public static ChecksumAlgorithm getChecksumAlgorithm(Configuration conf) {
+    final String checksumAlgorithmString = conf.get(CHECKSUM_ALGORITHM);
+    if (StringUtils.isBlank(checksumAlgorithmString)) {
+      return null;
+    }
+    final ChecksumAlgorithm checksumAlgorithm =
+        ChecksumAlgorithm.fromValue(checksumAlgorithmString);
+    if (!SUPPORTED_CHECKSUM_ALGORITHMS.contains(checksumAlgorithm)) {

Review Comment:
   prefer `Preconditions.checkState()`
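   
   A sketch with a precondition check. Note that `checkState()` throws `IllegalStateException`; to keep the `IllegalArgumentException` contract in the javadoc above, `checkArgument()` is the matching variant (Guava or Hadoop's own `org.apache.hadoop.util.Preconditions` both provide it):
   
   ```java
   // import static com.google.common.base.Preconditions.checkArgument;
   checkArgument(SUPPORTED_CHECKSUM_ALGORITHMS.contains(checksumAlgorithm),
       "Checksum algorithm is not supported: %s", checksumAlgorithmString);
   ```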



##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AUtils.java:
##########
@@ -135,6 +139,12 @@ public final class S3AUtils {
 
   private static final String BUCKET_PATTERN = FS_S3A_BUCKET_PREFIX + "%s.%s";
 
+  public static final Set<ChecksumAlgorithm> SUPPORTED_CHECKSUM_ALGORITHMS = ImmutableSet.of(

Review Comment:
   let's move this into its own class `...s3a.impl.ChecksumSupport`. S3AUtils is too big and a source of cherry-pick pain, so we prefer isolating things now
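   
   A sketch of what the extracted class could look like, stitching together the other comments in this review (`getTrimmed()`, the precondition check); the member set, imports, and javadoc are assumptions, not the PR's actual code:
   
   ```java
   package org.apache.hadoop.fs.s3a.impl;
   
   import java.util.EnumSet;
   import java.util.Set;
   
   import org.apache.commons.lang3.StringUtils;
   import software.amazon.awssdk.services.s3.model.ChecksumAlgorithm;
   
   import org.apache.hadoop.conf.Configuration;
   
   import static com.google.common.base.Preconditions.checkArgument;
   import static org.apache.hadoop.fs.s3a.Constants.CHECKSUM_ALGORITHM;
   
   /**
    * Isolated home for checksum configuration logic; the method body
    * follows the S3AUtils code quoted in this review.
    */
   public final class ChecksumSupport {
   
     /** Checksum algorithms assumed to be accepted by the connector. */
     public static final Set<ChecksumAlgorithm> SUPPORTED_CHECKSUM_ALGORITHMS =
         EnumSet.of(
             ChecksumAlgorithm.CRC32,
             ChecksumAlgorithm.CRC32_C,
             ChecksumAlgorithm.SHA1,
             ChecksumAlgorithm.SHA256);
   
     private ChecksumSupport() {
     }
   
     /**
      * Resolve the configured checksum algorithm, or null if none is set.
      * @param conf configuration to scan
      * @return the checksum algorithm to pass on S3 requests, or null
      */
     public static ChecksumAlgorithm getChecksumAlgorithm(Configuration conf) {
       final String checksumAlgorithmString = conf.getTrimmed(CHECKSUM_ALGORITHM);
       if (StringUtils.isBlank(checksumAlgorithmString)) {
         return null;
       }
       // fromValue() maps unknown strings to UNKNOWN_TO_SDK_VERSION,
       // which then fails the membership check below.
       final ChecksumAlgorithm checksumAlgorithm =
           ChecksumAlgorithm.fromValue(checksumAlgorithmString);
       checkArgument(SUPPORTED_CHECKSUM_ALGORITHMS.contains(checksumAlgorithm),
           "Checksum algorithm is not supported: %s", checksumAlgorithmString);
       return checksumAlgorithm;
     }
   }
   ```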





> build up MD5 checksum as blocks are built in S3ABlockOutputStream; validate upload
> -----------------------------------------------------------------------------------
>
>                 Key: HADOOP-15224
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15224
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.0.0
>            Reporter: Steve Loughran
>            Assignee: Raphael Azzolini
>            Priority: Minor
>              Labels: pull-request-available
>
> [~rdblue] reports sometimes he sees corrupt data on S3. Given MD5 checks from upload to S3, it's likelier to have happened in VM RAM, HDD or nearby.
> If the MD5 checksum for each block was built up as data was written to it, and checked against the etag, the RAM/HDD storage of the saved blocks could be removed as sources of corruption.
> The obvious place would be {{org.apache.hadoop.fs.s3a.S3ADataBlocks.DataBlock}}
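
For context, a minimal sketch of the idea, assuming a single-part upload whose ETag is the hex-encoded MD5 of the object (multipart ETags are computed differently, which is where this PR's checksum support comes in):

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/**
 * Sketch (hypothetical class, not from the PR): fold each written buffer
 * into an MD5 digest so a block's contents can later be compared with the
 * ETag S3 returns for it.
 */
public class IncrementalBlockMd5 {

  private final MessageDigest md5;

  public IncrementalBlockMd5() throws NoSuchAlgorithmException {
    md5 = MessageDigest.getInstance("MD5");
  }

  /** Call from write(): update the digest as the block is built up. */
  public void update(byte[] buffer, int off, int len) {
    md5.update(buffer, off, len);
  }

  /** Hex-encode the digest for comparison with a single-part upload ETag. */
  public String hexDigest() {
    StringBuilder sb = new StringBuilder();
    for (byte b : md5.digest()) {
      sb.append(String.format("%02x", b));
    }
    return sb.toString();
  }
}
```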


