[
https://issues.apache.org/jira/browse/HADOOP-15224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17938181#comment-17938181
]
ASF GitHub Bot commented on HADOOP-15224:
-----------------------------------------
steveloughran commented on code in PR #7396:
URL: https://github.com/apache/hadoop/pull/7396#discussion_r2011894842
##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/ChecksumSupport.java:
##########
@@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.s3a.impl;
+
+import java.util.Set;
+
+import org.apache.hadoop.thirdparty.com.google.common.collect.ImmutableSet;
+import software.amazon.awssdk.services.s3.model.ChecksumAlgorithm;
+
+import org.apache.commons.lang3.StringUtils;
+import org.apache.hadoop.conf.Configuration;
+
+import static org.apache.hadoop.fs.s3a.Constants.CHECKSUM_ALGORITHM;
+import static org.apache.hadoop.util.Preconditions.checkArgument;
+
+/**
+ * Utility class to support operations on S3 object checksum.
+ */
+public final class ChecksumSupport {
+
+ private ChecksumSupport() {
+ }
+
+ /**
+ * Checksum algorithms that are supported by S3A.
+ */
+ private static final Set<ChecksumAlgorithm> SUPPORTED_CHECKSUM_ALGORITHMS =
+     ImmutableSet.of(
+ ChecksumAlgorithm.CRC32,
+ ChecksumAlgorithm.CRC32_C,
+ ChecksumAlgorithm.SHA1,
+ ChecksumAlgorithm.SHA256);
+
+ /**
+ * Get the checksum algorithm to be used for data integrity check of the objects in S3.
+ * This operation includes validating if the provided value is a supported checksum algorithm.
+ * @param conf configuration to scan
+ * @return the checksum algorithm to be passed on S3 requests
+ * @throws IllegalArgumentException if the checksum algorithm is not known or not supported
+ */
+ public static ChecksumAlgorithm getChecksumAlgorithm(Configuration conf) {
+ final String checksumAlgorithmString = conf.getTrimmed(CHECKSUM_ALGORITHM);
+ if (StringUtils.isBlank(checksumAlgorithmString)) {
+ return null;
+ }
+ final ChecksumAlgorithm checksumAlgorithm =
Review Comment:
use the (new) method `ConfigurationHelper.resolveEnum()` here; it does case
conversion and gives you a fallback. Trying to get the right case for a simple
enum config option is a real PITA
##########
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3AChecksum.java:
##########
@@ -0,0 +1,109 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.s3a;
+
+import java.io.IOException;
+
+import org.junit.Assert;
+import org.junit.Test;
Review Comment:
can you use AssertJ assertions; we are moving to JUnit 5, and adopting AssertJ
now avoids future pain. Plus its error messages are so good
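For illustration, a minimal sketch of the AssertJ style being requested (the `checksum` variable, class name, and `describedAs` label are stand-ins for this example, not code from the PR):

```java
import org.assertj.core.api.Assertions;

public class AssertJStyleDemo {
  public static void main(String[] args) {
    String checksum = null;  // stand-in for e.g. headObject.checksumSHA256()
    try {
      // describedAs() attaches a label that AssertJ prepends to the
      // failure message, so a failing assertion identifies which value
      // was wrong, unlike a bare JUnit 4 Assert.assertNotNull(x).
      Assertions.assertThat(checksum)
          .describedAs("headObject.checksumSHA256()")
          .isNotNull();
    } catch (AssertionError e) {
      // The message begins with "[headObject.checksumSHA256()] ..."
      System.out.println(e.getMessage());
    }
  }
}
```

This is why the descriptions matter: when an integration test fails against a real bucket, the label pinpoints the missing checksum field without rerunning under a debugger.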
##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Constants.java:
##########
@@ -1734,6 +1734,15 @@ private Constants() {
*/
public static final boolean CHECKSUM_VALIDATION_DEFAULT = false;
+ /**
+ * Indicates the algorithm used to create the checksum for the object
+ * to be uploaded to S3. Unset by default. It supports the following values:
+ * 'CRC32', 'CRC32C', 'SHA1', and 'SHA256'
+ * value:{@value}
+ */
+ public static final String CHECKSUM_ALGORITHM =
Review Comment:
let's call this fs.s3a.create.checksum.algorithm, as there are lots of
fs.s3a.create. items and it makes clear that this is what the checksum is
exclusively for. We are having *a lot* of checksum problems with the 2.30 SDK
and I don't want confusion
##########
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/commit/files/TestUploadEtag.java:
##########
@@ -0,0 +1,125 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.s3a.commit.files;
+
+import org.junit.jupiter.api.Test;
+import software.amazon.awssdk.services.s3.model.CompletedPart;
+
+import static org.junit.jupiter.api.Assertions.assertEquals;
+import static org.junit.jupiter.api.Assertions.assertNull;
+
+public class TestUploadEtag {
Review Comment:
can you use assertJ asserts, with descriptions on the assertNull() assertions
##########
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3AChecksum.java:
##########
@@ -0,0 +1,109 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.s3a;
+
+import java.io.IOException;
+
+import org.junit.Assert;
+import org.junit.Test;
+import software.amazon.awssdk.services.s3.model.ChecksumAlgorithm;
+import software.amazon.awssdk.services.s3.model.ChecksumMode;
+import software.amazon.awssdk.services.s3.model.HeadObjectRequest;
+import software.amazon.awssdk.services.s3.model.HeadObjectResponse;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.s3a.audit.S3AAuditConstants;
+import org.apache.hadoop.fs.s3a.impl.ChecksumSupport;
+
+import static org.apache.hadoop.fs.contract.ContractTestUtils.rm;
+import static org.apache.hadoop.fs.s3a.Constants.CHECKSUM_ALGORITHM;
+
+/**
+ * Tests S3 checksum algorithm.
+ * If CHECKSUM_ALGORITHM config is not set in auth-keys.xml,
+ * SHA256 algorithm will be picked.
+ */
+public class ITestS3AChecksum extends AbstractS3ATestBase {
+
+ private static final ChecksumAlgorithm DEFAULT_CHECKSUM_ALGORITHM =
+     ChecksumAlgorithm.SHA256;
+
+ private ChecksumAlgorithm checksumAlgorithm;
+
+ private static final int[] SIZES = {
+ 1, 2, 3, 4, 5, 254, 255, 256, 257, 2 ^ 12 - 1
+ };
+
+ @Override
+ protected Configuration createConfiguration() {
+ final Configuration conf = super.createConfiguration();
+ S3ATestUtils.disableFilesystemCaching(conf);
+ checksumAlgorithm = ChecksumSupport.getChecksumAlgorithm(conf);
+ if (checksumAlgorithm == null) {
+ checksumAlgorithm = DEFAULT_CHECKSUM_ALGORITHM;
+ LOG.info("No checksum algorithm found in configuration, will use default {}",
+     checksumAlgorithm);
+ conf.set(CHECKSUM_ALGORITHM, checksumAlgorithm.toString());
+ }
+ conf.setBoolean(S3AAuditConstants.REJECT_OUT_OF_SPAN_OPERATIONS, false);
+ return conf;
+ }
+
+ @Test
+ public void testChecksum() throws IOException {
+ for (int size : SIZES) {
+ validateChecksumForFilesize(size);
+ }
+ }
+
+ private void validateChecksumForFilesize(int len) throws IOException {
+ describe("Create a file of size " + len);
+ String src = String.format("%s-%04x", methodName.getMethodName(), len);
+ Path path = writeThenReadFile(src, len);
+ assertChecksum(path);
+ rm(getFileSystem(), path, false, false);
+ }
+
+ private void assertChecksum(Path path) throws IOException {
+ final String key = getFileSystem().pathToKey(path);
+ HeadObjectRequest.Builder requestBuilder = getFileSystem().getRequestFactory()
+ .newHeadObjectRequestBuilder(key)
+ .checksumMode(ChecksumMode.ENABLED);
+ HeadObjectResponse headObject = getFileSystem().getS3AInternals()
+ .getAmazonS3Client("Call head object with checksum enabled")
+ .headObject(requestBuilder.build());
+ switch (checksumAlgorithm) {
+ case CRC32:
+ Assert.assertNotNull(headObject.checksumCRC32());
Review Comment:
can we have text for each assertion, e.g.
```
Assertions.assertThat(headObject.checksumCRC32C())
    .describedAs("headObject.checksumCRC32C()")
    .isNotNull();
```
##########
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3AChecksum.java:
##########
@@ -0,0 +1,109 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.s3a;
+
+import java.io.IOException;
+
+import org.junit.Assert;
+import org.junit.Test;
+import software.amazon.awssdk.services.s3.model.ChecksumAlgorithm;
+import software.amazon.awssdk.services.s3.model.ChecksumMode;
+import software.amazon.awssdk.services.s3.model.HeadObjectRequest;
+import software.amazon.awssdk.services.s3.model.HeadObjectResponse;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.s3a.audit.S3AAuditConstants;
+import org.apache.hadoop.fs.s3a.impl.ChecksumSupport;
+
+import static org.apache.hadoop.fs.contract.ContractTestUtils.rm;
+import static org.apache.hadoop.fs.s3a.Constants.CHECKSUM_ALGORITHM;
+
+/**
+ * Tests S3 checksum algorithm.
+ * If CHECKSUM_ALGORITHM config is not set in auth-keys.xml,
+ * SHA256 algorithm will be picked.
+ */
+public class ITestS3AChecksum extends AbstractS3ATestBase {
+
+ private static final ChecksumAlgorithm DEFAULT_CHECKSUM_ALGORITHM =
+     ChecksumAlgorithm.SHA256;
+
+ private ChecksumAlgorithm checksumAlgorithm;
+
+ private static final int[] SIZES = {
+ 1, 2, 3, 4, 5, 254, 255, 256, 257, 2 ^ 12 - 1
+ };
+
+ @Override
+ protected Configuration createConfiguration() {
+ final Configuration conf = super.createConfiguration();
+ S3ATestUtils.disableFilesystemCaching(conf);
+ checksumAlgorithm = ChecksumSupport.getChecksumAlgorithm(conf);
Review Comment:
add
```
S3ATestUtils.removeBaseAndBucketOverrides(conf, CHECKSUM_ALGORITHM,
    REJECT_OUT_OF_SPAN_OPERATIONS);
```
needed to ensure that bucket overrides don't cause test failures
> build up md5 checksum as blocks are built in S3ABlockOutputStream; validate
> upload
> -----------------------------------------------------------------------------------
>
> Key: HADOOP-15224
> URL: https://issues.apache.org/jira/browse/HADOOP-15224
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 3.0.0
> Reporter: Steve Loughran
> Assignee: Raphael Azzolini
> Priority: Minor
> Labels: pull-request-available
>
> [~rdblue] reports sometimes he sees corrupt data on S3. Given MD5 checks from
> upload to S3, it's likelier to have happened in VM RAM, HDD or nearby.
> If the MD5 checksum for each block was built up as data was written to it,
> and checked against the etag, RAM/HDD storage of the saved blocks could be
> removed as sources of corruption.
> The obvious place would be
> {{org.apache.hadoop.fs.s3a.S3ADataBlocks.DataBlock}}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)