[
https://issues.apache.org/jira/browse/HADOOP-15224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17929486#comment-17929486
]
ASF GitHub Bot commented on HADOOP-15224:
-----------------------------------------
raphaelazzolini commented on code in PR #7396:
URL: https://github.com/apache/hadoop/pull/7396#discussion_r1966895375
##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AUtils.java:
##########
@@ -1743,4 +1753,26 @@ static void maybeIsolateClassloader(Configuration conf, ClassLoader classLoader)
}
}
+ /**
+ * Get the checksum algorithm to be used for data integrity check of the objects in S3.
+ * This operation includes validating if the provided value is a supported checksum algorithm.
+ * @param conf configuration to scan
+ * @return the checksum algorithm to be passed on S3 requests
+ * @throws IllegalArgumentException if the checksum algorithm is not known or not supported
+ */
+ public static ChecksumAlgorithm getChecksumAlgorithm(Configuration conf) {
+ final String checksumAlgorithmString = conf.get(CHECKSUM_ALGORITHM);
+ if (StringUtils.isBlank(checksumAlgorithmString)) {
+ return null;
+ }
+ final ChecksumAlgorithm checksumAlgorithm =
+ ChecksumAlgorithm.fromValue(checksumAlgorithmString);
+ if (!SUPPORTED_CHECKSUM_ALGORITHMS.contains(checksumAlgorithm)) {
Review Comment:
I will use Preconditions.checkArgument(); isn't it a better option in this
case?
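
A minimal sketch of how that could look, assuming `org.apache.hadoop.util.Preconditions` and hypothetical stand-ins for the `CHECKSUM_ALGORITHM` key and the `SUPPORTED_CHECKSUM_ALGORITHMS` set (the real definitions live elsewhere in the PR and are not shown in this excerpt). `checkArgument()` throws `IllegalArgumentException`, so the `@throws` contract in the javadoc above still holds:

```java
import java.util.EnumSet;
import java.util.Set;

import org.apache.commons.lang3.StringUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.Preconditions;

import software.amazon.awssdk.services.s3.model.ChecksumAlgorithm;

public final class ChecksumAlgorithmSketch {

  // Hypothetical stand-ins for the constants referenced in the PR; the real
  // key name and supported set are defined elsewhere in the change.
  private static final String CHECKSUM_ALGORITHM = "fs.s3a.checksum.algorithm";
  private static final Set<ChecksumAlgorithm> SUPPORTED_CHECKSUM_ALGORITHMS =
      EnumSet.of(ChecksumAlgorithm.CRC32, ChecksumAlgorithm.SHA256);

  private ChecksumAlgorithmSketch() {
  }

  /**
   * Same lookup as in the diff, but with the unsupported-algorithm branch
   * expressed as Preconditions.checkArgument().
   */
  public static ChecksumAlgorithm getChecksumAlgorithm(Configuration conf) {
    final String checksumAlgorithmString = conf.get(CHECKSUM_ALGORITHM);
    if (StringUtils.isBlank(checksumAlgorithmString)) {
      return null;
    }
    final ChecksumAlgorithm checksumAlgorithm =
        ChecksumAlgorithm.fromValue(checksumAlgorithmString);
    // checkArgument raises IllegalArgumentException with this message,
    // matching the @throws contract in the javadoc.
    Preconditions.checkArgument(
        SUPPORTED_CHECKSUM_ALGORITHMS.contains(checksumAlgorithm),
        "Checksum algorithm is not supported: " + checksumAlgorithmString);
    return checksumAlgorithm;
  }
}
```

Either way the caller sees the same exception type; checkArgument just keeps the validation to a single statement.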
> build up MD5 checksum as blocks are built in S3ABlockOutputStream; validate
> upload
> -----------------------------------------------------------------------------------
>
> Key: HADOOP-15224
> URL: https://issues.apache.org/jira/browse/HADOOP-15224
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 3.0.0
> Reporter: Steve Loughran
> Assignee: Raphael Azzolini
> Priority: Minor
> Labels: pull-request-available
>
> [~rdblue] reports sometimes he sees corrupt data on S3. Given MD5 checks from
> upload to S3, it's likelier to have happened in VM RAM, HDD or nearby.
> If the MD5 checksum for each block was built up as data was written to it,
> and checked against the etag, RAM/HDD storage of the saved blocks could be
> removed as sources of corruption.
> The obvious place would be
> {{org.apache.hadoop.fs.s3a.S3ADataBlocks.DataBlock}}
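As a rough, self-contained illustration of the idea in the description above (not the actual PR code; the class and method names here are hypothetical): each block wraps a MessageDigest that is updated as bytes are written, and the hex digest is compared against the ETag returned for the uploaded object or part, which for non-multipart, non-KMS uploads is the MD5 of the payload.
{code:java}
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/**
 * Illustrative only: accumulate an MD5 digest while a block is written,
 * then compare it against the ETag S3 reports for the upload.
 * The real change would live in org.apache.hadoop.fs.s3a.S3ADataBlocks.DataBlock.
 */
public class Md5TrackingBlock {

  private final MessageDigest digest;
  private String hex;

  public Md5TrackingBlock() throws NoSuchAlgorithmException {
    this.digest = MessageDigest.getInstance("MD5");
  }

  /** Update the digest with every buffer written to the block. */
  public void write(byte[] buffer, int offset, int length) {
    digest.update(buffer, offset, length);
    // ... also write the bytes to the disk/heap/byte-buffer block store ...
  }

  /** Hex-encode the accumulated digest once the block is closed. */
  public String hexDigest() {
    if (hex == null) {
      StringBuilder sb = new StringBuilder();
      for (byte b : digest.digest()) {
        sb.append(String.format("%02x", b));
      }
      hex = sb.toString();
    }
    return hex;
  }

  /**
   * Compare against the ETag from the upload response.
   * ETags are usually quoted, e.g. "9bb58f26192e4ba00f01e2e7b136bbd8".
   */
  public boolean matchesEtag(String etag) {
    String unquoted = etag.replace("\"", "");
    return hexDigest().equalsIgnoreCase(unquoted);
  }

  public static void main(String[] args) throws NoSuchAlgorithmException {
    Md5TrackingBlock block = new Md5TrackingBlock();
    byte[] data = "hello world".getBytes(StandardCharsets.UTF_8);
    block.write(data, 0, data.length);
    System.out.println(block.hexDigest());
  }
}
{code}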
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]