[
https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17767406#comment-17767406
]
ASF GitHub Bot commented on HADOOP-18910:
-----------------------------------------
snvijaya commented on code in PR #6069:
URL: https://github.com/apache/hadoop/pull/6069#discussion_r1332534729
##########
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/constants/ConfigurationKeys.java:
##########
@@ -241,6 +241,9 @@ public final class ConfigurationKeys {
/** Add extra resilience to rename failures, at the expense of performance.
*/
public static final String FS_AZURE_ABFS_RENAME_RESILIENCE =
"fs.azure.enable.rename.resilience";
+ /** Add extra layer of verification of the integrity of the request content
during transport. */
+ public static final String FS_AZURE_ABFS_ENABLE_CHECKSUM_VALIDATION =
"fs.azure.enable.checksum.validation";
Review Comment:
Add documentation for the config in
https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-azure/src/site/markdown/abfs.md
Also highlight that this will have a perf impact, because the MD5 hash is
computed on both the client and the server.
##########
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/contracts/exceptions/InvalidChecksumException.java:
##########
@@ -0,0 +1,44 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+
+package org.apache.hadoop.fs.azurebfs.contracts.exceptions;
+
+import org.apache.hadoop.classification.InterfaceAudience;
+import org.apache.hadoop.classification.InterfaceStability;
+import org.apache.hadoop.fs.azurebfs.contracts.services.AzureServiceErrorCode;
+
+/**
+ * Exception to wrap invalid checksum verification on client side.
+ */
[email protected]
[email protected]
+public class InvalidChecksumException extends AbfsRestOperationException {
Review Comment:
Rename to AbfsInvalidChecksumException
##########
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsClient.java:
##########
@@ -761,6 +764,11 @@ public AbfsRestOperation append(final String path, final
byte[] buffer,
requestHeaders.add(new AbfsHttpHeader(USER_AGENT, userAgentRetry));
}
+ // Add MD5 Hash of request content as request header if feature is enabled
+ if (isChecksumValidationEnabled()) {
Review Comment:
For appends, the REST API doc says the server will fail the request: "If the two
hashes do not match, the operation will fail with error code 400 (Bad Request)."
Is there anything in the server's error response (error code or headers) that
identifies the failure as an MD5 mismatch, so that it can also be converted to
AbfsInvalidChecksumException?
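A minimal sketch of what such a check could look like, assuming the service reports the failure through an error-code string in the response and that the code for a hash mismatch is "Md5Mismatch". Both points are assumptions to confirm against the REST API documentation, and the class and method names are hypothetical, not part of the PR.

```java
import java.net.HttpURLConnection;

// Hypothetical helper for illustration only; not a class from the PR.
final class Md5MismatchSketch {
  // Assumed error-code string; confirm against the service documentation.
  static final String ASSUMED_MD5_MISMATCH_CODE = "Md5Mismatch";

  private Md5MismatchSketch() {
  }

  /**
   * Returns true when a failed append looks like a checksum mismatch:
   * HTTP 400 plus the assumed service error code. A null error code
   * yields false.
   */
  static boolean isMd5Mismatch(int httpStatus, String serviceErrorCode) {
    return httpStatus == HttpURLConnection.HTTP_BAD_REQUEST
        && ASSUMED_MD5_MISMATCH_CODE.equalsIgnoreCase(serviceErrorCode);
  }
}
```

If the check returns true, the caller could wrap the failure in AbfsInvalidChecksumException instead of surfacing a generic 400.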
> ABFS: Adding Support for MD5 Hash based integrity verification of the request
> content during transport
> -------------------------------------------------------------------------------------------------------
>
> Key: HADOOP-18910
> URL: https://issues.apache.org/jira/browse/HADOOP-18910
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/azure
> Reporter: Anuj Modi
> Assignee: Anuj Modi
> Priority: Major
> Labels: pull-request-available
>
> Azure Storage supports Content-MD5 request headers in both the Read and Append
> APIs.
> Read: [Path - Read - REST API (Azure Storage Services) | Microsoft
> Learn|https://learn.microsoft.com/en-us/rest/api/storageservices/datalakestoragegen2/path/read]
> Append: [Path - Update - REST API (Azure Storage Services) | Microsoft
> Learn|https://learn.microsoft.com/en-us/rest/api/storageservices/datalakestoragegen2/path/update]
> This change makes the client-side changes needed to support them. In a Read
> request, we will send the appropriate header, in response to which the server
> will return the MD5 hash of the data it sends back. On the client, we will
> compare this with the MD5 hash computed from the data received.
> In an Append request, we will compute the MD5 hash of the data that we are
> sending to the server and specify it in the appropriate header. On finding
> that header, the server will compare it with the MD5 hash it computes on the
> data received.
> This whole checksum validation support is guarded behind a config. The config
> is disabled by default because, with the use of "https", the integrity of data
> is preserved anyway. This is introduced as an additional data integrity check,
> which will also have a performance impact.
> Users can enable or disable it by setting the following
> config to *"true"* or *"false"* respectively. *Config:
> "fs.azure.enable.checksum.validation"*
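The two flows described in the issue can be sketched with the JDK alone. This is an illustration under stated assumptions, not the code in the PR: the class and method names are hypothetical, the hash is assumed to travel as a Base64-encoded MD5 digest (the usual Content-MD5 encoding), and a plain IllegalStateException stands in for the ABFS-specific exception.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

// Hypothetical helper for illustration only; not a class from the PR.
final class ChecksumSketch {
  private ChecksumSketch() {
  }

  /**
   * Append path: Base64-encoded MD5 of buffer[offset, offset + length),
   * suitable for sending as the request's hash header value.
   */
  static String computeMd5(byte[] buffer, int offset, int length) {
    try {
      MessageDigest md5 = MessageDigest.getInstance("MD5");
      md5.update(buffer, offset, length);
      return Base64.getEncoder().encodeToString(md5.digest());
    } catch (NoSuchAlgorithmException e) {
      // MD5 is a required algorithm on every Java platform.
      throw new IllegalStateException("MD5 not available", e);
    }
  }

  /**
   * Read path: recompute the hash over the bytes received and compare it
   * with the value the server returned. Throws on mismatch.
   */
  static void verifyRead(byte[] buffer, int offset, int length,
      String md5FromServer) {
    String md5Local = computeMd5(buffer, offset, length);
    if (!md5Local.equals(md5FromServer)) {
      throw new IllegalStateException("Checksum mismatch: server sent "
          + md5FromServer + ", client computed " + md5Local);
    }
  }
}
```

Both paths hash the full payload once more on the client, which is where the client-side share of the performance cost mentioned above comes from.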