[ 
https://issues.apache.org/jira/browse/HADOOP-19604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18012587#comment-18012587
 ] 

ASF GitHub Bot commented on HADOOP-19604:
-----------------------------------------

Copilot commented on code in PR #7853:
URL: https://github.com/apache/hadoop/pull/7853#discussion_r2260273456


##########
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsBlobClient.java:
##########
@@ -1103,8 +1107,21 @@ public AbfsRestOperation flush(byte[] buffer,
         AbfsRestOperation op1 = getPathStatus(path, true, tracingContext,
             contextEncryptionAdapter);
         String metadataMd5 = op1.getResult().getResponseHeader(CONTENT_MD5);
-        if (blobMd5 != null && !blobMd5.equals(metadataMd5)) {
-          throw ex;
+        /*
+         * Validate the response by comparing the server's MD5 metadata 
against either:
+         * 1. The full blob content MD5 (if full blob checksum validation is 
enabled), or
+         * 2. The full block ID buffer MD5 (fallback if blob checksum 
validation is disabled)
+         */
+        if (getAbfsConfiguration().isFullBlobChecksumValidationEnabled() && 
blobMd5 != null) {

Review Comment:
   The configuration check 
`getAbfsConfiguration().isFullBlobChecksumValidationEnabled()` is performed 
inside the response validation logic, but this same check was already done 
earlier in the method. Consider storing the result in a variable to avoid 
redundant method calls.
   ```suggestion
           if (fullBlobChecksumValidationEnabled && blobMd5 != null) {
   ```



##########
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsBlobClient.java:
##########
@@ -1073,11 +1073,15 @@ public AbfsRestOperation flush(byte[] buffer,
     requestHeaders.add(new AbfsHttpHeader(CONTENT_LENGTH, 
String.valueOf(buffer.length)));
     requestHeaders.add(new AbfsHttpHeader(CONTENT_TYPE, APPLICATION_XML));
     requestHeaders.add(new AbfsHttpHeader(IF_MATCH, eTag));
+    String md5Hash = null;
     if (leaseId != null) {
       requestHeaders.add(new AbfsHttpHeader(X_MS_LEASE_ID, leaseId));
     }
-    if (blobMd5 != null) {
+    if (isFullBlobChecksumValidationEnabled() && blobMd5 != null) {
       requestHeaders.add(new AbfsHttpHeader(X_MS_BLOB_CONTENT_MD5, blobMd5));
+    } else {
+      md5Hash = computeMD5Hash(buffer, 0, buffer.length);
+      requestHeaders.add(new AbfsHttpHeader(X_MS_BLOB_CONTENT_MD5, md5Hash));

Review Comment:
   The variable `md5Hash` is declared but only used in the else branch. 
Consider declaring it within the else block to improve code clarity and reduce 
the scope of the variable.



##########
hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/services/ITestAbfsOutputStream.java:
##########
@@ -481,6 +481,8 @@ public void testResetCalledOnExceptionInRemoteFlush() 
throws Exception {
       //expected exception
     }
     // Verify that reset was called on the message digest
-    Mockito.verify(mockMessageDigest, Mockito.times(1)).reset();
+    if (spiedClient.isChecksumValidationEnabled()) {

Review Comment:
   The test is checking `isChecksumValidationEnabled()` but the code change 
suggests this should be checking `isFullBlobChecksumValidationEnabled()` since 
the MD5 reset behavior is now conditional on full blob checksum validation, not 
general checksum validation.
   ```suggestion
       if (spiedClient.isFullBlobChecksumValidationEnabled()) {
   ```



##########
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/AbfsConfiguration.java:
##########
@@ -1705,6 +1709,10 @@ public void setIsChecksumValidationEnabled(boolean 
isChecksumValidationEnabled)
     this.isChecksumValidationEnabled = isChecksumValidationEnabled;
   }
 
+  public boolean isFullBlobChecksumValidationEnabled() {

Review Comment:
   The getter method `isFullBlobChecksumValidationEnabled()` lacks a 
corresponding setter method, which breaks the pattern established by other 
configuration properties like `isChecksumValidationEnabled`. This could cause 
issues for programmatic configuration changes.





> ABFS: Fix WASB ABFS compatibility issues
> ----------------------------------------
>
>                 Key: HADOOP-19604
>                 URL: https://issues.apache.org/jira/browse/HADOOP-19604
>             Project: Hadoop Common
>          Issue Type: Sub-task
>    Affects Versions: 3.4.1
>            Reporter: Anmol Asrani
>            Assignee: Anmol Asrani
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.4.1
>
>
> Fix WASB ABFS compatibility issues. Fix issues such as:-
>  # BlockId computation to be consistent across clients for PutBlock and 
> PutBlockList
>  # Restrict url encoding of certain json metadata during setXAttr calls.
>  # Maintain the md5 hash of whole block to validate data integrity during 
> flush.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org

Reply via email to