[ https://issues.apache.org/jira/browse/HADOOP-19604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18012587#comment-18012587 ]
ASF GitHub Bot commented on HADOOP-19604: ----------------------------------------- Copilot commented on code in PR #7853: URL: https://github.com/apache/hadoop/pull/7853#discussion_r2260273456 ########## hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsBlobClient.java: ########## @@ -1103,8 +1107,21 @@ public AbfsRestOperation flush(byte[] buffer, AbfsRestOperation op1 = getPathStatus(path, true, tracingContext, contextEncryptionAdapter); String metadataMd5 = op1.getResult().getResponseHeader(CONTENT_MD5); - if (blobMd5 != null && !blobMd5.equals(metadataMd5)) { - throw ex; + /* + * Validate the response by comparing the server's MD5 metadata against either: + * 1. The full blob content MD5 (if full blob checksum validation is enabled), or + * 2. The full block ID buffer MD5 (fallback if blob checksum validation is disabled) + */ + if (getAbfsConfiguration().isFullBlobChecksumValidationEnabled() && blobMd5 != null) { Review Comment: The configuration check `getAbfsConfiguration().isFullBlobChecksumValidationEnabled()` is performed inside the response validation logic, but this same check was already done earlier in the method. Consider storing the result in a variable to avoid redundant method calls. ```suggestion if (fullBlobChecksumValidationEnabled && blobMd5 != null) { ``` ########## hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsBlobClient.java: ########## @@ -1073,11 +1073,15 @@ public AbfsRestOperation flush(byte[] buffer, requestHeaders.add(new AbfsHttpHeader(CONTENT_LENGTH, String.valueOf(buffer.length))); requestHeaders.add(new AbfsHttpHeader(CONTENT_TYPE, APPLICATION_XML)); requestHeaders.add(new AbfsHttpHeader(IF_MATCH, eTag)); + String md5Hash = null; if (leaseId != null) { requestHeaders.add(new AbfsHttpHeader(X_MS_LEASE_ID, leaseId)); } - if (blobMd5 != null) { + if (isFullBlobChecksumValidationEnabled() && blobMd5 != null) { requestHeaders.add(new AbfsHttpHeader(X_MS_BLOB_CONTENT_MD5, blobMd5)); + } else { + md5Hash = computeMD5Hash(buffer, 0, buffer.length); + requestHeaders.add(new AbfsHttpHeader(X_MS_BLOB_CONTENT_MD5, md5Hash)); Review Comment: The variable `md5Hash` is declared but only used in the else branch. Consider declaring it within the else block to improve code clarity and reduce the scope of the variable. ########## hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/services/ITestAbfsOutputStream.java: ########## @@ -481,6 +481,8 @@ public void testResetCalledOnExceptionInRemoteFlush() throws Exception { //expected exception } // Verify that reset was called on the message digest - Mockito.verify(mockMessageDigest, Mockito.times(1)).reset(); + if (spiedClient.isChecksumValidationEnabled()) { Review Comment: The test is checking `isChecksumValidationEnabled()` but the code change suggests this should be checking `isFullBlobChecksumValidationEnabled()` since the MD5 reset behavior is now conditional on full blob checksum validation, not general checksum validation. ```suggestion if (spiedClient.isFullBlobChecksumValidationEnabled()) { ``` ########## hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/AbfsConfiguration.java: ########## @@ -1705,6 +1709,10 @@ public void setIsChecksumValidationEnabled(boolean isChecksumValidationEnabled) this.isChecksumValidationEnabled = isChecksumValidationEnabled; } + public boolean isFullBlobChecksumValidationEnabled() { Review Comment: The getter method `isFullBlobChecksumValidationEnabled()` lacks a corresponding setter method, which breaks the pattern established by other configuration properties like `isChecksumValidationEnabled`. This could cause issues for programmatic configuration changes. > ABFS: Fix WASB ABFS compatibility issues > ---------------------------------------- > > Key: HADOOP-19604 > URL: https://issues.apache.org/jira/browse/HADOOP-19604 > Project: Hadoop Common > Issue Type: Sub-task > Affects Versions: 3.4.1 > Reporter: Anmol Asrani > Assignee: Anmol Asrani > Priority: Major > Labels: pull-request-available > Fix For: 3.4.1 > > > Fix WASB ABFS compatibility issues. Fix issues such as:- > # BlockId computation to be consistent across clients for PutBlock and > PutBlockList > # Restrict url encoding of certain json metadata during setXAttr calls. > # Maintain the md5 hash of whole block to validate data integrity during > flush. -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org