[ 
https://issues.apache.org/jira/browse/HADOOP-18242?focusedWorklogId=782299&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782299
 ]

ASF GitHub Bot logged work on HADOOP-18242:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 17/Jun/22 07:58
            Start Date: 17/Jun/22 07:58
    Worklog Time Spent: 10m 
      Work Description: mehakmeet commented on code in PR #4331:
URL: https://github.com/apache/hadoop/pull/4331#discussion_r899866576


##########
hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/ITestAzureBlobFileSystemRename.java:
##########
@@ -181,17 +181,18 @@ public void testRenameWithNoDestinationParentDir() throws Exception {
     byte[] data = dataset(1024, 'a', 'z');
     writeDataset(fs, sourcePath, data, data.length, 1024, true);
 
-    // Check if we have retried the rename operation.
-    boolean hasRenameRetriedOnce = fs.getAbfsClient().isHasRetriedRenameOnce();
-    assertFalse("Rename shouldn't be retried before attempting to rename",
+    // Check if we have seen an incomplete state.
+    boolean hasRenameRetriedOnce = fs.getAbfsClient().isMetadataIncompleteState();
+    assertFalse("No incomplete state should be seen before attempting to "
+            + "rename",
         hasRenameRetriedOnce);
 
     // Verify that Renaming on a destination with no parent dir wasn't
     // successful.
     assertFalse(fs.rename(sourcePath, destPath));
 
-    // Verify that Rename operation was retried once after not succeeding.
-    hasRenameRetriedOnce = fs.getAbfsClient().isHasRetriedRenameOnce();
+    // Verify that metadata was in an incomplete state after the rename failure.
+    hasRenameRetriedOnce = fs.getAbfsClient().isMetadataIncompleteState();

Review Comment:
   IOStats won't show that the rename failed because the destination parent was 
not found: this test never retries successfully (that is hard to simulate in an 
integration test), so the exception is simply thrown back after a single retry. 
Do you think I should change this test to verify something different?
   When the operation actually fails we don't return the abfsClientResult, so 
this is hard to test in ITs 🤔 
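
   One way to exercise the retry path outside an integration test might be a 
unit test against a stub store. The sketch below is only an illustration under 
stated assumptions: `Store`, `FlakyStore`, `ParentNotFoundException`, `Result` 
and `renameWithRecovery` are hypothetical names, not the real ABFS classes, and 
the recovery step (a status call on the source before one retry) is assumed 
from the issue description rather than taken from the patch.

```java
import java.io.IOException;

/** Hypothetical sketch; none of these types are the real ABFS classes. */
public class RenameRetrySketch {

  /** Minimal stand-in for the two store operations involved. */
  interface Store {
    void rename(String src, String dst) throws IOException;
    void getPathStatus(String path) throws IOException;
  }

  /** Stand-in for a 404 RenameDestinationParentPathNotFound response. */
  static class ParentNotFoundException extends IOException {
    ParentNotFoundException(String msg) {
      super(msg);
    }
  }

  /** Records whether an incomplete state was seen and recovered from. */
  static final class Result {
    final boolean recovered;

    Result(boolean recovered) {
      this.recovered = recovered;
    }
  }

  /** Try the rename; on parent-not-found, touch the source and retry once. */
  static Result renameWithRecovery(Store store, String src, String dst)
      throws IOException {
    try {
      store.rename(src, dst);
      return new Result(false);
    } catch (ParentNotFoundException e) {
      // Assumption: another operation on the source resolves the state.
      store.getPathStatus(src);
      // Single retry; a second failure propagates to the caller.
      store.rename(src, dst);
      return new Result(true);
    }
  }

  /** Stub that fails the first N rename calls, then succeeds. */
  static final class FlakyStore implements Store {
    private int failuresLeft;
    int renameCalls;
    int statusCalls;

    FlakyStore(int failures) {
      this.failuresLeft = failures;
    }

    @Override
    public void rename(String src, String dst) throws IOException {
      renameCalls++;
      if (failuresLeft > 0) {
        failuresLeft--;
        throw new ParentNotFoundException("RenameDestinationParentPathNotFound");
      }
    }

    @Override
    public void getPathStatus(String path) {
      statusCalls++;
    }
  }
}
```

   With `new FlakyStore(1)` the retry succeeds and the result reports a 
recovery; with `new FlakyStore(2)` the second failure is thrown back to the 
caller, which is the case the integration test currently observes.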





Issue Time Tracking
-------------------

    Worklog Id:     (was: 782299)
    Time Spent: 1h 50m  (was: 1h 40m)

> ABFS Rename Failure when tracking metadata is in incomplete state
> -----------------------------------------------------------------
>
>                 Key: HADOOP-18242
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18242
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/azure
>            Reporter: Mehakmeet Singh
>            Assignee: Mehakmeet Singh
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> If a node in the datacenter crashes while processing an operation, 
> occasionally it can leave the Storage-internal blob tracking metadata in an 
> incomplete state.  We expect this to happen occasionally, and so all APIs 
> are designed in such a way that if this incomplete state is observed on a 
> blob, the situation is resolved before the current operation proceeds.  
> However, this incident has exposed a bug specifically with the Rename API, 
> where the incomplete state fails to resolve, leading to this incorrect 
> failure.  As a temporary mitigation, if any other operation is performed on 
> this blob – GetBlobProperties, GetBlob, GetFileProperties, SetFileProperties, 
> etc – it should resolve the incomplete state, and rename will no longer hit 
> this issue.
> StackTrace:
> {code:java}
> 2022-03-22 17:52:19,789 DEBUG [regionserver/euwukwlss-hg50:16020.logRoller] services.AbfsClient: HttpRequest: 404,RenameDestinationParentPathNotFound,cid=ef5cbf0f-5d4a-4630-8a59-3d559077fc24,rid=35fef164-101f-000b-1b15-3ed818000000,sent=0,recv=212,PUT,https://euwqdaotdfdls03.dfs.core.windows.net/eykbssc/apps/hbase/data/oldWALs/euwukwlss-hg50.tdf.qa%252C16020%252C1647949929877.1647967939315?timeout=90
>    {code}


