[ https://issues.apache.org/jira/browse/HDFS-17150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17753209#comment-17753209 ]

ASF GitHub Bot commented on HDFS-17150:
---------------------------------------

Hexiaoqiao commented on code in PR #5937:
URL: https://github.com/apache/hadoop/pull/5937#discussion_r1291256537


##########
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java:
##########
@@ -3802,16 +3803,26 @@ boolean internalReleaseLease(Lease lease, String src, INodesInPath iip,
             lastBlock.getBlockType());
       }
 
-      if (uc.getNumExpectedLocations() == 0 && lastBlock.getNumBytes() == 0) {
+      int minLocationsNum = 1;
+      if (lastBlock.isStriped()) {
+        minLocationsNum = ((BlockInfoStriped) lastBlock).getRealDataBlockNum();
+      }
+      if (uc.getNumExpectedLocations() < minLocationsNum &&
+          lastBlock.getNumBytes() == 0) {
         // No datanode has reported this block.
         // Maybe the client crashed before writing data to the pipeline.
         // This block doesn't need any recovery.
         // We can remove this block and close the file.
         pendingFile.removeLastBlock(lastBlock);
         finalizeINodeFileUnderConstruction(src, pendingFile,
             iip.getLatestSnapshotId(), false);
-        NameNode.stateChangeLog.warn("BLOCK* internalReleaseLease: "
-            + "Removed empty last block and closed file " + src);
+        if (uc.getNumExpectedLocations() == 0) {
+          NameNode.stateChangeLog.warn("BLOCK* internalReleaseLease: "
+              + "Removed empty last block and closed file " + src);
+        } else {
+          NameNode.stateChangeLog.warn("BLOCK* internalReleaseLease: "

Review Comment:
   A little weird here: can we be certain that this is an EC file when 
`uc.getNumExpectedLocations() != 0`? If not, this log print will be 
ambiguous.
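
   For illustration only, a minimal sketch of one way to remove the ambiguity 
(hypothetical wording, not taken from the PR): branch the log message on the 
block type, which the patch already checks via `lastBlock.isStriped()` when 
computing `minLocationsNum`.

   ```java
   // Hypothetical fragment, not the actual PR code: log the striped (EC)
   // case explicitly instead of inferring it from the location count, so
   // the two messages cannot be confused. lastBlock.isStriped() is the same
   // check used above to compute minLocationsNum.
   if (lastBlock.isStriped()) {
     NameNode.stateChangeLog.warn("BLOCK* internalReleaseLease: "
         + "Removed last unrecoverable block group and closed file " + src);
   } else {
     NameNode.stateChangeLog.warn("BLOCK* internalReleaseLease: "
         + "Removed empty last block and closed file " + src);
   }
   ```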





> EC: Fix the bug of failed lease recovery.
> -----------------------------------------
>
>                 Key: HDFS-17150
>                 URL: https://issues.apache.org/jira/browse/HDFS-17150
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Shuyan Zhang
>            Priority: Major
>              Labels: pull-request-available
>
> If the client crashes without writing the minimum number of internal blocks 
> required by the EC policy, the lease recovery process for the corresponding 
> unclosed file may keep failing. Taking the RS(6,3) policy as an example, 
> the timeline is as follows:
> 1. The client writes some data to only 5 datanodes;
> 2. The client crashes;
> 3. The NN fails over;
> 4. Now the result of `uc.getNumExpectedLocations()` depends entirely on 
> block reports, and only 5 datanodes report internal blocks;
> 5. When the lease exceeds the hard limit, the NN issues a block recovery 
> command;
> 6. The datanode checks the command, finds that the number of internal 
> blocks is insufficient, and errors out, so recovery fails;
> 7. The lease exceeds the hard limit again, the NN issues another block 
> recovery command, the recovery fails again, and so on indefinitely.
> When the client has written fewer than 6 internal blocks, the block group 
> is actually unrecoverable. We should treat this situation the same as a 
> replicated file whose last block has 0 replicas, i.e., directly remove the 
> last block group and close the file.
>  
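
As a minimal, self-contained sketch of the check described above (illustrative
Java, not the actual FSNamesystem code; the RS(6,3) numbers are taken from the
example in the description, and all names are hypothetical):

```java
// Illustrative sketch of the lease-recovery check, assuming RS(6,3):
// 6 data units + 3 parity units per block group.
public class LeaseRecoveryCheck {
  public static void main(String[] args) {
    int realDataBlockNum = 6;   // data units a full RS(6,3) stripe needs
    int expectedLocations = 5;  // datanodes that reported internal blocks
    boolean striped = true;     // EC file (vs. replicated)

    // Replicated: one reported replica is enough to attempt recovery.
    // Striped: fewer than realDataBlockNum internal blocks means the
    // block group can never be reconstructed, so recovery cannot succeed.
    int minLocationsNum = striped ? realDataBlockNum : 1;

    if (expectedLocations < minLocationsNum) {
      // Equivalent to the zero-replica case for replicated files:
      // remove the empty last block group and close the file instead of
      // re-issuing block recovery commands that will always fail.
      System.out.println("Remove last block group and close the file");
    }
  }
}
```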


