[ 
https://issues.apache.org/jira/browse/HDFS-16622?focusedWorklogId=779952&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-779952
 ]

ASF GitHub Bot logged work on HDFS-16622:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 09/Jun/22 13:35
            Start Date: 09/Jun/22 13:35
    Worklog Time Spent: 10m 
      Work Description: Hexiaoqiao commented on code in PR #4407:
URL: https://github.com/apache/hadoop/pull/4407#discussion_r893514408


##########
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/IncrementalBlockReportManager.java:
##########
@@ -251,12 +251,20 @@ synchronized void addRDBI(ReceivedDeletedBlockInfo rdbi,
       DatanodeStorage storage) {
     // Make sure another entry for the same block is first removed.
     // There may only be one such entry.
+    ReceivedDeletedBlockInfo removedInfo = null;
     for (PerStorageIBR perStorage : pendingIBRs.values()) {
-      if (perStorage.remove(rdbi.getBlock()) != null) {
+      removedInfo = perStorage.remove(rdbi.getBlock());
+      if (removedInfo != null) {
         break;
       }
     }
-    getPerStorageIBR(storage).put(rdbi);
+    if (removedInfo != null &&

Review Comment:
   @ZanderXu Thanks for the detailed information. It is an interesting case. 
IMO, this improvement makes sense. Would you mind adding a unit test to 
cover this case?





Issue Time Tracking
-------------------

    Worklog Id:     (was: 779952)
    Time Spent: 1h  (was: 50m)

> addRDBI in IncrementalBlockReportManager may remove the block with bigger GS.
> -----------------------------------------------------------------------------
>
>                 Key: HDFS-16622
>                 URL: https://issues.apache.org/jira/browse/HDFS-16622
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: ZanderXu
>            Assignee: ZanderXu
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> In our production environment, we hit a strange missing block. Based on 
> the logs, I suspect there is a bug in the function 
> addRDBI(ReceivedDeletedBlockInfo rdbi, DatanodeStorage storage) (line 250).
> The buggy code is in the for loop:
> {code:java}
> synchronized void addRDBI(ReceivedDeletedBlockInfo rdbi,
>       DatanodeStorage storage) {
>     // Make sure another entry for the same block is first removed.
>     // There may only be one such entry.
>     for (PerStorageIBR perStorage : pendingIBRs.values()) {
>       if (perStorage.remove(rdbi.getBlock()) != null) {
>         break;
>       }
>     }
>     getPerStorageIBR(storage).put(rdbi);
>   }
> {code}
> The GS of the Block in the removed ReceivedDeletedBlockInfo may be greater 
> than the GS of the Block in rdbi, and the NN invalidates replicas with a 
> smaller GS when it completes a block. 
> So if a block has only one replica, this logic can lead to a missing 
> block. 
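
The scenario described above can be illustrated with a minimal, self-contained
sketch. It is not the patch from the PR. PendingIbrSketch, Entry, add and
pendingGenStamp are hypothetical stand-ins for the HDFS types, and the sketch
assumes that pending entries are keyed by block ID only and that the entry
with the larger GS should survive under the incoming storage.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the pending-IBR bookkeeping; not the HDFS class.
final class PendingIbrSketch {

  /** Simplified pending entry: block ID plus generation stamp (GS). */
  static final class Entry {
    final long blockId;
    final long genStamp;

    Entry(long blockId, long genStamp) {
      this.blockId = blockId;
      this.genStamp = genStamp;
    }
  }

  // storage ID -> (block ID -> pending entry)
  private final Map<String, Map<Long, Entry>> pending = new HashMap<>();

  /** Adds an entry, keeping whichever entry for the block has the larger GS. */
  synchronized void add(String storageId, Entry incoming) {
    // Remove any existing entry for the same block, remembering what was removed.
    Entry removed = null;
    for (Map<Long, Entry> perStorage : pending.values()) {
      removed = perStorage.remove(incoming.blockId);
      if (removed != null) {
        break;
      }
    }
    // Assumption: an older GS must never replace a newer one, so the removed
    // entry is put back when its GS is greater than the incoming one.
    Entry winner = (removed != null && removed.genStamp > incoming.genStamp)
        ? removed
        : incoming;
    pending.computeIfAbsent(storageId, k -> new HashMap<>())
        .put(winner.blockId, winner);
  }

  /** Returns the GS of the pending entry for the block, or -1 if none exists. */
  synchronized long pendingGenStamp(long blockId) {
    for (Map<Long, Entry> perStorage : pending.values()) {
      Entry e = perStorage.get(blockId);
      if (e != null) {
        return e.genStamp;
      }
    }
    return -1L;
  }
}
```

With the unconditional put from the original loop, adding an entry with GS
1001 after one with GS 1002 would leave only the GS 1001 entry pending. In
this sketch the GS 1002 entry survives.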



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
