[ 
https://issues.apache.org/jira/browse/HDFS-16898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17683702#comment-17683702
 ] 

ASF GitHub Bot commented on HDFS-16898:
---------------------------------------

hfutatzhanghb commented on code in PR #5330:
URL: https://github.com/apache/hadoop/pull/5330#discussion_r1095415826


##########
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPOfferService.java:
##########
@@ -679,16 +679,19 @@ boolean processCommandFromActor(DatanodeCommand cmd,
       actor.reRegister();
       return false;
     }
+    boolean isActiveActor;
     writeLock();
     try {
-      if (actor == bpServiceToActive) {
-        return processCommandFromActive(cmd, actor);
-      } else {
-        return processCommandFromStandby(cmd, actor);
-      }
+      isActiveActor = actor == bpServiceToActive;
     } finally {
       writeUnlock();
     }
+
+    if (actor == bpServiceToActive) {
+      return processCommandFromActive(cmd, actor);
+    } else {
+      return processCommandFromStandby(cmd, actor);
+    }

Review Comment:
   Hi @virajjasani, thanks for your careful review. You are right that before 
[HDFS-6788](https://issues.apache.org/jira/browse/HDFS-6788), this part was 
covered by a synchronized lock.
   However, `processCommandFromActive` and `processCommandFromStandby` use the 
`actor` parameter only to print log messages. The lock here only serves to 
decide whether `actor` is `bpServiceToActive`, and therefore whether to execute 
`processCommandFromActive` or `processCommandFromStandby`.
   
   When a switchover between the active and standby namenodes occurs, the 
datanodes are marked stale. While a datanode is stale, we are not allowed to 
delete blocks directly; we put those blocks into `postponedMisreplicatedBlocks` 
instead. So even if we execute a DatanodeCommand from the previous active 
namenode (now standby), it is okay.
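
   To make the narrowed critical section concrete, here is a minimal,
self-contained sketch. It is not the Hadoop code: `NarrowLockSketch`, its
`String` commands, the `Object` actors, and the `lockHeldWhileProcessing` flag
are hypothetical stand-ins for `BPOfferService`, `DatanodeCommand`, and
`BPServiceActor`. Note that the diff as quoted above declares `isActiveActor`
but still re-evaluates `actor == bpServiceToActive` after the unlock; the
sketch assumes the intent is to branch on the value captured under the lock.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical stand-in for BPOfferService; illustrates the locking pattern only. */
class NarrowLockSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

  // Hypothetical stand-in for the BPServiceActor connected to the active namenode.
  final Object activeActor = new Object();
  // Guarded by the write lock, as bpServiceToActive is in the quoted code.
  private Object bpServiceToActive = activeActor;

  // Set to true if command processing ever runs while this thread holds the write lock.
  boolean lockHeldWhileProcessing = false;

  String processCommandFromActor(String cmd, Object actor) {
    boolean isActiveActor;
    lock.writeLock().lock();
    try {
      // Only the identity comparison needs the lock.
      isActiveActor = (actor == bpServiceToActive);
    } finally {
      lock.writeLock().unlock();
    }
    // The lock is released here, so a slow command no longer blocks
    // other threads that need the same lock.
    if (isActiveActor) {
      return processCommandFromActive(cmd);
    } else {
      return processCommandFromStandby(cmd);
    }
  }

  private String processCommandFromActive(String cmd) {
    lockHeldWhileProcessing |= lock.isWriteLockedByCurrentThread();
    return "active:" + cmd;
  }

  private String processCommandFromStandby(String cmd) {
    lockHeldWhileProcessing |= lock.isWriteLockedByCurrentThread();
    return "standby:" + cmd;
  }

  public static void main(String[] args) {
    NarrowLockSketch bpos = new NarrowLockSketch();
    System.out.println(bpos.processCommandFromActor("cmd-1", bpos.activeActor));
    System.out.println(bpos.processCommandFromActor("cmd-2", new Object()));
    System.out.println("lock held while processing: " + bpos.lockHeldWhileProcessing);
  }
}
```

   The trade-off is the one discussed above: the active/standby decision can
be out of date by the time the command runs, which is acceptable only if
executing a command from the previous active namenode is harmless.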





> Make write lock fine-grain in processCommandFromActor method
> ------------------------------------------------------------
>
>                 Key: HDFS-16898
>                 URL: https://issues.apache.org/jira/browse/HDFS-16898
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 3.3.4
>            Reporter: ZhangHB
>            Priority: Major
>              Labels: pull-request-available
>
> Currently, the method processCommandFromActor contains the following code:
>  
> {code:java}
> writeLock();
> try {
>   if (actor == bpServiceToActive) {
>     return processCommandFromActive(cmd, actor);
>   } else {
>     return processCommandFromStandby(cmd, actor);
>   }
> } finally {
>   writeUnlock();
> } {code}
> If processCommandFromActive takes a long time, the write lock is not 
> released until it returns.
>  
> This may block the updateActorStatesFromHeartbeat method in offerService. 
> Furthermore, it can make the datanode's lastContact very high, and the 
> datanode is even marked dead once lastContact exceeds 600s.
> {code:java}
> bpos.updateActorStatesFromHeartbeat(
>     this, resp.getNameNodeHaState());{code}
> Here we can make the write lock fine-grained in the processCommandFromActor 
> method to address this problem.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

