[ 
https://issues.apache.org/jira/browse/HDFS-16898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17684395#comment-17684395
 ] 

ASF GitHub Bot commented on HDFS-16898:
---------------------------------------

zhangshuyan0 commented on PR #5330:
URL: https://github.com/apache/hadoop/pull/5330#issuecomment-1418437375

   It is great to prevent the heartbeat from being affected by command 
processing. I checked that processCommandFromXXX() doesn't access any mutable 
members of BPOfferService. 
   The only thing to note is that in the original code, after the switchover, 
the new ANN is guaranteed that the DN will not execute commands from the 
old ANN once it has received two heartbeats from the DN. With command 
processing moved outside the lock, this guarantee no longer holds. However, as 
@hfutatzhanghb said, the NN marks the DataNode as stale after the switchover, 
which means that the NN does not rely on this guarantee. So I think this patch is 
safe.
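
   To make the locking change concrete, here is a minimal, self-contained 
sketch of the idea, not the actual patch. The class FineGrainedLockSketch, its 
activeActor field, and the String-based commands are hypothetical stand-ins for 
BPOfferService, bpServiceToActive, and DatanodeCommand. The lock is held only 
while comparing the actor against the active one; the command itself runs with 
no lock held.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical stand-in for BPOfferService; not the real Hadoop class. */
class FineGrainedLockSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

  // Guarded by lock; plays the role of bpServiceToActive.
  private Object activeActor;

  // Records whether the calling thread held the lock during command processing.
  private boolean lockHeldWhileProcessing = false;

  /** Heartbeat path: takes the write lock briefly to update the active actor. */
  void updateActiveActor(Object actor) {
    lock.writeLock().lock();
    try {
      activeActor = actor;
    } finally {
      lock.writeLock().unlock();
    }
  }

  /** Command path: lock only around the comparison, then process unlocked. */
  String processCommandFromActor(String cmd, Object actor) {
    boolean fromActive;
    lock.readLock().lock();
    try {
      fromActive = (actor == activeActor);
    } finally {
      lock.readLock().unlock();
    }
    // No lock is held from here on, so a slow command cannot delay the
    // heartbeat path. The trade-off: activeActor may change before the
    // command finishes, which is the lost guarantee discussed above.
    return fromActive ? processCommandFromActive(cmd)
                      : processCommandFromStandby(cmd);
  }

  private String processCommandFromActive(String cmd) {
    recordLockState();
    return "active:" + cmd;
  }

  private String processCommandFromStandby(String cmd) {
    recordLockState();
    return "standby:" + cmd;
  }

  private void recordLockState() {
    if (lock.isWriteLockedByCurrentThread() || lock.getReadHoldCount() > 0) {
      lockHeldWhileProcessing = true;
    }
  }

  boolean wasLockHeldWhileProcessing() {
    return lockHeldWhileProcessing;
  }
}
```

   In this sketch a read lock is enough for the comparison because the command 
path only reads activeActor; whether the real code can use a read lock there 
depends on what else the locked section touches.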
   




> Make write lock fine-grain in processCommandFromActor method
> ------------------------------------------------------------
>
>                 Key: HDFS-16898
>                 URL: https://issues.apache.org/jira/browse/HDFS-16898
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 3.3.4
>            Reporter: ZhangHB
>            Priority: Major
>              Labels: pull-request-available
>
> Currently, the processCommandFromActor method contains the following code:
>  
> {code:java}
> writeLock();
> try {
>   if (actor == bpServiceToActive) {
>     return processCommandFromActive(cmd, actor);
>   } else {
>     return processCommandFromStandby(cmd, actor);
>   }
> } finally {
>   writeUnlock();
> } {code}
> If processCommandFromActive takes a long time, the write lock is not 
> released until it finishes.
>  
> This may block the updateActorStatesFromHeartbeat method in 
> offerService. Furthermore, it can make the DataNode's lastcontact very high, 
> and the DataNode can even be marked dead once lastcontact exceeds 600s.
> {code:java}
> bpos.updateActorStatesFromHeartbeat(
>     this, resp.getNameNodeHaState());{code}
> We can make the write lock fine-grained in the processCommandFromActor method 
> to address this problem.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
