[ https://issues.apache.org/jira/browse/HDFS-9500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15603695#comment-15603695 ]

Erik Krogen commented on HDFS-9500:
-----------------------------------

Assigning this to myself since Phil does not seem to be actively working on it 
anymore.

I can (intermittently) reproduce this test failure on branch-2.7 if I increase 
the number of iterations on 
{{TestDatanodeManager.testNumVersionsReportedCorrect}} to 5000. I found that 
for the node whose version should have been decremented, 
{{shouldDecrementVersion()}} returned false because {{isDatanodeDead()}} was 
true (but {{isAlive}} was also true). 
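
For context, the guard involved has roughly the following shape; this is a 
sketch from memory of the branch-2.7 {{DatanodeManager}}/{{DatanodeDescriptor}} 
code, not a verbatim copy, so names and details may differ from the source:

{code:java}
// Sketch only: approximates the version-counting guard described above.
private boolean shouldCountVersion(DatanodeDescriptor node) {
  // A node's version is only counted (and later decremented) if the node has
  // reported a version, is still marked alive, and its heartbeat has not
  // expired.
  return node.getSoftwareVersion() != null
      && node.isAlive
      && !isDatanodeDead(node);
}

boolean isDatanodeDead(DatanodeDescriptor node) {
  // Purely time-based: this can become true before the HeartbeatManager has
  // actually processed the node as dead, so isAlive may still be true.
  return node.getLastUpdateMonotonic()
      < (monotonicNow() - heartbeatExpireInterval);
}
{code}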

It seems this situation could arise if the time since the last heartbeat from 
the node was above the threshold for considering it dead, but the 
{{HeartbeatManager}} had not yet marked it as such. I am open to suggestions about this. 
Would just checking {{DatanodeDescriptor.isAlive}} be sufficient here instead 
of the check on both {{isAlive}} and {{isDatanodeDead()}}? 
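
Checking only {{isAlive}} would make the guard look something like the 
following (again just a sketch under the same naming assumptions), which would 
keep counting and decrementing a node during the window where its heartbeat is 
stale but the {{HeartbeatManager}} has not yet removed it:

{code:java}
// Hypothetical alternative discussed above: rely on isAlive alone and let the
// HeartbeatManager be the single authority on liveness.
private boolean shouldCountVersion(DatanodeDescriptor node) {
  return node.getSoftwareVersion() != null && node.isAlive;
}
{code}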

> datanodesSoftwareVersions map may counting wrong when rolling upgrade
> ---------------------------------------------------------------------
>
>                 Key: HDFS-9500
>                 URL: https://issues.apache.org/jira/browse/HDFS-9500
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.7.1, 2.6.2
>            Reporter: Phil Yang
>            Assignee: Erik Krogen
>         Attachments: 9500-v1.patch
>
>
> While rolling upgrading, the namenode's web UI overview will report that the 
> cluster contains datanodes of two versions, for example, x nodes on 2.6.0 and 
> y nodes on 2.6.2. However, sometimes when I stop a datanode on the old 
> version and start one on the new version, the namenode only increases the 
> count for the new version but does not decrease the count for the old 
> version, so the total x+y becomes larger than the number of datanodes. Even 
> after all datanodes are upgraded, the message that several datanodes are 
> still on the old version remains, and I must run hdfs dfsadmin -refreshNodes 
> to clear it.
> I think this issue is caused by DatanodeManager.registerDatanode. If nodeS, 
> the node on the old version, is not alive because it has been shut down, it 
> will not pass shouldCountVersion, so the count for the old version is not 
> decreased. But this method only looks at the heartbeat status and isAlive at 
> that moment; if the namenode has not yet removed the node (which would 
> decrement the version map) and the node restarts with the new version, the 
> decrementVersionCount for this node will never be executed.
> So the simplest way to fix this is to always recount the version map in 
> registerDatanode, since it is not a heavy operation.
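
For reference, the recount approach proposed in the quoted description could 
look roughly like the sketch below. The helper name 
{{recountSoftwareVersions()}} and the locking are assumptions for illustration 
only, not the attached patch:

{code:java}
// Illustrative sketch only (assumes java.util.Map/HashMap are imported):
// rebuild the version -> count map from the currently registered datanodes
// instead of incrementally adjusting it on register/remove.
private void recountSoftwareVersions() {
  Map<String, Integer> versionCount = new HashMap<>();
  synchronized (datanodeMap) {
    for (DatanodeDescriptor dn : datanodeMap.values()) {
      // Only nodes that currently pass the counting guard contribute.
      if (shouldCountVersion(dn)) {
        Integer prev = versionCount.get(dn.getSoftwareVersion());
        versionCount.put(dn.getSoftwareVersion(), prev == null ? 1 : prev + 1);
      }
    }
    this.datanodesSoftwareVersions = versionCount;
  }
}
{code}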


