[ 
https://issues.apache.org/jira/browse/HDFS-9500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15170009#comment-15170009
 ] 

Ravi Prakash commented on HDFS-9500:
------------------------------------

I see HDFS-9371 did away with the finer-grained locking we had implemented in 
{{incrementVersionCount}} and {{decrementVersionCount}}. Earlier we were 
synchronizing on datanodeMap; now we synchronize on the entire 
DatanodeManager. 

bq. I think this issue is caused by DatanodeManager.registerDatanode. If nodeS, 
still on the old version, is not alive because it was shut down, it does not 
pass shouldCountVersion, so the count for the old version is not decreased. But 
this check only looks at the heartbeat status and isAlive at that moment. If 
the namenode has not yet removed this node (removal is what decrements the 
version map) and the node restarts on the new version, the 
decrementVersionCount for this node is never executed.

Thanks for the analysis [~yangzhe1991]! Could you please help me understand it? 
Which version of Hadoop did you experience this on? How do you update the 
version of the DNs? Do you let a long time pass between bringing down the DN in 
the old version and then bringing back a DN with the new version?
What state is the Datanode in when its old version is not decremented?

Wouldn't 
https://github.com/apache/hadoop/blob/branch-2.6.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java#L520
 decrement the version count? 

[~kihwal] Are you seeing this too?

> datanodesSoftwareVersions map may counting wrong when rolling upgrade
> ---------------------------------------------------------------------
>
>                 Key: HDFS-9500
>                 URL: https://issues.apache.org/jira/browse/HDFS-9500
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.7.1, 2.6.2
>            Reporter: Phil Yang
>            Assignee: Phil Yang
>         Attachments: 9500-v1.patch
>
>
> During a rolling upgrade, the namenode's web overview reports two datanode 
> versions in the cluster, for example x nodes on 2.6.0 and y nodes on 2.6.2. 
> However, sometimes when I stop a datanode running the old version and start 
> it on the new version, the namenode increases the count for the new version 
> but does not decrease the count for the old version. So the total x+y becomes 
> larger than the number of datanodes. Even after all datanodes are upgraded, 
> the overview still reports several datanodes on the old version, and I must 
> run hdfs dfsadmin -refreshNodes to clear this message.
> I think this issue is caused by DatanodeManager.registerDatanode. If nodeS, 
> still on the old version, is not alive because it was shut down, it does not 
> pass shouldCountVersion, so the count for the old version is not decreased. 
> But this check only looks at the heartbeat status and isAlive at that moment. 
> If the namenode has not yet removed this node (removal is what decrements the 
> version map) and the node restarts on the new version, the 
> decrementVersionCount for this node is never executed.
> So the simplest fix is to always recount the version map in 
> registerDatanode, since it is not a heavy operation.
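
The scenario in the description can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the Hadoop code: the class VersionCountSketch, its Node type, the EXPIRY_MS value and the method names registerIncremental and registerWithRecount are all hypothetical, and liveness is reduced to a heartbeat-age check. It contrasts incremental bookkeeping, which skips the decrement when the stored node looks stale at re-registration time, with a full recount on every registration as the description proposes.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only; none of these names are Hadoop APIs.
class VersionCountSketch {
    // Hypothetical stand-in for a stored datanode record.
    static final class Node {
        final String version;
        final long lastHeartbeatMs;
        Node(String version, long lastHeartbeatMs) {
            this.version = version;
            this.lastHeartbeatMs = lastHeartbeatMs;
        }
    }

    // Assumed heartbeat expiry; the real value is configuration-dependent.
    static final long EXPIRY_MS = 1000;

    final Map<String, Node> nodes = new HashMap<>();            // node id -> record
    final Map<String, Integer> versionCounts = new HashMap<>(); // version -> count

    // Simplified liveness check: a node counts only if its heartbeat is fresh.
    boolean shouldCount(Node n, long nowMs) {
        return nowMs - n.lastHeartbeatMs <= EXPIRY_MS;
    }

    void increment(String version) {
        versionCounts.merge(version, 1, Integer::sum);
    }

    void decrement(String version) {
        // Drop the entry entirely once its count would reach zero.
        versionCounts.computeIfPresent(version, (k, c) -> c > 1 ? c - 1 : null);
    }

    // Incremental bookkeeping: the old version is decremented only if the
    // stored record still looks alive at the moment of re-registration.
    void registerIncremental(String id, String version, long nowMs) {
        Node old = nodes.get(id);
        if (old != null && shouldCount(old, nowMs)) {
            decrement(old.version);
        }
        nodes.put(id, new Node(version, nowMs));
        increment(version);
    }

    // Proposed alternative: rebuild the whole map on every registration.
    void registerWithRecount(String id, String version, long nowMs) {
        nodes.put(id, new Node(version, nowMs));
        versionCounts.clear();
        for (Node n : nodes.values()) {
            if (shouldCount(n, nowMs)) {
                increment(n.version);
            }
        }
    }
}
```

In this model, registering the same node id first as 2.6.0 and then, after the heartbeat has expired but before anything removed the record, as 2.6.2 leaves a stale 2.6.0 entry with registerIncremental, while registerWithRecount ends with only the 2.6.2 entry.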



