[
https://issues.apache.org/jira/browse/HDFS-7714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299338#comment-14299338
]
Kihwal Lee commented on HDFS-7714:
----------------------------------
On a related note, I've seen similar symptoms when the two namenodes' ctimes in
their storage are different. After a datanode registers with one nn, it won't
be able to register with the other, which causes that actor thread to die.
Depending on which namenode each datanode talks to first, the datanodes are
divided into two sets, each talking to only one namenode, thus creating a
split-brain situation.
Of course, running two namenodes with different storage versions is a mistake,
but I've seen people make this kind of mistake multiple times. Whenever it
happened, I wished for a way to start the actor thread back up. The
refreshNamenodes dfsadmin command does not work for an HA configuration.
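
To make the partitioning concrete, here is a minimal toy model in Java. It is
only a sketch and not Hadoop code: the class SplitBrainSketch, the method
liveActors, and the node ids are all hypothetical, and it assumes a datanode
simply adopts the ctime of the first namenode it registers with and that a
rejected actor is never restarted.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model only: none of these names are Hadoop classes.
final class SplitBrainSketch {

    /**
     * @param nnCtime      hypothetical map: namenode id -> ctime in its storage
     * @param contactOrder hypothetical map: datanode id -> namenodes in the
     *                     order the datanode tries to register with them
     * @return namenode id -> datanodes that still have a live actor for it
     */
    static Map<String, Set<String>> liveActors(
            Map<String, Long> nnCtime,
            Map<String, List<String>> contactOrder) {
        Map<String, Set<String>> live = new LinkedHashMap<>();
        for (String nn : nnCtime.keySet()) {
            live.put(nn, new TreeSet<>());
        }
        for (Map.Entry<String, List<String>> dn : contactOrder.entrySet()) {
            Long adopted = null; // ctime taken from the first registration
            for (String nn : dn.getValue()) {
                long ctime = nnCtime.get(nn);
                if (adopted == null) {
                    adopted = ctime;
                }
                if (adopted == ctime) {
                    live.get(nn).add(dn.getKey());
                }
                // Otherwise the registration is rejected and, in this model,
                // the actor for that namenode exits and is never restarted.
            }
        }
        return live;
    }

    public static void main(String[] args) {
        Map<String, Long> nnCtime = new LinkedHashMap<>();
        nnCtime.put("nn1", 100L);
        nnCtime.put("nn2", 200L); // mismatched ctime: the misconfiguration

        Map<String, List<String>> order = new LinkedHashMap<>();
        order.put("dn1", Arrays.asList("nn1", "nn2"));
        order.put("dn2", Arrays.asList("nn2", "nn1"));
        order.put("dn3", Arrays.asList("nn1", "nn2"));

        System.out.println(liveActors(nnCtime, order));
    }
}
```

With the mismatched ctimes above, dn1 and dn3 end up talking only to nn1 and
dn2 only to nn2. If both namenodes are given the same ctime, every datanode
keeps a live actor for both.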
> Simultaneous restart of HA NameNodes and DataNode can cause DataNode to
> register successfully with only one NameNode.
> ---------------------------------------------------------------------------------------------------------------------
>
> Key: HDFS-7714
> URL: https://issues.apache.org/jira/browse/HDFS-7714
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Affects Versions: 2.6.0
> Reporter: Chris Nauroth
>
> In an HA deployment, DataNodes must register with both NameNodes and send
> periodic heartbeats and block reports to both. However, if NameNodes and
> DataNodes are restarted simultaneously, then this can trigger a race
> condition in registration. The end result is that the {{BPServiceActor}} for
> one NameNode terminates, but the {{BPServiceActor}} for the other NameNode
> remains alive. The DataNode process is then in a "half-alive" state where it
> only heartbeats and sends block reports to one of the NameNodes. This could
> cause a loss of storage capacity after an HA failover. The DataNode process
> would have to be restarted to resolve this.