Chris Nauroth created HDFS-7714:
-----------------------------------
Summary: Simultaneous restart of HA NameNodes and DataNode can
cause DataNode to register successfully with only one NameNode.
Key: HDFS-7714
URL: https://issues.apache.org/jira/browse/HDFS-7714
Project: Hadoop HDFS
Issue Type: Bug
Components: datanode
Affects Versions: 2.6.0
Reporter: Chris Nauroth
In an HA deployment, DataNodes must register with both NameNodes and send
periodic heartbeats and block reports to both. However, if NameNodes and
DataNodes are restarted simultaneously, then this can trigger a race condition
in registration. The end result is that the {{BPServiceActor}} for one
NameNode terminates, but the {{BPServiceActor}} for the other NameNode remains
alive. The DataNode process is then in a "half-alive" state in which it sends
heartbeats and block reports to only one of the NameNodes. If an HA failover
later makes the other NameNode active, the cluster could lose that DataNode's
storage capacity. The DataNode process would have to be restarted to resolve
this.
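
To make the "half-alive" condition concrete, here is a minimal toy model in Java. It is only a sketch: {{Actor}}, {{unservedNameNodes}} and {{isHalfAlive}} are hypothetical names invented for illustration and are not part of the Hadoop code base. The real {{BPServiceActor}} is a thread with far more state, and this model does not reproduce the race itself, only the state the DataNode ends up in.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the "half-alive" state; none of these types are Hadoop classes. */
final class HalfAliveSketch {

    /** Hypothetical stand-in for the per-NameNode actor (not the real BPServiceActor). */
    static final class Actor {
        final String nameNode;
        private volatile boolean alive = true;

        Actor(String nameNode) {
            this.nameNode = nameNode;
        }

        /** Models the actor terminating, e.g. after losing a registration race. */
        void terminate() {
            alive = false;
        }

        boolean isAlive() {
            return alive;
        }
    }

    /** Returns the NameNodes whose actor has terminated and so get no heartbeats. */
    static List<String> unservedNameNodes(List<Actor> actors) {
        List<String> unserved = new ArrayList<>();
        for (Actor a : actors) {
            if (!a.isAlive()) {
                unserved.add(a.nameNode);
            }
        }
        return unserved;
    }

    /** Half-alive: at least one actor still runs and at least one has terminated. */
    static boolean isHalfAlive(List<Actor> actors) {
        int dead = unservedNameNodes(actors).size();
        return dead > 0 && dead < actors.size();
    }

    public static void main(String[] args) {
        List<Actor> actors = new ArrayList<>();
        actors.add(new Actor("nn1"));
        actors.add(new Actor("nn2"));

        // Simulate the outcome described above: one actor dies, the other survives.
        actors.get(1).terminate();

        System.out.println("half-alive: " + isHalfAlive(actors));
        System.out.println("unserved NameNodes: " + unservedNameNodes(actors));
    }
}
```

A check of this shape could, for example, back a health probe that flags a DataNode still running but no longer reporting to every NameNode. Whether the actual fix should restart the dead actor or prevent the race in the first place is not decided here.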