[
https://issues.apache.org/jira/browse/HDFS-3224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470677#comment-13470677
]
Jason Lowe commented on HDFS-3224:
----------------------------------
This bug seems benign but is causing issues with ops monitoring scripts because
it allows a node to be reported as simultaneously live and dead by the NN web
UI and JMX. Here's one scenario:
* Node is registered and appears as a live node
* Node fails badly, starts showing up as a dead node
* Node is re-imaged by ops as a fresh node
* Node rejoins the cluster, and now the same host is reported as both live and
dead
Since re-imaging the node causes it to get a new storage ID, the failure to
recognize it by name means the NN thinks it's a totally different node and
therefore the node is placed in the datanode map twice, once per storage ID.
In this case I think we should be calling getDatanodeByName (i.e.: where we
include the port). This would help us properly distinguish datanodes that are
using ephemeral ports (e.g.: miniclusters).
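To make the failure mode concrete, here is a minimal standalone sketch (not the actual DatanodeManager code; class and method names are hypothetical) of why the existing check can never match: the map is keyed on the bare IP, but the check builds an IP:port key, so the lookup always misses and the re-registered node is treated as new.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the broken check in DatanodeManager#registerDatanode.
// Entries are stored under the bare IP, but the lookup key includes the port,
// so the "already registered?" check always fails.
public class RegistrationCheckSketch {
    static final Map<String, String> hostToNode = new HashMap<>();

    // Bug: key includes the port, but entries were stored under the IP alone.
    static String brokenLookup(String ip, int port) {
        return hostToNode.get(ip + ":" + port);
    }

    // Consistent keying: lookup key matches how entries were stored.
    static String consistentLookup(String ip, int port) {
        return hostToNode.get(ip);
    }

    public static void main(String[] args) {
        hostToNode.put("10.0.0.1", "dn-storage-1");
        System.out.println(brokenLookup("10.0.0.1", 50010));     // null: never matches
        System.out.println(consistentLookup("10.0.0.1", 50010)); // dn-storage-1
    }
}
```

The sketch only shows the key mismatch; the actual fix proposed above is to use getDatanodeByName so the port is part of the lookup on both sides, which also distinguishes datanodes on ephemeral ports.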
> Bug in check for DN re-registration with different storage ID
> -------------------------------------------------------------
>
> Key: HDFS-3224
> URL: https://issues.apache.org/jira/browse/HDFS-3224
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Eli Collins
> Priority: Minor
>
> DatanodeManager#registerDatanode checks the host to node map using an IP:port
> key, however the map is keyed on IP, so this check will always fail. It's
> performing the check to determine if a DN with the same IP and storage ID has
> already registered, and if so to remove this DN from the map and indicate
> that, e.g., it's no longer hosting those blocks. This bug has been here forever.