[
https://issues.apache.org/jira/browse/HDFS-3224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470677#comment-13470677
]
Jason Lowe commented on HDFS-3224:
----------------------------------
This bug seems benign but is causing issues with ops monitoring scripts because
it allows a node to be reported as simultaneously live and dead by the NN web
UI and JMX. Here's one scenario:
* Node is registered and appears as a live node
* Node fails badly, starts showing up as a dead node
* Node is re-imaged by ops as a fresh node
* Node rejoins the cluster, and now the same host is reported as both live and
dead
Since re-imaging the node causes it to get a new storage ID, the failure to
recognize it by name means the NN thinks it's a totally different node and
therefore the node is placed in the datanode map twice, once per storage ID.
In this case I think we should be calling getDatanodeByName (i.e.: where we
include the port). This would help us properly distinguish datanodes that are
using ephemeral ports (e.g.: miniclusters).
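To make the failure mode concrete, here is a minimal standalone sketch (not the actual DatanodeManager code; class and method names are hypothetical) of why the existing check can never match: the map is keyed on the bare IP, but the check builds an IP:port key, so the lookup always misses and the re-registered node is treated as new.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the broken check in DatanodeManager#registerDatanode.
// Entries are stored under the bare IP, but the lookup key includes the port,
// so the "already registered?" check always fails.
public class RegistrationCheckSketch {
    static final Map<String, String> hostToNode = new HashMap<>();

    // Bug: key includes the port, but entries were stored under the IP alone.
    static String brokenLookup(String ip, int port) {
        return hostToNode.get(ip + ":" + port);
    }

    // Consistent keying: lookup key matches how entries were stored.
    static String consistentLookup(String ip, int port) {
        return hostToNode.get(ip);
    }

    public static void main(String[] args) {
        hostToNode.put("10.0.0.1", "dn-storage-1");
        System.out.println(brokenLookup("10.0.0.1", 50010));     // null: never matches
        System.out.println(consistentLookup("10.0.0.1", 50010)); // dn-storage-1
    }
}
```

The sketch only shows the key mismatch; the actual fix proposed above is to use getDatanodeByName so the port is part of the lookup on both sides, which also distinguishes datanodes on ephemeral ports.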
> Bug in check for DN re-registration with different storage ID
> -------------------------------------------------------------
>
> Key: HDFS-3224
> URL: https://issues.apache.org/jira/browse/HDFS-3224
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Eli Collins
> Priority: Minor
>
> DatanodeManager#registerDatanode checks the host to node map using an IP:port
> key, however the map is keyed on IP, so this check will always fail. It's
> performing the check to determine if a DN with the same IP and storage ID has
> already registered, and if so to remove this DN from the map and indicate
> that, e.g., it's no longer hosting those blocks. This bug has been here forever.