[ 
https://issues.apache.org/jira/browse/HDFS-8056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated HDFS-8056:
--------------------------
    Attachment: HDFS-8056.patch

Here is the initial patch. After an NN restart, it puts the dead node in the 
(dead, decommissioned) state, even though we don't know whether the node was 
(dead, decommissioned) or (dead, decommission-in-progress) before the restart. 
This shouldn't really matter: if the node was (dead, decommission-in-progress) 
and comes back alive after the NN restart, it will be added to datanodeMap and 
the decommission process will start.
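
To make the intended behavior concrete, here is a minimal, self-contained 
sketch of the reporting logic. It is not the attached patch. The class name, 
the simplified Node type, the AdminState enum and the reportExcluded flag are 
all hypothetical, and hosts are plain strings instead of InetSocketAddress and 
DatanodeDescriptor.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of the dead-node report; not Hadoop code.
class DeadNodeReportSketch {
    enum AdminState { NORMAL, DECOMMISSIONED }

    static final class Node {
        final String host;
        final boolean alive;
        final AdminState adminState;

        Node(String host, boolean alive, AdminState adminState) {
            this.host = host;
            this.alive = alive;
            this.adminState = adminState;
        }
    }

    /**
     * Builds the dead-node entries for hosts the NN has not heard from
     * since it started.
     *
     * @param included       hosts listed in the include file
     * @param excluded       hosts listed in the exclude file
     * @param registered     hosts already known to the NN (assumed name)
     * @param reportExcluded false: skip excluded hosts (behavior described
     *                       in the issue); true: report them as
     *                       (dead, decommissioned) (behavior described in
     *                       the comment above)
     */
    static List<Node> deadNodesForReport(Set<String> included,
                                         Set<String> excluded,
                                         Set<String> registered,
                                         boolean reportExcluded) {
        List<Node> nodes = new ArrayList<>();
        for (String host : included) {
            if (registered.contains(host)) {
                continue; // already tracked, nothing to synthesize
            }
            boolean isExcluded = excluded.contains(host);
            if (isExcluded && !reportExcluded) {
                continue; // excluded dead node disappears from the report
            }
            // We cannot tell whether decommissioning had finished before
            // the restart, so an excluded dead node is assumed to be
            // DECOMMISSIONED.
            AdminState state =
                isExcluded ? AdminState.DECOMMISSIONED : AdminState.NORMAL;
            nodes.add(new Node(host, false, state));
        }
        return nodes;
    }

    public static void main(String[] args) {
        Set<String> included =
            new LinkedHashSet<>(Arrays.asList("dn1", "dn2", "dn3"));
        Set<String> excluded = new LinkedHashSet<>(Arrays.asList("dn2"));
        Set<String> registered = new LinkedHashSet<>(Arrays.asList("dn1"));

        for (boolean reportExcluded : new boolean[] {false, true}) {
            System.out.println("reportExcluded=" + reportExcluded);
            for (Node n : deadNodesForReport(
                    included, excluded, registered, reportExcluded)) {
                System.out.println("  " + n.host + " alive=" + n.alive
                    + " state=" + n.adminState);
            }
        }
    }
}
```

In this sketch dn2 is dead and listed in the exclude file. With 
reportExcluded=false it is missing from the report; with reportExcluded=true 
it is reported as dead and decommissioned, so the dead-node count no longer 
depends on whether the NN has restarted.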

> Decommissioned dead nodes should continue to be counted as dead after NN 
> restart
> --------------------------------------------------------------------------------
>
>                 Key: HDFS-8056
>                 URL: https://issues.apache.org/jira/browse/HDFS-8056
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Ming Ma
>         Attachments: HDFS-8056.patch
>
>
> We had some offline discussion with [~andrew.wang] and [~cmccabe] about this. 
> I am bringing it up here to get more input and to get the patch in place.
> Dead nodes are tracked by {{DatanodeManager}}'s {{datanodeMap}}. However, 
> after NN restarts, those nodes that were dead before NN restart won't be in 
> {{datanodeMap}}. {{DatanodeManager}}'s {{getDatanodeListForReport}} will add 
> those dead nodes, but not if they are in the exclude file.
> {noformat}
>     if (listDeadNodes) {
>       for (InetSocketAddress addr : includedNodes) {
>         if (foundNodes.matchedBy(addr) || excludedNodes.match(addr)) {
>           continue;
>         }
>         // The remaining nodes are ones that are referenced by the hosts
>         // files but that we do not know about, ie that we have never
>         // heard from. Eg. an entry that is no longer part of the cluster
>         // or a bogus entry was given in the hosts files
>         //
>         // If the host file entry specified the xferPort, we use that.
>         // Otherwise, we guess that it is the default xfer port.
>         // We can't ask the DataNode what it had configured, because it's
>         // dead.
>         DatanodeDescriptor dn = new DatanodeDescriptor(new DatanodeID(addr
>                 .getAddress().getHostAddress(), addr.getHostName(), "",
>                 addr.getPort() == 0 ? defaultXferPort : addr.getPort(),
>                 defaultInfoPort, defaultInfoSecurePort, defaultIpcPort));
>         setDatanodeDead(dn);
>         nodes.add(dn);
>       }
>     }
> {noformat}
> The issue is that the JMX report of decommissioned dead nodes will be 
> different after an NN restart. It might be better to keep it consistent 
> across NN restarts. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
