[
https://issues.apache.org/jira/browse/HDFS-8056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14518357#comment-14518357
]
Ming Ma commented on HDFS-8056:
-------------------------------
[~andrew.wang] and others, appreciate any input you might have.
> Decommissioned dead nodes should continue to be counted as dead after NN
> restart
> --------------------------------------------------------------------------------
>
> Key: HDFS-8056
> URL: https://issues.apache.org/jira/browse/HDFS-8056
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Ming Ma
> Assignee: Ming Ma
> Attachments: HDFS-8056-2.patch, HDFS-8056.patch
>
>
> We had some offline discussion with [~andrew.wang] and [~cmccabe] about this.
> Bringing it up here to get more input and to get the patch in place.
> Dead nodes are tracked by {{DatanodeManager}}'s {{datanodeMap}}. However,
> after the NN restarts, nodes that were dead before the restart won't be in
> {{datanodeMap}}. {{DatanodeManager}}'s {{getDatanodeListForReport}} will add
> those dead nodes back, but not if they are in the exclude file.
> {noformat}
> if (listDeadNodes) {
>   for (InetSocketAddress addr : includedNodes) {
>     if (foundNodes.matchedBy(addr) || excludedNodes.match(addr)) {
>       continue;
>     }
>     // The remaining nodes are ones that are referenced by the hosts
>     // files but that we do not know about, i.e. that we have never
>     // heard from. E.g. an entry that is no longer part of the cluster
>     // or a bogus entry was given in the hosts files
>     //
>     // If the host file entry specified the xferPort, we use that.
>     // Otherwise, we guess that it is the default xfer port.
>     // We can't ask the DataNode what it had configured, because it's
>     // dead.
>     DatanodeDescriptor dn = new DatanodeDescriptor(new DatanodeID(addr
>         .getAddress().getHostAddress(), addr.getHostName(), "",
>         addr.getPort() == 0 ? defaultXferPort : addr.getPort(),
>         defaultInfoPort, defaultInfoSecurePort, defaultIpcPort));
>     setDatanodeDead(dn);
>     nodes.add(dn);
>   }
> }
> {noformat}
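> The issue here is that the JMX value for decommissioned dead nodes will be
> different after an NN restart: a dead node listed in the exclude file is
> skipped by the check above, so it is no longer reported. It might be better
> to make the value consistent across NN restarts.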
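
To make the effect of the exclude-file check concrete, here is a minimal, self-contained sketch. It is not Hadoop code and not the attached patch: hosts are plain strings, and the class name {{DeadNodeReportSketch}}, the method {{deadNodesAfterRestart}} and the {{skipExcluded}} flag are hypothetical names introduced only for illustration. With {{skipExcluded}} set to true it mimics the condition in the quoted snippet; with false it shows what the report would contain if excluded hosts were not skipped, which is only one possible way to get a consistent result.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Illustration only: a simplified model of the dead-node listing logic.
class DeadNodeReportSketch {

    /**
     * Returns the included hosts that would be synthesized as dead after a
     * restart, i.e. hosts in the include list that have not registered yet.
     * When skipExcluded is true, hosts in the exclude list are dropped as
     * well, mirroring the check in the quoted snippet.
     */
    static List<String> deadNodesAfterRestart(Set<String> included,
                                              Set<String> excluded,
                                              Set<String> registered,
                                              boolean skipExcluded) {
        List<String> dead = new ArrayList<>();
        // Iterate in sorted order so the output is deterministic.
        for (String host : new TreeSet<>(included)) {
            if (registered.contains(host)) {
                continue; // already known, reported through the normal path
            }
            if (skipExcluded && excluded.contains(host)) {
                continue; // decommissioned dead node is lost here
            }
            dead.add(host);
        }
        return dead;
    }

    public static void main(String[] args) {
        Set<String> included = Set.of("dn1", "dn2", "dn3");
        Set<String> excluded = Set.of("dn3");   // dn3 was decommissioned
        Set<String> registered = Set.of("dn1"); // only dn1 came back

        // Behaviour modelled on the quoted check: prints [dn2]
        System.out.println(
            deadNodesAfterRestart(included, excluded, registered, true));
        // Without the exclude check: prints [dn2, dn3]
        System.out.println(
            deadNodesAfterRestart(included, excluded, registered, false));
    }
}
```

In this model {{dn3}} is dead and decommissioned, yet it disappears from the post-restart report only because it appears in the exclude list. That is the inconsistency described above.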