Ming Ma created HDFS-8056:
-----------------------------
Summary: Decommissioned dead nodes should continue to be counted
as dead after NN restart
Key: HDFS-8056
URL: https://issues.apache.org/jira/browse/HDFS-8056
Project: Hadoop HDFS
Issue Type: Improvement
Reporter: Ming Ma
We had some offline discussion about this with [~andrew.wang] and [~cmccabe].
Bringing it up here for wider input and to get a patch in place.
Dead nodes are tracked by {{DatanodeManager}}'s {{datanodeMap}}. However, after
the NN restarts, nodes that were already dead before the restart are not in
{{datanodeMap}}. {{DatanodeManager}}'s {{getDatanodeListForReport}} adds those
dead nodes back to the report, but skips any that are also in the exclude file.
{noformat}
if (listDeadNodes) {
  for (InetSocketAddress addr : includedNodes) {
    if (foundNodes.matchedBy(addr) || excludedNodes.match(addr)) {
      continue;
    }
    // The remaining nodes are ones that are referenced by the hosts
    // files but that we do not know about, i.e. that we have never
    // heard from. E.g. an entry that is no longer part of the cluster
    // or a bogus entry was given in the hosts files
    //
    // If the host file entry specified the xferPort, we use that.
    // Otherwise, we guess that it is the default xfer port.
    // We can't ask the DataNode what it had configured, because it's
    // dead.
    DatanodeDescriptor dn = new DatanodeDescriptor(new DatanodeID(addr
        .getAddress().getHostAddress(), addr.getHostName(), "",
        addr.getPort() == 0 ? defaultXferPort : addr.getPort(),
        defaultInfoPort, defaultInfoSecurePort, defaultIpcPort));
    setDatanodeDead(dn);
    nodes.add(dn);
  }
}
{noformat}
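The result is that the JMX view of decommissioned dead nodes differs before and
after an NN restart: a node that was reported as decommissioned and dead before
the restart is no longer reported at all afterwards. It would be better to keep
this consistent across NN restarts.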
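One possible direction, shown only as a minimal, self-contained sketch and not as the actual patch: model the loop above with plain sets and add a flag that keeps excluded (decommissioned) nodes in the dead list instead of skipping them. {{NodeInfo}}, {{listDeadNodes}}, {{keepExcluded}} and the host names are hypothetical, invented for illustration; the sketch assumes include/exclude entries are simple host:port pairs and that a node listed in both files is decommissioned.

{noformat}
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical, simplified model of the dead-node loop in
 * DatanodeManager#getDatanodeListForReport. Not Hadoop code; NodeInfo,
 * listDeadNodes and keepExcluded are illustrative names only.
 */
public class DeadNodeReportSketch {

  /** Stand-in for DatanodeDescriptor with only the fields this sketch needs. */
  static final class NodeInfo {
    final InetSocketAddress addr;
    final boolean decommissioned;

    NodeInfo(InetSocketAddress addr, boolean decommissioned) {
      this.addr = addr;
      this.decommissioned = decommissioned;
    }
  }

  /**
   * Returns include-file entries the NN has not heard from (reported as dead).
   *
   * @param keepExcluded false mimics the current loop (excluded entries are
   *                     skipped); true keeps them, flagged as decommissioned
   */
  static List<NodeInfo> listDeadNodes(Set<InetSocketAddress> included,
                                      Set<InetSocketAddress> excluded,
                                      Set<InetSocketAddress> known,
                                      boolean keepExcluded) {
    List<NodeInfo> dead = new ArrayList<>();
    for (InetSocketAddress addr : included) {
      if (known.contains(addr)) {
        continue; // already tracked in the datanode map
      }
      boolean isExcluded = excluded.contains(addr);
      if (isExcluded && !keepExcluded) {
        continue; // current behavior: the decommissioned dead node vanishes
      }
      dead.add(new NodeInfo(addr, isExcluded));
    }
    return dead;
  }

  public static void main(String[] args) {
    // Made-up addresses for illustration.
    InetSocketAddress dn1 = InetSocketAddress.createUnresolved("dn1.example.com", 50010);
    InetSocketAddress dn2 = InetSocketAddress.createUnresolved("dn2.example.com", 50010);

    Set<InetSocketAddress> included = new LinkedHashSet<>(Arrays.asList(dn1, dn2));
    Set<InetSocketAddress> excluded = Collections.singleton(dn2); // dn2 decommissioned
    Set<InetSocketAddress> known = Collections.emptySet();       // freshly restarted NN

    // Prints "current: 1" then "proposed: 2".
    System.out.println("current: " + listDeadNodes(included, excluded, known, false).size());
    System.out.println("proposed: " + listDeadNodes(included, excluded, known, true).size());
  }
}
{noformat}

With {{keepExcluded}} set, a restarted NN would still report dn2 as dead and decommissioned, matching what it reported before the restart. Whether the real fix should live in this loop or elsewhere in {{DatanodeManager}} is open for discussion.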