[ https://issues.apache.org/jira/browse/HDFS-6180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973513#comment-13973513 ]
Colin Patrick McCabe commented on HDFS-6180:
--------------------------------------------
Hi Haohui,
See the discussion at HDFS-5237 for some background. Basically, there is a
configuration key called {{dfs.datanode.hostname}} that specifies a DataNode's
"registration name." This name may differ from the first hostname you get by
doing a reverse DNS lookup on the DataNode's IP address.
That's why DatanodeID has three fields instead of two:
{code}
public class DatanodeID implements Comparable<DatanodeID> {
  public static final DatanodeID[] EMPTY_ARRAY = {};

  private String ipAddr;       // IP address
  private String hostName;     // hostname claimed by datanode
  private String peerHostName; // hostname from the actual connection

  // ... (remaining fields and methods elided)
}
{code}
The field named {{hostName}} is not actually the hostname but the
"registration name": the name the DataNode was configured to report for itself
via {{dfs.datanode.hostname}}. {{peerHostName}} is the hostname you get by
doing a reverse DNS lookup on {{ipAddr}}.
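To make the three fields concrete, here is a minimal sketch of how one registration could populate them. It assumes a hypothetical {{NodeNames}} record (not part of Hadoop) and uses an in-memory map as a stand-in for a real reverse DNS lookup; the IPs and host names are illustrative only.

```java
import java.util.Map;
import java.util.Objects;

/**
 * Hypothetical, simplified stand-in for DatanodeID's three naming fields.
 * Not Hadoop code; it only illustrates how the fields can diverge.
 */
record NodeNames(String ipAddr, String hostName, String peerHostName) {

    /**
     * ipAddr:      address the registration actually came from
     * claimedName: name the DataNode reports (e.g. from dfs.datanode.hostname)
     * reverseDns:  hypothetical stand-in for a reverse lookup on ipAddr
     */
    static NodeNames fromRegistration(String ipAddr, String claimedName,
                                      Map<String, String> reverseDns) {
        // Roughly mirrors InetAddress.getCanonicalHostName(), which returns
        // the textual IP when no name can be found.
        String peer = reverseDns.getOrDefault(ipAddr, ipAddr);
        return new NodeNames(ipAddr, claimedName, peer);
    }

    /** True when the registration name is not what reverse DNS reports. */
    boolean registrationNameDiffers() {
        return !Objects.equals(hostName, peerHostName);
    }
}
```

When {{dfs.datanode.hostname}} names an interface other than the one reverse DNS resolves to, {{hostName}} and {{peerHostName}} differ, which is exactly why both are kept.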
One use for registration names is in unit tests, where creating a new hostname
is not practical. Another is handling multi-homed setups.
bq. The reason why I removed this test is that -registration-name- is not a
valid DNS name.
The point of the test was to ensure that we could specify registration names in
the exclude and include files and have them work. We should make sure that
this functionality is still working.
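The include/exclude behavior the test protected can be sketched as a membership check over all three names. This is a hypothetical simplification, not HDFS's actual host-file logic; {{HostListSketch}}, {{matches}}, and {{matchesWithoutRegistrationName}} are names invented for illustration. The second method models the regression: an entry written as a registration name is missed if only the IP and the reverse-DNS name are consulted.

```java
import java.util.Set;

/**
 * Hypothetical sketch of include/exclude host-list matching.
 * Not HDFS's implementation; all names are assumed non-null.
 */
final class HostListSketch {
    private HostListSketch() {}

    /** True if the IP, the registration name, or the reverse-DNS name is listed. */
    static boolean matches(Set<String> hostList, String ipAddr,
                           String registrationName, String peerHostName) {
        return hostList.contains(ipAddr)
                || hostList.contains(registrationName)
                || hostList.contains(peerHostName);
    }

    /** The failure mode the test guarded against: registration names ignored. */
    static boolean matchesWithoutRegistrationName(Set<String> hostList,
                                                  String ipAddr,
                                                  String peerHostName) {
        return hostList.contains(ipAddr) || hostList.contains(peerHostName);
    }
}
```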
This is a real problem for some people. For example, consider an AWS instance
with both an external and an internal hostname. You might configure your DNs to
use {{dn1.internal.host.name}} (or whatever) rather than
{{dn1.external.host.name}}. This avoids the issue where the NN does a reverse
DNS lookup on the IP, gets {{dn1.external.host.name}}, and starts sending
traffic over the wrong interface. This matters a lot on AWS, because people are
actually charged money for traffic sent to the external hostname (rather than
the internal one).
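A simplified model of the AWS scenario above, assuming a hypothetical {{nameForTraffic}} helper and a map standing in for reverse DNS in which the instance's IP resolves to the external name. It only illustrates the reasoning in this comment; it is not how HDFS actually picks the address it hands out.

```java
import java.util.Map;

/** Hypothetical model of the multi-homing problem; not HDFS code. */
final class TrafficNameSketch {
    private TrafficNameSketch() {}

    /**
     * If the DataNode registered a name (via dfs.datanode.hostname), use it;
     * otherwise fall back to whatever reverse DNS says about its IP.
     */
    static String nameForTraffic(String ipAddr, String registrationName,
                                 Map<String, String> reverseDns) {
        if (registrationName != null && !registrationName.isEmpty()) {
            return registrationName;
        }
        return reverseDns.getOrDefault(ipAddr, ipAddr);
    }
}
```

With reverse DNS mapping the IP to {{dn1.external.host.name}}, only the registered {{dn1.internal.host.name}} keeps traffic on the internal interface.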
If you like, the test could be configured to use a valid but non-default
loopback IP (such as 127.0.5.1) rather than an invalid string. Either way, I
think we need a JIRA to restore it. I'll file one shortly.
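A quick check that an address like 127.0.5.1 is a legitimate loopback address (the whole 127.0.0.0/8 block is loopback), using only the standard {{java.net.InetAddress}} API. {{LoopbackSketch}} is a hypothetical helper name. Java's check is purely on the address; on some platforms (macOS, for example) addresses other than 127.0.0.1 must be aliased onto the loopback interface before a socket can bind to them.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Hypothetical helper: is an IP literal a loopback address? */
final class LoopbackSketch {
    private LoopbackSketch() {}

    static boolean isLoopback(String ipLiteral) throws UnknownHostException {
        // getByName on an IP literal only validates its format; no DNS query.
        return InetAddress.getByName(ipLiteral).isLoopbackAddress();
    }

    public static void main(String[] args) throws UnknownHostException {
        System.out.println(isLoopback("127.0.5.1")); // true
        System.out.println(isLoopback("127.0.0.1")); // true
        System.out.println(isLoopback("10.0.0.5"));  // false
    }
}
```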
> dead node count / listing is very broken in JMX and old GUI
> -----------------------------------------------------------
>
> Key: HDFS-6180
> URL: https://issues.apache.org/jira/browse/HDFS-6180
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 2.3.0
> Reporter: Travis Thompson
> Assignee: Haohui Mai
> Priority: Blocker
> Fix For: 2.5.0
>
> Attachments: HDFS-6180.000.patch, HDFS-6180.001.patch,
> HDFS-6180.002.patch, HDFS-6180.003.patch, HDFS-6180.004.patch, dn.log
>
>
> After bringing up a 578 node cluster with 13 dead nodes, 0 were reported on
> the new GUI, but showed up properly in the datanodes tab. Some nodes are
> also being double reported in the deadnode and inservice section (22 show up
> dead, 565 show up alive, 9 duplicated nodes).
> From /jmx (confirmed that it's the same in jconsole):
> {noformat}
> {
> "name" : "Hadoop:service=NameNode,name=FSNamesystemState",
> "modelerType" : "org.apache.hadoop.hdfs.server.namenode.FSNamesystem",
> "CapacityTotal" : 5477748687372288,
> "CapacityUsed" : 24825720407,
> "CapacityRemaining" : 5477723861651881,
> "TotalLoad" : 565,
> "SnapshotStats" : "{\"SnapshottableDirectories\":0,\"Snapshots\":0}",
> "BlocksTotal" : 21065,
> "MaxObjects" : 0,
> "FilesTotal" : 25454,
> "PendingReplicationBlocks" : 0,
> "UnderReplicatedBlocks" : 0,
> "ScheduledReplicationBlocks" : 0,
> "FSState" : "Operational",
> "NumLiveDataNodes" : 565,
> "NumDeadDataNodes" : 0,
> "NumDecomLiveDataNodes" : 0,
> "NumDecomDeadDataNodes" : 0,
> "NumDecommissioningDataNodes" : 0,
> "NumStaleDataNodes" : 1
> },
> {noformat}
> I'm not going to include deadnode/livenodes because the list is huge, but
> I've confirmed there are 9 nodes showing up in both deadnodes and livenodes.
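The numbers in the report are consistent with 9 double-listed nodes: 22 dead entries minus the 9 duplicates leaves 13 entries that appear only in the dead list (matching the 13 dead nodes), and 565 + 22 - 9 = 578 distinct nodes. A minimal sketch of that cross-check, assuming the host names have already been extracted from the dead and live node lists (parsing the JMX JSON is omitted, and {{NodeListCheck}} is a hypothetical name):

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper for cross-checking live and dead node listings. */
final class NodeListCheck {
    private NodeListCheck() {}

    /** Nodes listed as both live and dead; a consistent report has none. */
    static Set<String> inBothLists(List<String> live, List<String> dead) {
        Set<String> both = new TreeSet<>(live);
        both.retainAll(dead);
        return both;
    }

    /** Nodes listed only as dead. */
    static Set<String> deadOnly(List<String> live, List<String> dead) {
        Set<String> result = new TreeSet<>(dead);
        result.removeAll(live);
        return result;
    }
}
```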