[ 
https://issues.apache.org/jira/browse/HDFS-6180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973513#comment-13973513
 ] 

Colin Patrick McCabe commented on HDFS-6180:
--------------------------------------------

Hi Haohui,

See the discussion at HDFS-5237 for some background.  Basically, there is a 
configuration key, {{dfs.datanode.hostname}}, which specifies a datanode's 
"registration name."  This may differ from the first hostname returned by a 
reverse lookup on the DataNode's IP address.

That's why DatanodeID has three fields instead of two:
{code}
public class DatanodeID implements Comparable<DatanodeID> {
  public static final DatanodeID[] EMPTY_ARRAY = {};

  private String ipAddr;     // IP address
  private String hostName;   // hostname claimed by datanode
  private String peerHostName; // hostname from the actual connection
{code}

The field named {{hostName}} is actually not the hostname, but the 
"registration name," which is what the datanode was configured to say its name 
was, via {{dfs.datanode.hostname}}.  {{peerHostName}} is the hostname you get 
by doing a reverse DNS lookup on {{ipAddr}}.
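
To make the distinction between the three fields concrete, here is a minimal sketch.  {{NodeIdentitySketch}} and {{register}} are hypothetical names and are not Hadoop's {{DatanodeID}}; the reverse lookup is passed in as a function so the example does not depend on real DNS, and the fallback to the looked-up name when nothing is configured is a simplifying assumption rather than HDFS's actual behavior.

```java
import java.util.function.Function;

// Hypothetical, simplified stand-in for the three name fields discussed above.
// This is NOT Hadoop's DatanodeID; names and logic are illustrative only.
final class NodeIdentitySketch {
    final String ipAddr;        // IP address
    final String hostName;      // registration name claimed by the datanode
    final String peerHostName;  // hostname from a reverse lookup on ipAddr

    NodeIdentitySketch(String ipAddr, String hostName, String peerHostName) {
        this.ipAddr = ipAddr;
        this.hostName = hostName;
        this.peerHostName = peerHostName;
    }

    // configuredName models the value of dfs.datanode.hostname (null if unset);
    // reverseLookup models a reverse DNS lookup on the peer's IP address.
    static NodeIdentitySketch register(String ipAddr, String configuredName,
                                       Function<String, String> reverseLookup) {
        String peer = reverseLookup.apply(ipAddr);
        // Assumption: fall back to the looked-up name when nothing is configured.
        String registrationName = (configuredName != null) ? configuredName : peer;
        return new NodeIdentitySketch(ipAddr, registrationName, peer);
    }

    public static void main(String[] args) {
        NodeIdentitySketch id = register("10.0.0.5", "dn1.internal.host.name",
                ip -> "dn1.external.host.name");
        System.out.println(id.hostName + " vs " + id.peerHostName);
    }
}
```

With a registration name configured, {{hostName}} and {{peerHostName}} can legitimately disagree, which is why both are kept.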

One use for registration names is in unit tests, where creating a new 
hostname is not practical.  Another is in multi-homed setups.

bq. The reason why I removed this test is that -registration-name- is not a 
valid DNS name.

The point of the test was to ensure that we could specify registration names in 
the exclude and include files and have them work.  We should make sure that 
this functionality is still working.

This is a real problem for some people.  For example, consider an AWS 
instance with both an external and an internal hostname.  You might configure 
your DNs to use {{dn1.internal.host.name}} (or whatever) rather than 
{{dn1.external.host.name}}.  This avoids the issue where the NN does a reverse 
DNS lookup on the IP, and comes up with {{dn1.external.host.name}}, and starts 
sending traffic over the wrong interface.  This sort of thing is very important 
on AWS, because people are actually charged money for sending traffic to the 
external hostname (rather than internal).
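
The behavior the removed test was meant to protect can be sketched roughly as follows.  This is not the NameNode's actual host-file code; {{HostFileMatchSketch}} and {{isListed}} are hypothetical, and checking all three names is an assumption made for illustration.  The point is only that an include/exclude entry written as a registration name should match even when a reverse lookup on the IP yields a different name.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of include/exclude matching; not the NameNode's real logic.
final class HostFileMatchSketch {
    // Returns true if any of the node's names appears among the host-file entries.
    static boolean isListed(Set<String> entries, String ipAddr,
                            String registrationName, String peerHostName) {
        return entries.contains(ipAddr)
                || entries.contains(registrationName)
                || entries.contains(peerHostName);
    }

    public static void main(String[] args) {
        // The exclude file names the node by its registration name only.
        Set<String> exclude = new HashSet<>(Arrays.asList("dn1.internal.host.name"));
        boolean excluded = isListed(exclude, "10.0.0.5",
                "dn1.internal.host.name", "dn1.external.host.name");
        System.out.println("excluded: " + excluded);
    }
}
```

A check that only compared the reverse-lookup name ({{dn1.external.host.name}} here) would miss this entry, which is the regression to guard against.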

If you like, the test could be configured to use a valid but non-default 
loopback IP (such as 127.0.5.1) rather than an invalid string.  But in any 
case, I think we need a JIRA to restore it.  Will file one shortly.
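
As a quick sanity check of the loopback suggestion, the sketch below confirms that Java treats 127.0.5.1 as a loopback address.  {{LoopbackAliasSketch}} and {{isLoopbackLiteral}} are hypothetical names, not part of any existing test; whether a process can actually bind to a non-default loopback address depends on the operating system, so that part is left as an assumption.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper for a test setup; checks only address classification.
final class LoopbackAliasSketch {
    // Intended for literal IP strings: for a literal, getByName only checks the
    // address format and does not perform a DNS lookup.
    static boolean isLoopbackLiteral(String ipLiteral) {
        try {
            return InetAddress.getByName(ipLiteral).isLoopbackAddress();
        } catch (UnknownHostException e) {
            return false; // not a usable address
        }
    }

    public static void main(String[] args) {
        System.out.println("127.0.5.1 loopback: " + isLoopbackLiteral("127.0.5.1"));
        System.out.println("10.0.0.5 loopback: " + isLoopbackLiteral("10.0.0.5"));
    }
}
```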

> dead node count / listing is very broken in JMX and old GUI
> -----------------------------------------------------------
>
>                 Key: HDFS-6180
>                 URL: https://issues.apache.org/jira/browse/HDFS-6180
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.3.0
>            Reporter: Travis Thompson
>            Assignee: Haohui Mai
>            Priority: Blocker
>             Fix For: 2.5.0
>
>         Attachments: HDFS-6180.000.patch, HDFS-6180.001.patch, 
> HDFS-6180.002.patch, HDFS-6180.003.patch, HDFS-6180.004.patch, dn.log
>
>
> After bringing up a 578 node cluster with 13 dead nodes, 0 were reported on 
> the new GUI, but showed up properly in the datanodes tab.  Some nodes are 
> also being double reported in the deadnode and inservice section (22 show up 
> dead, 565 show up alive, 9 duplicated nodes). 
> From /jmx (confirmed that it's the same in jconsole):
> {noformat}
> {
>     "name" : "Hadoop:service=NameNode,name=FSNamesystemState",
>     "modelerType" : "org.apache.hadoop.hdfs.server.namenode.FSNamesystem",
>     "CapacityTotal" : 5477748687372288,
>     "CapacityUsed" : 24825720407,
>     "CapacityRemaining" : 5477723861651881,
>     "TotalLoad" : 565,
>     "SnapshotStats" : "{\"SnapshottableDirectories\":0,\"Snapshots\":0}",
>     "BlocksTotal" : 21065,
>     "MaxObjects" : 0,
>     "FilesTotal" : 25454,
>     "PendingReplicationBlocks" : 0,
>     "UnderReplicatedBlocks" : 0,
>     "ScheduledReplicationBlocks" : 0,
>     "FSState" : "Operational",
>     "NumLiveDataNodes" : 565,
>     "NumDeadDataNodes" : 0,
>     "NumDecomLiveDataNodes" : 0,
>     "NumDecomDeadDataNodes" : 0,
>     "NumDecommissioningDataNodes" : 0,
>     "NumStaleDataNodes" : 1
>   },
> {noformat}
> I'm not going to include deadnode/livenodes because the list is huge, but 
> I've confirmed there are 9 nodes showing up in both deadnodes and livenodes.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
