[
https://issues.apache.org/jira/browse/HDFS-6180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13959097#comment-13959097
]
Haohui Mai commented on HDFS-6180:
----------------------------------
The duplicated entries are added by the following code in {{DatanodeManager}}:
{code}
if (listDeadNodes) {
  final EntrySet includedNodes = hostFileManager.getIncludes();
  final EntrySet excludedNodes = hostFileManager.getExcludes();
  for (Entry entry : includedNodes) {
    if ((foundNodes.find(entry) == null) &&
        (excludedNodes.find(entry) == null)) {
{code}
Note that {{entry}} does not contain a port, since it comes from the include
file, but all entries in {{foundNodes}} do. When passed an entry without a
port, the {{find}} function is supposed to match it against the corresponding
entry that carries port information.
Internally {{find}} is implemented on top of a {{TreeMap}} keyed by {{ip}} or
{{ip:port}}. Since in lexicographic order an entry with a port sorts right
after the same entry without a port, the port-matching rule is implemented by
checking whether the next key has the same IP. The problem is that this
heuristic is unreliable; it returns incorrect results for entries like the
following:
{noformat}
172.18.146.3:1019
172.18.146.30:1019
{noformat}
Calling {{find(172.18.146.3)}} checks {{172.18.146.30:1019}} instead of
{{172.18.146.3:1019}}, resulting in the bug.
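To see why, here is a minimal, self-contained sketch of the heuristic. This is
not the actual {{HostFileManager}} code; the class name, helper method, and map
contents are made up for illustration, but it models the "check the next key in
sort order" rule described above:
{code}
import java.util.Map;
import java.util.TreeMap;

public class FindHeuristicDemo {
  // Stand-in for the TreeMap-backed lookup: keys are "ip" or "ip:port" strings.
  static String find(TreeMap<String, String> entries, String ip) {
    if (entries.containsKey(ip)) {
      return ip;                       // exact, port-less match
    }
    // The heuristic: only inspect the key immediately after "ip" in sort order,
    // and accept it if it is the same IP with a port appended.
    Map.Entry<String, String> next = entries.higherEntry(ip);
    if (next != null && next.getKey().startsWith(ip + ":")) {
      return next.getKey();
    }
    return null;                       // no match found
  }

  public static void main(String[] args) {
    TreeMap<String, String> entries = new TreeMap<>();
    entries.put("172.18.146.3:1019", "dn1");
    entries.put("172.18.146.30:1019", "dn2");

    // Because '0' sorts before ':', the key right after "172.18.146.3" is
    // "172.18.146.30:1019", not "172.18.146.3:1019", so the lookup misses
    // the real match and prints null.
    System.out.println(find(entries, "172.18.146.3"));
  }
}
{code}
Any strategy that only checks a single neighboring key will fail whenever
another node's IP is a string extension of the queried IP.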
The bug can be quite confusing from the end user's perspective, and I'd like to
move forward as quickly as possible.
[~kamrul], are you working on it? If not I can work on a patch later today.
> dead node count / listing is very broken in JMX and old GUI
> -----------------------------------------------------------
>
> Key: HDFS-6180
> URL: https://issues.apache.org/jira/browse/HDFS-6180
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 2.3.0
> Reporter: Travis Thompson
> Assignee: Haohui Mai
> Attachments: dn.log
>
>
> After bringing up a 578-node cluster with 13 dead nodes, 0 were reported on
> the new GUI, but they showed up properly in the datanodes tab. Some nodes are
> also being double-reported in the dead node and in-service sections (22 show
> up dead, 565 show up alive, 9 duplicated nodes).
> From /jmx (confirmed that it's the same in jconsole):
> {noformat}
> {
>   "name" : "Hadoop:service=NameNode,name=FSNamesystemState",
>   "modelerType" : "org.apache.hadoop.hdfs.server.namenode.FSNamesystem",
>   "CapacityTotal" : 5477748687372288,
>   "CapacityUsed" : 24825720407,
>   "CapacityRemaining" : 5477723861651881,
>   "TotalLoad" : 565,
>   "SnapshotStats" : "{\"SnapshottableDirectories\":0,\"Snapshots\":0}",
>   "BlocksTotal" : 21065,
>   "MaxObjects" : 0,
>   "FilesTotal" : 25454,
>   "PendingReplicationBlocks" : 0,
>   "UnderReplicatedBlocks" : 0,
>   "ScheduledReplicationBlocks" : 0,
>   "FSState" : "Operational",
>   "NumLiveDataNodes" : 565,
>   "NumDeadDataNodes" : 0,
>   "NumDecomLiveDataNodes" : 0,
>   "NumDecomDeadDataNodes" : 0,
>   "NumDecommissioningDataNodes" : 0,
>   "NumStaleDataNodes" : 1
> },
> {noformat}
> I'm not going to include deadnode/livenodes because the list is huge, but
> I've confirmed there are 9 nodes showing up in both deadnodes and livenodes.