[ https://issues.apache.org/jira/browse/HDFS-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13054175#comment-13054175 ]
Allen Wittenauer commented on HDFS-1125:
----------------------------------------
The problem still seems to be present in 0.20.203, so I'm guessing no, the
problem hasn't been fixed by HDFS-1773.
How I tested:
a) create a grid with 0.20.203, filling in dfs.hosts
b) populate it with data
c) put host in dfs.exclude
d) -refreshNodes, verify host is in decom'ing nodes
e) let decom process finish
f) host now shows up in dead
g) remove host from dfs.hosts and dfs.exclude
h) -refreshNodes
i) node is still listed as dead by nn
j) kill DataNode process
k) node is still listed as dead by nn
l) 10 mins later, still listed...
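The "still listed as dead" check in steps i) through l) could be scripted roughly as below. This is only a sketch: it assumes the `hadoop dfsadmin -report` output contains a summary line shaped like "Datanodes available: 3 (4 total, 1 dead)", which should be verified against the release in use. The names dead_node_count and should_alert are made up for illustration.

```python
import re

# Assumed shape of the summary line in `hadoop dfsadmin -report` output,
# e.g. "Datanodes available: 3 (4 total, 1 dead)". Verify for your release.
SUMMARY = re.compile(
    r"Datanodes available:\s*(\d+)\s*\((\d+) total,\s*(\d+) dead\)"
)


def dead_node_count(report_text):
    """Return the dead-node count, or None if no summary line is found."""
    match = SUMMARY.search(report_text)
    if match is None:
        return None
    return int(match.group(3))


def should_alert(report_text):
    """Alert when the dead count is nonzero or the report cannot be parsed."""
    count = dead_node_count(report_text)
    return count is None or count != 0
```

Fed the captured report text after step h), should_alert would keep returning True for as long as the namenode lists the removed host as dead, which is the behaviour described in this issue.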
> Removing a datanode (failed or decommissioned) should not require a namenode
> restart
> ------------------------------------------------------------------------------------
>
> Key: HDFS-1125
> URL: https://issues.apache.org/jira/browse/HDFS-1125
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: name-node
> Affects Versions: 0.20.2
> Reporter: Alex Loddengaard
> Priority: Blocker
>
> I've heard of several Hadoop users using dfsadmin -report to monitor the
> number of dead nodes, and alert if that number is not 0. This mechanism
> tends to work pretty well, except when a node is decommissioned or fails,
> because then the namenode requires a restart for said node to be entirely
> removed from HDFS. More details here:
> http://markmail.org/search/?q=decommissioned%20node%20showing%20up%20ad%20dead%20node%20in%20web%20based%09interface%20to%20namenode#query:decommissioned%20node%20showing%20up%20ad%20dead%20node%20in%20web%20based%09interface%20to%20namenode+page:1+mid:7gwqwdkobgfuszb4+state:results
> Removal from the exclude file and a refresh should get rid of the dead node.