Decommissioning never ends when the node being decommissioned has blocks that
are under-replicated and cannot be replicated to the expected replication level
-------------------------------------------------------------------------------------------------------------------------------------------------------
Key: HDFS-1590
URL: https://issues.apache.org/jira/browse/HDFS-1590
Project: Hadoop HDFS
Issue Type: Bug
Components: name-node
Affects Versions: 0.20.2
Environment: Linux
Reporter: Mathias Herberts
Priority: Minor
On a test cluster with 4 DNs and a default replication level of 3, I recently
attempted to decommission one of the DNs. Right after modifying the
dfs.hosts.exclude file and running 'dfsadmin -refreshNodes', I could see blocks
being replicated to the other nodes.
After a while, the replication stopped, but the node was never marked as
decommissioned.
When running 'fsck -files -blocks -locations' I saw that all files had an
actual replication of 4 (which is logical given there are only 4 DNs), but some
files had an expected replication of 10 (those were job.jar files from M/R
jobs).
I ran 'fs -setrep 3' on those files, and shortly afterwards the NameNode
reported the DN as decommissioned.
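
For what it's worth, that manual workaround can also be scripted against the
public FileSystem API. The following is only a rough sketch (the class name,
method name and the hard-coded values are mine): it lowers any requested
replication that exceeds the number of DataNodes that will remain after
decommissioning.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CapReplication {

    // Recursively walk 'dir' and cap each file's requested replication
    // at 'maxRepl', which is what running 'fs -setrep 3' by hand achieved.
    public static void capReplication(FileSystem fs, Path dir, short maxRepl)
            throws Exception {
        FileStatus[] entries = fs.listStatus(dir);
        if (entries == null) {
            return;
        }
        for (FileStatus status : entries) {
            if (status.isDir()) {
                capReplication(fs, status.getPath(), maxRepl);
            } else if (status.getReplication() > maxRepl) {
                // Lower the requested replication so decommissioning can finish.
                fs.setReplication(status.getPath(), maxRepl);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // 3 live DataNodes remain once the fourth one is decommissioned.
        capReplication(fs, new Path("/"), (short) 3);
    }
}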
Shouldn't the NameNode check for this case when decommissioning a node? I.e.
consider a node decommissioned if either of the following is true for each
block on the node being decommissioned (see the sketch after the list):
1. It is replicated at or above the expected replication level.
2. It is replicated on as many of the available nodes as possible, even though
that is fewer than the expected replication level.
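
Purely as an illustration of that rule (this is not actual NameNode code; the
class, method and parameter names below are made up), the per-block check
could look roughly like this:

class DecommissionCheckSketch {
    // liveReplicas        - replicas currently on live nodes other than the
    //                       one being decommissioned
    // expectedReplication - the file's requested replication factor
    // liveDataNodes       - number of DataNodes still available as targets
    static boolean blockAllowsDecommission(int liveReplicas,
                                           int expectedReplication,
                                           int liveDataNodes) {
        // Case 1: the block already meets (or exceeds) its expected replication.
        if (liveReplicas >= expectedReplication) {
            return true;
        }
        // Case 2: the block sits on every remaining DataNode, so no further
        // replication is possible even though it stays under-replicated.
        return liveReplicas >= liveDataNodes;
    }
}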