[
https://issues.apache.org/jira/browse/HDFS-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711216#comment-16711216
]
Abhishek Modi commented on HDFS-1590:
-------------------------------------
Right now, HDFS checks whether each block can reach its expected replication
count before marking the node as decommissioned. So if even a single block
has a replication factor of 50, HDFS won't let a node holding it be
decommissioned when fewer than 50 nodes are available to hold replicas.
Should we introduce a separate config for the replication count to be
maintained when decommissioning a node? Alternatively, we could use the min
replication count for the decommissioning check, although this could be risky.
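To make the options concrete, here is a rough sketch of how the threshold
could be chosen. It is not NameNode code: DecommissionCheck, Policy, BlockInfo
and the decommissionReplication parameter are hypothetical names made up for
illustration, and liveReplicas is assumed to count only replicas on nodes that
stay in service.

```java
import java.util.List;

/** Illustrative only: none of these names exist in HDFS. */
final class DecommissionCheck {

    /** Which replica count a block must reach before its node may be decommissioned. */
    enum Policy { EXPECTED_REPLICATION, MIN_REPLICATION, DECOMMISSION_REPLICATION }

    /** Hypothetical per-block view used by the check. */
    static final class BlockInfo {
        final int liveReplicas;        // replicas on nodes that stay in service (assumption)
        final int expectedReplication; // replication factor set on the file

        BlockInfo(int liveReplicas, int expectedReplication) {
            this.liveReplicas = liveReplicas;
            this.expectedReplication = expectedReplication;
        }
    }

    /** Number of live replicas a block needs under the given policy. */
    static int requiredReplicas(Policy policy, int expectedReplication,
                                int minReplication, int decommissionReplication) {
        switch (policy) {
            case MIN_REPLICATION:
                return minReplication;
            case DECOMMISSION_REPLICATION:
                // Hypothetical new config: never demand more than the file itself asks for.
                return Math.min(expectedReplication, decommissionReplication);
            default:
                // Behaviour described in this comment: wait for the full expected count.
                return expectedReplication;
        }
    }

    /** True only if every block on the node meets the threshold for the chosen policy. */
    static boolean canDecommission(List<BlockInfo> blocks, Policy policy,
                                   int minReplication, int decommissionReplication) {
        for (BlockInfo block : blocks) {
            int required = requiredReplicas(policy, block.expectedReplication,
                                            minReplication, decommissionReplication);
            if (block.liveReplicas < required) {
                return false;
            }
        }
        return true;
    }
}
```

With the numbers from the issue description quoted below (3 nodes left in
service, a job.jar block with expected replication 10 and 3 live replicas),
EXPECTED_REPLICATION never lets the node finish decommissioning, while
MIN_REPLICATION with a minimum of 1, or DECOMMISSION_REPLICATION with a value
of 3, would.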
> Decommissioning never ends when node to decommission has blocks that are
> under-replicated and cannot be replicated to the expected level of replication
> -------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: HDFS-1590
> URL: https://issues.apache.org/jira/browse/HDFS-1590
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 0.20.2
> Environment: Linux
> Reporter: Mathias Herberts
> Priority: Minor
>
> On a test cluster with 4 DNs and a default repl level of 3, I recently
> attempted to decommission one of the DNs. Right after the modification of the
> dfs.hosts.exclude file and the 'dfsadmin -refreshNodes', I could see the
> blocks being replicated to other nodes.
> After a while, the replication stopped but the node was not marked as
> decommissioned.
> When running an 'fsck -files -blocks -locations' I saw that all files had a
> replication of 4 (which is logical given there are 4 DNs), but some of the
> files had an expected replication set to 10 (those were job.jar files from
> M/R jobs).
> I ran 'fs -setrep 3' on those files and shortly after the namenode reported
> the DN as decommissioned.
> Shouldn't this case be checked by the NameNode when decommissioning a node?
> I.e., consider a node decommissioned if either one of the following is true
> for each block on the node being decommissioned:
> 1. It is replicated more than the expected replication level.
> 2. It is replicated as much as possible given the available nodes, even
> though it is less replicated than expected.