[
https://issues.apache.org/jira/browse/HDFS-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13977632#comment-13977632
]
Ming Ma commented on HDFS-5757:
-------------------------------
It seems the property dfs.namenode.decommission.nodes.per.interval already
supports this: you can configure how many DNs DecommissionManager checks each
time it takes the writeLock. Perhaps we can do something similar for
DataNodeManager's refreshNodes.
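For context, this property can be set in hdfs-site.xml; the value here is
illustrative, not a recommendation:

```xml
<!-- hdfs-site.xml: number of datanodes DecommissionManager examines
     per writeLock acquisition (the value 5 is illustrative) -->
<property>
  <name>dfs.namenode.decommission.nodes.per.interval</name>
  <value>5</value>
</property>
```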
DataNodeManager's refreshNodes takes the writeLock when it kicks off the
decommission process. We can modify refreshNodes to kick off only several nodes
each time the writeLock is acquired, and to return the RPC request quickly
without waiting for DataNodeManager to finish the process.
There are other, less important scenarios: a. if a machine has lots of blocks,
checking it could still hold NN's writeLock for some time; b. even when nothing
is being decommissioned, DecommissionManager still takes the writeLock and
walks through all DNs.
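The batching idea above can be sketched as follows. This is a hypothetical
illustration, not the actual DecommissionManager code: the class and method
names, and the placeholder for blockManager.isReplicationInProgress, are
assumptions. The point is that the write lock is released between batches so
other namesystem operations can make progress.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch of checking decommissioning nodes in batches of
// `nodesPerInterval`, taking and releasing the write lock per batch
// instead of holding it for the whole scan.
public class BatchedDecommissionCheck {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final int nodesPerInterval;

    public BatchedDecommissionCheck(int nodesPerInterval) {
        this.nodesPerInterval = nodesPerInterval;
    }

    /** Scans all nodes; returns how many times the writeLock was taken. */
    public int checkAll(List<String> decommissioningNodes) {
        int acquisitions = 0;
        for (int start = 0; start < decommissioningNodes.size();
             start += nodesPerInterval) {
            lock.writeLock().lock();  // hold the lock only for one batch
            acquisitions++;
            try {
                int end = Math.min(start + nodesPerInterval,
                                   decommissioningNodes.size());
                for (String dn : decommissioningNodes.subList(start, end)) {
                    // placeholder for the per-DN check, e.g.
                    // blockManager.isReplicationInProgress(dn)
                }
            } finally {
                // releasing here lets other RPCs run between batches
                lock.writeLock().unlock();
            }
        }
        return acquisitions;
    }

    public static void main(String[] args) {
        List<String> nodes = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            nodes.add("dn" + i);
        }
        // 10 nodes in batches of 4 -> 3 lock acquisitions
        System.out.println(new BatchedDecommissionCheck(4).checkAll(nodes));
    }
}
```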
> Decommission lots of nodes at the same time could slow down NN
> --------------------------------------------------------------
>
> Key: HDFS-5757
> URL: https://issues.apache.org/jira/browse/HDFS-5757
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: namenode
> Reporter: Ming Ma
>
> Sometimes we need to decomm a whole rack of nodes at the same time. While the
> decomm is in progress, NN is slow.
> The reason is that when DecommissionManager checks the decomm status, it
> acquires the namesystem's writer lock and iterates through all DNs; for each
> DN that is in decommissioning state, it checks whether replication is done for
> all the blocks on the machine via blockManager.isReplicationInProgress. On a
> large cluster, the number of blocks on the machine could be big.
> The fix could be to have DecommissionManager check only several
> decomm-in-progress nodes each time it acquires the namesystem's writer lock.
--
This message was sent by Atlassian JIRA
(v6.2#6252)