[ 
https://issues.apache.org/jira/browse/HDFS-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated HDFS-5757:
--------------------------
    Attachment: HDFS-5757.patch

Here is a rough estimate of how long "dfsadmin -refreshNodes" could hold the 
namenode FSNamesystem lock. Assume each DN has 100k blocks and it takes 0.2s 
to walk through all the blocks of one DN. Calling "dfsadmin -refreshNodes" 
with 50 nodes at once could then lock up the NN for 50 x 0.2s = 10s.

Admins can choose to call "dfsadmin -refreshNodes" one node at a time to work 
around the issue. So it isn't a big deal.
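
To make the numbers concrete, here is a minimal sketch of the arithmetic. The 
class and method names are made up for illustration, and the 0.2s per DN is 
the assumed figure from the estimate, not a measurement.

```java
/** Back-of-the-envelope model; inputs are the assumed figures above, not measurements. */
final class RefreshNodesLockEstimate {

    /** Longest single lock hold when nodesPerCall DNs are walked under one lock acquisition. */
    static long longestHoldMillis(int nodesPerCall, long millisPerNode) {
        return nodesPerCall * millisPerNode;
    }

    public static void main(String[] args) {
        long perNode = 200; // assumed: 0.2s to walk ~100k blocks on one DN
        System.out.println("50 nodes in one call: " + longestHoldMillis(50, perNode) + " ms");
        System.out.println("one node per call:    " + longestHoldMillis(1, perNode) + " ms");
    }
}
```

The total work is the same either way. Calling one node at a time only 
shortens each individual lock hold, so other operations can get the lock 
between calls.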

Still, it might be better to guard the NN against it. The patch removes the 
expensive {{isReplicationInProgress}} call from the refreshNodes RPC thread 
path, so refreshNodes returns quickly without holding the FSNamesystem lock 
for too long. Instead, it queues the requests so that DecommissionManager 
takes care of the rest. The patch also fixes the recommission scenario.
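
The idea can be sketched roughly as follows. This is not the code in the 
patch. The class, the methods, and the lock and predicate standing in for 
the FSNamesystem lock and {{isReplicationInProgress}} are all hypothetical. 
The per-pass cap follows the suggestion in the issue description to check 
only several nodes per lock acquisition.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Predicate;

/** Hypothetical sketch; none of these names come from HDFS-5757.patch. */
final class DeferredDecommissionSketch {
    // Stand-in for the FSNamesystem lock.
    private final ReentrantReadWriteLock namesystemLock = new ReentrantReadWriteLock();
    // Nodes whose block scan is deferred to the background monitor.
    private final Queue<String> pending = new ArrayDeque<>();
    // Stand-in for the expensive per-node block walk.
    private final Predicate<String> replicationInProgress;

    DeferredDecommissionSketch(Predicate<String> replicationInProgress) {
        this.replicationInProgress = replicationInProgress;
    }

    /** RPC path: only records the request, so the lock is held briefly. */
    void refreshNodes(List<String> nodesToDecommission) {
        namesystemLock.writeLock().lock();
        try {
            pending.addAll(nodesToDecommission);
        } finally {
            namesystemLock.writeLock().unlock();
        }
    }

    /** Monitor path: scans at most maxNodes per lock hold and returns the nodes that finished. */
    List<String> checkBatch(int maxNodes) {
        List<String> done = new ArrayList<>();
        namesystemLock.writeLock().lock();
        try {
            int n = Math.min(maxNodes, pending.size());
            for (int i = 0; i < n; i++) {
                String node = pending.poll();
                if (replicationInProgress.test(node)) {
                    pending.add(node); // not done yet; re-check in a later pass
                } else {
                    done.add(node);
                }
            }
        } finally {
            namesystemLock.writeLock().unlock();
        }
        return done;
    }

    /** Number of nodes still waiting for a scan. */
    int pendingCount() {
        namesystemLock.readLock().lock();
        try {
            return pending.size();
        } finally {
            namesystemLock.readLock().unlock();
        }
    }
}
```

With this shape, the cost of a refreshNodes call no longer depends on the 
number of blocks per DN, and the longest lock hold in the monitor is 
bounded by maxNodes times the per-node scan time.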

Unit tests will be added later.

> Decommission lots of nodes at the same time could slow down NN
> --------------------------------------------------------------
>
>                 Key: HDFS-5757
>                 URL: https://issues.apache.org/jira/browse/HDFS-5757
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>            Reporter: Ming Ma
>         Attachments: HDFS-5757.patch
>
>
> Sometimes we need to decomm a whole rack of nodes at the same time. While the 
> decomm is in progress, the NN is slow.
> The reason is that when DecommissionManager checks the decomm status, it 
> acquires the namesystem's writer lock and iterates through all DNs. For each 
> DN that is in the decommissioning state, it checks whether replication is 
> done for all the blocks on the machine via 
> blockManager.isReplicationInProgress. For a large cluster, the number of 
> blocks on the machine could be big.
> The fix could be to have DecommissionManager check only several 
> decomm-in-progress nodes each time it acquires the namesystem's writer lock.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
