Improve NameNode responsiveness by speeding up serving refreshNodes RPC
-----------------------------------------------------------------------
Key: HDFS-1425
URL: https://issues.apache.org/jira/browse/HDFS-1425
Project: Hadoop HDFS
Issue Type: Improvement
Components: name-node
Affects Versions: 0.22.0
Reporter: Hairong Kuang
Assignee: Hairong Kuang
Fix For: 0.22.0

Currently, when the NN serves a refreshNodes request, it examines every block on the nodes to be decommissioned and puts each one in the neededReplication queue, holding a write lock the whole time. In our production cluster, when decommissioning 100 nodes, we observed that refreshNodes took minutes, and the NameNode was unresponsive for that period.

The proposal is that refreshNodes only adds the nodes that need to be decommissioned to a queue in DecommissionManager and then returns. DecommissionManager then takes care of decommissioning the nodes. This would allow refreshNodes to return very quickly.

This also allows us to optimize DecommissionManager. For example, it could sleep when there is no datanode to be decommissioned, whereas the current code wakes the thread periodically even when no nodes are being decommissioned.
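The proposed hand-off could be sketched roughly as follows. This is a minimal illustration, not the actual HDFS code: the class name DecommissionQueueSketch, its methods, and the use of node names as strings are all hypothetical. It only shows the shape of the idea, where refreshNodes enqueues and returns, and a worker thread blocks on the queue so it sleeps when nothing is pending.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

// Hypothetical sketch of the proposal; not the real DecommissionManager.
class DecommissionQueueSketch {
    // Nodes waiting to be decommissioned (assumption: identified by name).
    private final BlockingQueue<String> pending = new LinkedBlockingQueue<>();
    // Stand-in for the expensive per-node work (scanning the node's blocks).
    private final Consumer<String> decommissionNode;
    private final Thread worker;

    DecommissionQueueSketch(Consumer<String> decommissionNode) {
        this.decommissionNode = decommissionNode;
        this.worker = new Thread(this::drainLoop, "decommission-worker");
        this.worker.setDaemon(true);
    }

    void start() {
        worker.start();
    }

    // Fast path: only enqueue the nodes and return. No per-block work
    // happens on the calling (RPC) thread.
    void refreshNodes(List<String> nodesToDecommission) {
        pending.addAll(nodesToDecommission);
    }

    int pendingCount() {
        return pending.size();
    }

    // Worker loop: take() blocks while the queue is empty, so the thread
    // sleeps instead of waking periodically when there is nothing to do.
    private void drainLoop() {
        try {
            while (true) {
                String node = pending.take();
                decommissionNode.accept(node);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // exit when stopped
        }
    }

    void stop() {
        worker.interrupt();
    }
}
```

In this sketch the caller of refreshNodes never waits on the per-node work; whatever locking the real block scan needs would be taken inside the worker, per node, rather than for the whole request.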