Improve NameNode responsiveness by speeding up serving refreshNodes RPC
-----------------------------------------------------------------------
Key: HDFS-1425
URL: https://issues.apache.org/jira/browse/HDFS-1425
Project: Hadoop HDFS
Issue Type: Improvement
Components: name-node
Affects Versions: 0.22.0
Reporter: Hairong Kuang
Assignee: Hairong Kuang
Fix For: 0.22.0
Currently, when the NN serves a refreshNodes request, it examines every block on
the nodes to be decommissioned and puts it in the neededReplication queue,
holding a write lock the whole time.
In our production cluster, when we decommissioned 100 nodes, refreshNodes took
minutes, and the NameNode was unresponsive for that entire period.
The proposal is that refreshNodes only adds the nodes to be decommissioned to a
queue in DecommissionManager and then returns. DecommissionManager then takes
care of decommissioning the nodes. This would allow refreshNodes to return very
quickly.
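A minimal Java sketch of the proposed handoff, under stated assumptions: the
class DecommissionQueueSketch, its methods, the ReentrantReadWriteLock standing
in for the NameNode's write lock, and the string "blocks" are all hypothetical
illustrations, not HDFS code. refreshNodes holds the lock only long enough to
enqueue node IDs; the per-block work of filling neededReplication happens later,
one node per lock acquisition, so other RPCs can interleave between nodes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Function;

/** Hypothetical sketch: refreshNodes only enqueues; a manager does the heavy work later. */
public class DecommissionQueueSketch {
    // Hypothetical stand-in for the NameNode's write lock.
    private final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock();
    // Nodes waiting for the decommission manager to process them.
    private final LinkedBlockingQueue<String> pendingDecommission = new LinkedBlockingQueue<>();
    // Hypothetical stand-in for the neededReplication queue.
    private final List<String> neededReplication = new ArrayList<>();

    /** Returns quickly: the lock is held only while node IDs are enqueued, no per-block work. */
    public void refreshNodes(List<String> nodesToDecommission) {
        fsLock.writeLock().lock();
        try {
            pendingDecommission.addAll(nodesToDecommission);
        } finally {
            fsLock.writeLock().unlock();
        }
    }

    /**
     * Manager side: takes one queued node and adds its blocks to neededReplication.
     * The lock is acquired per node, not for the whole batch. Returns false if nothing is queued.
     */
    public boolean processOne(Function<String, List<String>> blocksOf) {
        String node = pendingDecommission.poll();
        if (node == null) {
            return false;
        }
        fsLock.writeLock().lock();
        try {
            neededReplication.addAll(blocksOf.apply(node));
        } finally {
            fsLock.writeLock().unlock();
        }
        return true;
    }

    public int pendingCount() {
        return pendingDecommission.size();
    }

    public List<String> neededReplication() {
        return new ArrayList<>(neededReplication);
    }

    public static void main(String[] args) {
        DecommissionQueueSketch nn = new DecommissionQueueSketch();
        nn.refreshNodes(List.of("dn1", "dn2"));
        System.out.println(nn.pendingCount()); // 2
        // Toy block lookup: each node holds two blocks.
        while (nn.processOne(node -> List.of(node + "-blk1", node + "-blk2"))) { }
        System.out.println(nn.neededReplication()); // [dn1-blk1, dn1-blk2, dn2-blk1, dn2-blk2]
    }
}
```

The design point is where the cost lands: the RPC thread pays only for a queue
append, while the per-block work moves to the manager and is split into smaller
critical sections.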
This also allows us to optimize DecommissionManager. For example, it could sleep
when there are no datanodes to be decommissioned, whereas the current code wakes
the thread periodically even when no nodes are decommission-in-progress.
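One way the sleep-when-idle optimization could look, as a hypothetical Java
sketch: IdleAwareDecommissionMonitor, its isDecommissioned predicate, and the
recheckMillis interval are illustrative assumptions, not Hadoop's actual
DecommissionManager. When no node is in progress, the thread parks in
BlockingQueue.take() with no periodic wakeups. While nodes are in progress it
uses poll(timeout) so that it still rechecks replication progress regularly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;

/** Hypothetical monitor: blocks while idle, polls periodically while work is outstanding. */
public class IdleAwareDecommissionMonitor implements Runnable {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final List<String> inProgress = new ArrayList<>();           // used only by the monitor thread
    private final List<String> completed = new CopyOnWriteArrayList<>(); // read from other threads
    private final Predicate<String> isDecommissioned; // hypothetical: "all of this node's blocks are replicated"
    private final long recheckMillis;

    public IdleAwareDecommissionMonitor(Predicate<String> isDecommissioned, long recheckMillis) {
        this.isDecommissioned = isDecommissioned;
        this.recheckMillis = recheckMillis;
    }

    /** What refreshNodes would call (hypothetical hook): O(1), wakes the monitor if it is parked. */
    public void enqueue(String datanode) {
        queue.add(datanode);
    }

    public List<String> completed() {
        return completed;
    }

    @Override
    public void run() {
        try {
            while (true) {
                String next = inProgress.isEmpty()
                        ? queue.take()                                       // idle: park until a node arrives
                        : queue.poll(recheckMillis, TimeUnit.MILLISECONDS); // busy: wake up to recheck progress
                if (next != null) {
                    inProgress.add(next);
                    queue.drainTo(inProgress); // also pick up any other nodes queued meanwhile
                }
                // Move nodes whose blocks are all safely replicated to the completed list.
                inProgress.removeIf(node -> {
                    if (isDecommissioned.test(node)) {
                        completed.add(node);
                        return true;
                    }
                    return false;
                });
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // exit cleanly on shutdown
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Toy predicate: every node counts as fully replicated on its first check.
        IdleAwareDecommissionMonitor monitor = new IdleAwareDecommissionMonitor(node -> true, 100);
        Thread t = new Thread(monitor, "decommission-monitor");
        t.start();
        monitor.enqueue("dn1");
        monitor.enqueue("dn2");
        long deadline = System.currentTimeMillis() + 5000;
        while (monitor.completed().size() < 2 && System.currentTimeMillis() < deadline) {
            Thread.sleep(10);
        }
        System.out.println(monitor.completed());
        t.interrupt(); // the idle monitor is parked in take(), which responds to interrupt
        t.join();
    }
}
```

The choice between take() and poll(timeout) is what separates the two regimes:
an empty cluster-wide decommission workload costs no wakeups at all, while
in-progress nodes are still rechecked on a fixed interval.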
--
This message is automatically generated by JIRA.