Improve NameNode responsiveness by speeding up serving refreshNodes RPC
-----------------------------------------------------------------------

                 Key: HDFS-1425
                 URL: https://issues.apache.org/jira/browse/HDFS-1425
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: name-node
    Affects Versions: 0.22.0
            Reporter: Hairong Kuang
            Assignee: Hairong Kuang
             Fix For: 0.22.0


Currently, when the NameNode serves a refreshNodes request, it examines every 
block on the nodes to be decommissioned and puts each one in the 
neededReplication queue, holding the write lock for the entire operation.

In our production cluster, when decommissioning 100 nodes, we observed that 
refreshNodes took minutes, and the NameNode was unresponsive for that whole 
period.

The proposal is that refreshNodes only adds the nodes that need to be 
decommissioned to a queue in DecommissionManager and then returns. 
DecommissionManager then takes care of decommissioning the nodes. This would 
allow refreshNodes to return very quickly.
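
A minimal sketch of the enqueue-and-return idea, not the actual HDFS code. 
The class name DecommissionQueueSketch, the field names, and the use of plain 
host-name strings for datanodes are all assumptions made for illustration. 
The point is that the RPC path does work proportional to the number of nodes, 
not the number of blocks, while the write lock is held.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical illustration only; these are not the real NameNode classes.
class DecommissionQueueSketch {
    // Stand-in for the namesystem lock mentioned in the issue.
    private final ReentrantReadWriteLock namesystemLock = new ReentrantReadWriteLock();
    // Nodes waiting to be picked up by the decommission thread.
    private final BlockingQueue<String> pendingNodes = new LinkedBlockingQueue<>();

    /**
     * RPC path: records which nodes need decommissioning and returns.
     * No per-block scanning happens here, so the write lock is held briefly.
     * Returns the number of nodes currently queued.
     */
    int refreshNodes(List<String> nodesToDecommission) {
        namesystemLock.writeLock().lock();
        try {
            for (String node : nodesToDecommission) {
                pendingNodes.add(node);
            }
            return pendingNodes.size();
        } finally {
            namesystemLock.writeLock().unlock();
        }
    }

    /** Number of nodes still waiting for the background thread. */
    int pendingCount() {
        return pendingNodes.size();
    }
}
```

In this sketch the per-block work (moving blocks into neededReplication) is 
left to whichever thread drains pendingNodes, which is where the issue 
proposes DecommissionManager would do it.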

This also allows us to optimize DecommissionManager. For example, it could 
sleep when there are no datanodes to be decommissioned, whereas the current 
code wakes the thread periodically even when no nodes are in the 
decommission-in-progress state.
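
One way the "sleep when idle" behavior could look, as a sketch under stated 
assumptions: the class BlockingDecommissionWorker and its methods are 
hypothetical, and the actual decommissioning work is replaced by a 
placeholder that just records the node name. The worker blocks on the queue 
instead of waking on a timer. A real implementation would probably still 
need periodic checks while nodes are in progress; this only covers the idle 
case described above.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical illustration only; not the real DecommissionManager.
class BlockingDecommissionWorker implements Runnable {
    private final BlockingQueue<String> pendingNodes = new LinkedBlockingQueue<>();
    // Records handled nodes; stands in for the real decommission work.
    private final List<String> processed = new CopyOnWriteArrayList<>();

    /** Called from the RPC path to hand a node to the worker. */
    void enqueue(String node) {
        pendingNodes.add(node);
    }

    List<String> processedNodes() {
        return processed;
    }

    @Override
    public void run() {
        try {
            while (true) {
                // take() blocks while the queue is empty, so an idle
                // worker consumes no CPU and needs no periodic wake-up.
                String node = pendingNodes.take();
                processed.add(node); // placeholder for per-block work
            }
        } catch (InterruptedException e) {
            // Restore the interrupt flag and exit the loop on shutdown.
            Thread.currentThread().interrupt();
        }
    }
}
```

The worker would be started once on its own thread and stopped with 
Thread.interrupt(); enqueue() is the only call the RPC handler needs to make.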

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
