Large number of decommissioning nodes freezes the Namenode
----------------------------------------------------------
Key: HADOOP-4061
URL: https://issues.apache.org/jira/browse/HADOOP-4061
Project: Hadoop Core
Issue Type: Bug
Components: dfs
Affects Versions: 0.17.2
Reporter: Koji Noguchi
On a 1900-node cluster, we tried decommissioning 400 nodes with 30k blocks each.
The other 1500 nodes were almost empty.
When decommissioning started, the Namenode's queue overflowed every 6 minutes.
Looking at CPU usage, it showed that every 5 minutes the
org.apache.hadoop.dfs.FSNamesystem$DecommissionedMonitor thread was taking 100%
of the CPU for 1 minute, causing the queue to overflow.
{noformat}
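// Runs every 5 minutes on the DecommissionedMonitor thread; iterates every
// datanode in the cluster while holding the FSNamesystem lock.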
public synchronized void decommissionedDatanodeCheck() {
  for (Iterator<DatanodeDescriptor> it = datanodeMap.values().iterator();
       it.hasNext();) {
    DatanodeDescriptor node = it.next();
    checkDecommissionStateInternal(node);
  }
}
{noformat}
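The per-node check is what makes this loop heavy. As a rough sketch (the method and helper names isReplicationInProgress, getBlockIterator, countNodes, NumberReplicas, and getReplication below are illustrative assumptions, not quoted from the 0.17 source), checkDecommissionStateInternal ends up walking every block held by a decommissioning node to see whether it is sufficiently replicated elsewhere:
{noformat}
// Illustrative sketch only -- not the actual 0.17 code.  For a node that is
// DECOMMISSION_INPROGRESS, the decommission check effectively iterates every
// block the node holds and counts its live replicas on other datanodes, all
// while the FSNamesystem lock is held by decommissionedDatanodeCheck() above.
private boolean isReplicationInProgress(DatanodeDescriptor srcNode) {
  boolean stillReplicating = false;
  for (Iterator<Block> it = srcNode.getBlockIterator(); it.hasNext();) {
    Block block = it.next();
    NumberReplicas num = countNodes(block);        // one lookup per block
    if (num.liveReplicas() < getReplication(block)) {
      stillReplicating = true;                     // block not yet safe to drop
    }
  }
  return stillReplicating;
}
{noformat}
Under this reading, the cost of one pass grows with both the number of decommissioning nodes and the blocks per node (400 nodes x ~30k blocks is on the order of 12 million block checks), which matches the observed 1 minute of 100% CPU every 5 minutes.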