[ https://issues.apache.org/jira/browse/HADOOP-4061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tsz Wo (Nicholas), SZE updated HADOOP-4061:
-------------------------------------------

    Fix Version/s: 0.20.0
                   0.19.1
                   0.18.3
     Hadoop Flags: [Incompatible change, Reviewed]  (was: [Reviewed, Incompatible change])

> Large number of decommission freezes the Namenode
> -------------------------------------------------
>
>                 Key: HADOOP-4061
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4061
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.17.2
>            Reporter: Koji Noguchi
>            Assignee: Tsz Wo (Nicholas), SZE
>             Fix For: 0.18.3, 0.19.1, 0.20.0
>
>         Attachments: 4061_20081119.patch, 4061_20081120.patch, 4061_20081120b.patch, 4061_20081123.patch, 4061_20081124.patch, 4061_20081124b.patch, 4061_20081124c.patch, 4061_20081124c_0.18.patch, HADOOP-4061.patch
>
>
> On a 1900-node cluster, we tried decommissioning 400 nodes with 30k blocks each. The other 1500 nodes were almost empty.
> When decommissioning started, the namenode's queue overflowed every 6 minutes. The CPU usage showed that every 5 minutes the org.apache.hadoop.dfs.FSNamesystem$DecommissionedMonitor thread took 100% of the CPU for 1 minute, causing the queue to overflow.
> {noformat}
>   public synchronized void decommissionedDatanodeCheck() {
>     for (Iterator<DatanodeDescriptor> it = datanodeMap.values().iterator();
>          it.hasNext();) {
>       DatanodeDescriptor node = it.next();
>       checkDecommissionStateInternal(node);
>     }
>   }
> {noformat}
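For context, here is a minimal sketch of one way the per-pass cost could be bounded: check only a handful of datanodes each cycle and remember where the scan stopped, so the synchronized method never sweeps the full datanodeMap while holding the FSNamesystem lock. This is an illustration, not the attached patch; the class name, the NODES_PER_CHECK limit, the resumeKey field, and the DatanodeDescriptor stub are all hypothetical stand-ins.

{noformat}
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

class BoundedDecommissionCheck {

  /** Stand-in for org.apache.hadoop.dfs.DatanodeDescriptor. */
  static class DatanodeDescriptor {
  }

  private static final int NODES_PER_CHECK = 5;   // hypothetical per-pass limit

  private final SortedMap<String, DatanodeDescriptor> datanodeMap =
      new TreeMap<String, DatanodeDescriptor>();
  private String resumeKey = "";                   // where the previous pass stopped

  public synchronized void decommissionedDatanodeCheck() {
    int checked = 0;
    // Scan only a bounded slice of datanodeMap per pass, resuming from
    // resumeKey, so the lock is never held for a sweep of all ~1900 nodes.
    for (Map.Entry<String, DatanodeDescriptor> entry
         : datanodeMap.tailMap(resumeKey).entrySet()) {
      if (checked >= NODES_PER_CHECK) {
        resumeKey = entry.getKey();  // remember where to pick up next time
        return;
      }
      checkDecommissionStateInternal(entry.getValue());
      checked++;
    }
    resumeKey = "";                  // reached the end; wrap around next pass
  }

  private void checkDecommissionStateInternal(DatanodeDescriptor node) {
    // Stand-in for the real FSNamesystem method, which walks every block on
    // a decommissioning node (about 30k blocks per node in this report).
  }
}
{noformat}

With a bound like this, each invocation does O(NODES_PER_CHECK) node checks under the lock instead of one long sweep, so the monitor thread should no longer pin the CPU for a minute at a time.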