[jira] Updated: (HADOOP-4061) Large number of decommission freezes the Namenode

Tsz Wo (Nicholas), SZE (JIRA) Sun, 23 Nov 2008 01:12:08 -0800

     [ 
https://issues.apache.org/jira/browse/HADOOP-4061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Tsz Wo (Nicholas), SZE updated HADOOP-4061:
-------------------------------------------

    Attachment: 4061_20081123.patch

bq. I propose to store last checked DatanodeDescriptor lastDN in 
DatanodeManager and get a datanodeMap.tail(lastDN, nonInclusive) on each 
iteration. The tail map can be used instead of the queue, since it has 
analogous methods getFirstEntry() and pollFirstEntry().

We can't poll elements from the tail map: it is a view backed by the 
original map, so polling an element from the tail map also removes it from 
the original map.
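
A minimal sketch of that behavior, assuming datanodeMap is a sorted map along the lines of java.util.TreeMap. The class name, keys, and values below are hypothetical stand-ins and are not taken from the Hadoop code or from the patch.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical demo class: String keys stand in for datanode storage IDs.
class TailMapViewDemo {

  /** Builds a small sorted map standing in for datanodeMap (an assumption). */
  static NavigableMap<String, String> newMap() {
    NavigableMap<String, String> map = new TreeMap<>();
    map.put("dn1", "node-1");
    map.put("dn2", "node-2");
    map.put("dn3", "node-3");
    return map;
  }

  /**
   * Polls the first entry of the tail view that starts strictly after
   * lastKey, then returns the size of the ORIGINAL map.
   */
  static int sizeAfterPollingTail(NavigableMap<String, String> map,
                                  String lastKey) {
    // tailMap returns a view; it does not copy the entries.
    NavigableMap<String, String> tail = map.tailMap(lastKey, false);
    tail.pollFirstEntry(); // removes the entry from the backing map as well
    return map.size();
  }

  public static void main(String[] args) {
    NavigableMap<String, String> map = newMap();
    System.out.println("before: " + map.keySet());
    sizeAfterPollingTail(map, "dn1");
    System.out.println("after:  " + map.keySet());
  }
}
```

In this sketch the entry that follows "dn1" disappears from the original map, which is why the tail map cannot serve as a disposable work queue.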

We don't really need a tail map.  We need a cyclic iterator that can start an 
iteration from any point of the map and follows the map's ordering.  When the 
iterator reaches the last entry of the map, it continues from the first 
entry.
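
One way such a cyclic iterator could look is sketched below. This is only an illustration under stated assumptions, not the code in the attached patch: the class name CyclicIterator, the helper keysFrom, and the sample keys are hypothetical, and the sketch assumes the map is not modified while an iteration is in progress.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.NoSuchElementException;
import java.util.TreeMap;

// Hypothetical sketch: visits every entry of a sorted map exactly once,
// starting just after a given key and wrapping around at the end.
class CyclicIterator<K, V> implements Iterator<Map.Entry<K, V>> {
  private final NavigableMap<K, V> map;
  private final Map.Entry<K, V> first; // first entry returned; null if map is empty
  private Map.Entry<K, V> next;        // next entry to return; null when the cycle is done

  CyclicIterator(NavigableMap<K, V> map, K startAfter) {
    this.map = map;
    Map.Entry<K, V> e = (startAfter == null) ? null : map.higherEntry(startAfter);
    if (e == null) {
      e = map.firstEntry(); // nothing after startAfter: wrap to the beginning
    }
    this.first = e;
    this.next = e;
  }

  @Override
  public boolean hasNext() {
    return next != null;
  }

  @Override
  public Map.Entry<K, V> next() {
    if (next == null) {
      throw new NoSuchElementException();
    }
    Map.Entry<K, V> current = next;
    Map.Entry<K, V> e = map.higherEntry(current.getKey());
    if (e == null) {
      e = map.firstEntry(); // hit the last entry: continue from the first
    }
    // Stop once we are back at the entry the iteration started with.
    next = e.getKey().equals(first.getKey()) ? null : e;
    return current;
  }

  /** Collects the keys of one full cycle that starts after startAfter. */
  static <A, B> List<A> keysFrom(NavigableMap<A, B> map, A startAfter) {
    List<A> keys = new ArrayList<>();
    Iterator<Map.Entry<A, B>> it = new CyclicIterator<>(map, startAfter);
    while (it.hasNext()) {
      keys.add(it.next().getKey());
    }
    return keys;
  }

  public static void main(String[] args) {
    NavigableMap<String, Integer> map = new TreeMap<>();
    map.put("dn1", 1);
    map.put("dn2", 2);
    map.put("dn3", 3);
    map.put("dn4", 4);
    System.out.println(keysFrom(map, "dn2")); // prints [dn3, dn4, dn1, dn2]
  }
}
```

A caller could remember the key of the last entry it checked and pass it as startAfter on the next pass, so the next pass resumes where the previous one stopped without removing anything from the map.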

4061_20081123.patch: implements a cyclic iterator

> Large number of decommission freezes the Namenode
> -------------------------------------------------
>
>                 Key: HADOOP-4061
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4061
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.17.2
>            Reporter: Koji Noguchi
>            Assignee: Tsz Wo (Nicholas), SZE
>         Attachments: 4061_20081119.patch, 4061_20081120.patch, 
> 4061_20081120b.patch, 4061_20081123.patch
>
>
> On a 1900-node cluster, we tried decommissioning 400 nodes with 30k blocks 
> each. The other 1500 nodes were almost empty.
> When decommissioning started, the namenode's queue overflowed every 6 minutes.
> The CPU usage showed that every 5 minutes the 
> org.apache.hadoop.dfs.FSNamesystem$DecommissionedMonitor thread was taking 
> 100% of the CPU for 1 minute, causing the queue to overflow.
> {noformat}
>   public synchronized void decommissionedDatanodeCheck() {
>     for (Iterator<DatanodeDescriptor> it = datanodeMap.values().iterator();
>          it.hasNext();) {
>       DatanodeDescriptor node = it.next();
>       checkDecommissionStateInternal(node);
>     }
>   }
> {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
