[ https://issues.apache.org/jira/browse/HADOOP-4061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12649468#action_12649468 ]

Konstantin Shvachko commented on HADOOP-4061:
---------------------------------------------

- You changed the time units of {{dfs.namenode.decommission.interval}} from 
minutes to seconds. Will this be a problem for people who already use this 
config variable? If they set it to 5 (previously meaning 5 minutes), the check 
will run every 5 seconds after your patch.
- Do we want to introduce {{DecommissionMonitor decommissionManager}} member in 
{{FSNamesystem}}? Then we will be able to move all decommissioning logic into 
{{DecommissionMonitor}} or manager, which is probably partly a goal of this 
patch. 
-- {{FSNamesystem.checkDecommissionStateInternal()}} should be moved into 
{{DecommissionMonitor}};
-- same as {{startDecommission()}} and {{stopDecommission()}}.
- In {{isReplicationInProgress()}}, could you please rename 
{{decommissionBlocks}} to {{nodeBlocks}}? It has nothing to do with 
decommissioning and is confusing.
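To illustrate the refactoring suggested above, here is a minimal sketch of what a standalone manager owning the decommissioning logic could look like. All names and method bodies below are hypothetical, for illustration only; they are not the actual Hadoop implementation, and the real class would operate on {{DatanodeDescriptor}} objects rather than plain strings.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: decommissioning logic pulled out of FSNamesystem
// into a dedicated manager, as proposed in the comment above.
class DecommissionManager {
  enum AdminState { NORMAL, DECOMMISSION_INPROGRESS, DECOMMISSIONED }

  // Keyed by datanode id; the real code would hold DatanodeDescriptors.
  private final Map<String, AdminState> nodeStates = new HashMap<>();

  // Moved here from FSNamesystem.startDecommission().
  void startDecommission(String nodeId) {
    nodeStates.put(nodeId, AdminState.DECOMMISSION_INPROGRESS);
  }

  // Moved here from FSNamesystem.stopDecommission().
  void stopDecommission(String nodeId) {
    nodeStates.put(nodeId, AdminState.NORMAL);
  }

  // Moved here from FSNamesystem.checkDecommissionStateInternal():
  // once no replication work remains for the node, mark it decommissioned.
  boolean checkDecommissionState(String nodeId, boolean replicationInProgress) {
    if (nodeStates.get(nodeId) == AdminState.DECOMMISSION_INPROGRESS
        && !replicationInProgress) {
      nodeStates.put(nodeId, AdminState.DECOMMISSIONED);
      return true;
    }
    return false;
  }

  AdminState getState(String nodeId) {
    return nodeStates.getOrDefault(nodeId, AdminState.NORMAL);
  }
}
```

With this, {{FSNamesystem}} would only hold a {{decommissionManager}} member and delegate to it, keeping the state machine in one place.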

I think this throttling approach will solve the problem for now, but it is not 
ideal. Say you have 500,000 blocks per node rather than 30,000; then you will 
have to reconfigure the throttler to scan even fewer nodes per iteration. 
Deleting already decommissioned blocks, as Raghu proposes, is also not very 
good. Until the node is shut down, its blocks can still be accessed for reads. 
We don't want to change that.
I would rather go with the approach that counts down a node's decommissioned 
blocks as they are replicated. Then there is no need to scan all blocks to 
verify that the node is decommissioned; just check the counter. We can add the 
total block scan as a sanity check in {{stopDecommission()}}. The counter would 
also be a good indicator of how much decommissioning progress has been made at 
any moment.
We should create a separate jira for these changes.
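A rough sketch of the counter idea, to make the proposal concrete. This is an illustrative toy, not Hadoop code: the class name, fields, and methods are all assumptions, and the real version would need to handle re-added blocks and synchronization with the block manager.

```java
// Hypothetical counter-based decommission tracking: instead of scanning
// all of a node's blocks on every monitor tick, initialize a counter to
// the node's block count and decrement it as each block reaches its
// target replication. The node is decommissioned when it hits zero.
class DecommissionCounter {
  private final int totalBlocks;
  private int pendingBlocks;

  DecommissionCounter(int totalBlocksOnNode) {
    this.totalBlocks = totalBlocksOnNode;
    this.pendingBlocks = totalBlocksOnNode;
  }

  // Called when one of the node's blocks has been fully re-replicated.
  void blockReplicated() {
    if (pendingBlocks > 0) {
      pendingBlocks--;
    }
  }

  // O(1) check, replacing the full per-node block scan.
  boolean isDecommissioned() {
    return pendingBlocks == 0;
  }

  // The progress indicator mentioned above: fraction of blocks done.
  double progress() {
    return totalBlocks == 0 ? 1.0 : 1.0 - (double) pendingBlocks / totalBlocks;
  }
}
```

The expensive full scan would then run only once, as the sanity check in {{stopDecommission()}}, rather than every monitor interval.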

> Large number of decommission freezes the Namenode
> -------------------------------------------------
>
>                 Key: HADOOP-4061
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4061
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.17.2
>            Reporter: Koji Noguchi
>            Assignee: Tsz Wo (Nicholas), SZE
>         Attachments: 4061_20081119.patch
>
>
> On a 1900-node cluster, we tried decommissioning 400 nodes with 30k blocks 
> each. The other 1500 nodes were almost empty.
> When decommission started, namenode's queue overflowed every 6 minutes.
> Looking at the CPU usage, we saw that every 5 minutes the 
> org.apache.hadoop.dfs.FSNamesystem$DecommissionedMonitor thread was taking 
> 100% of the CPU for 1 minute, causing the queue to overflow.
> {noformat}
>   public synchronized void decommissionedDatanodeCheck() {
>     for (Iterator<DatanodeDescriptor> it = datanodeMap.values().iterator();
>          it.hasNext();) {
>       DatanodeDescriptor node = it.next();
>       checkDecommissionStateInternal(node);
>     }
>   }
> {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
