[
https://issues.apache.org/jira/browse/HADOOP-4061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12649468#action_12649468
]
Konstantin Shvachko commented on HADOOP-4061:
---------------------------------------------
- You changed the time units of {{dfs.namenode.decommission.interval}} from
minutes to seconds. Will this be a problem for those who already use this config
variable? If they currently set it to 5 (meaning 5 minutes), then after your patch
the check will run every 5 seconds.
- Do we want to introduce a {{DecommissionMonitor decommissionManager}} member in
{{FSNamesystem}}? Then we would be able to move all of the decommissioning logic into
{{DecommissionMonitor}} (or a manager), which is probably partly a goal of this
patch. A rough sketch of what I have in mind follows this list.
-- {{FSNamesystem.checkDecommissionStateInternal()}} should be moved into
{{DecommissionMonitor}};
-- the same applies to {{startDecommission()}} and {{stopDecommission()}}.
- In {{isReplicationInProgress()}}, could you please rename
{{decommissionBlocks}} to {{nodeBlocks}}? It has nothing to do with
decommissioning and is confusing.
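Here is a rough sketch of what such a member could look like. This is only an
illustration; the class name, constructor, and method signatures are my
assumptions, not what the patch actually does:
{noformat}
/**
 * Hypothetical sketch only -- names and signatures are assumptions,
 * not part of the current patch.
 */
class DecommissionManager {
  private final FSNamesystem namesystem;

  DecommissionManager(FSNamesystem namesystem) {
    this.namesystem = namesystem;
  }

  /** Mark the datanode as being decommissioned. */
  void startDecommission(DatanodeDescriptor node) {
    // body of FSNamesystem.startDecommission() would move here
  }

  /** Return the datanode to normal service. */
  void stopDecommission(DatanodeDescriptor node) {
    // body of FSNamesystem.stopDecommission() would move here
  }

  /** Check whether replication of the node's blocks has completed. */
  void checkDecommissionState(DatanodeDescriptor node) {
    // body of FSNamesystem.checkDecommissionStateInternal() would move here
  }
}
{noformat}
{{FSNamesystem}} would then keep a single {{decommissionManager}} field and delegate to it.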
I think this throttling approach will solve the problem for now, but it is not
ideal. Say, if you have 500,000 blocks per node rather than 30,000, then you will
have to reconfigure the throttler to scan even fewer nodes per iteration.
Deleting already-decommissioned blocks, as Raghu proposes, is also not very good.
Until the node is shut down, its blocks can still be read, and we don't want
to change that.
I would rather go with an approach that counts down a decommissioning node's blocks
as they are replicated. Then there is no need to scan all blocks to verify that the
node is decommissioned; just check the counter. We can add the full block scan
as a sanity check in {{stopDecommission()}}. The counter would also be a good
indicator of how much decommissioning progress has been made at any moment.
We should create a separate jira for these changes.
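Something along these lines, just to illustrate the counter idea; the field and
method names below are made up for this sketch and are not actual code:
{noformat}
/** Hypothetical additions to DatanodeDescriptor -- names are assumptions. */
class DatanodeDescriptor /* existing class */ {
  /** Number of this node's blocks still waiting to be fully replicated. */
  private int pendingDecommissionBlocks;

  /** Initialize the counter when decommission starts. */
  void startDecommissionCount(int numBlocks) {
    pendingDecommissionBlocks = numBlocks;
  }

  /** Called when one of this node's blocks reaches its replication target. */
  void blockReplicated() {
    if (pendingDecommissionBlocks > 0) {
      pendingDecommissionBlocks--;
    }
  }

  /** Decommission is complete when nothing is left to replicate. */
  boolean decommissionComplete() {
    return pendingDecommissionBlocks == 0;
  }

  /** Also usable as a progress indicator. */
  int getPendingDecommissionBlocks() {
    return pendingDecommissionBlocks;
  }
}
{noformat}
With this, the monitor thread only needs to look at {{decommissionComplete()}} for
each decommissioning node instead of scanning all of its blocks.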
> Large number of decommission freezes the Namenode
> -------------------------------------------------
>
> Key: HADOOP-4061
> URL: https://issues.apache.org/jira/browse/HADOOP-4061
> Project: Hadoop Core
> Issue Type: Bug
> Components: dfs
> Affects Versions: 0.17.2
> Reporter: Koji Noguchi
> Assignee: Tsz Wo (Nicholas), SZE
> Attachments: 4061_20081119.patch
>
>
> On a 1900-node cluster, we tried decommissioning 400 nodes with 30k blocks
> each. The other 1500 nodes were almost empty.
> When decommission started, the namenode's queue overflowed every 6 minutes.
> Looking at the CPU usage, every 5 minutes the
> org.apache.hadoop.dfs.FSNamesystem$DecommissionedMonitor thread took
> 100% of the CPU for 1 minute, causing the queue to overflow.
> {noformat}
>   public synchronized void decommissionedDatanodeCheck() {
>     for (Iterator<DatanodeDescriptor> it = datanodeMap.values().iterator();
>          it.hasNext();) {
>       DatanodeDescriptor node = it.next();
>       checkDecommissionStateInternal(node);
>     }
>   }
> {noformat}