[
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277677#comment-14277677
]
Hi Colin, thanks for reviewing. I'll rework the patch after we settle on the
details, few replies:
bq. Shouldn't we be using that here, rather than creating our own list in
decomNodeBlocks?
This is actually a feature, not a bug :) Having our own data structure lets us
speed up decom by only checking blocks that are still insufficiently
replicated; we prune out the sufficiently replicated ones each iteration. The
memory overhead should be pretty small since it's just an 8B reference per
block, so 1 million blocks costs about 8MB for a single node, or maybe 160MB
for a full rack. Nodes are typically smaller than this, so these are
conservative estimates, and large decoms aren't that common.
One nice improvement would be to skip the final full scan at the end of decom
by immediately propagating block map changes to decomNodeBlocks, but that
seems like more trouble than it's worth.
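Roughly the shape of the tracking structure I mean (a sketch only; the class
and method names are illustrative, not the actual patch code):
{code:java}
// Sketch of the per-node tracking structure described above: each
// decommissioning node maps to the blocks still pending sufficient
// replication, and each monitor tick prunes the entries that have caught up.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class DecomTrackerSketch<Node, Block> {
  // One pending-block list per decommissioning node; entries are just
  // references (~8 bytes each), so 1M blocks is on the order of 8MB.
  private final Map<Node, List<Block>> decomNodeBlocks = new HashMap<>();

  public void startTracking(Node node, List<Block> allBlocks) {
    decomNodeBlocks.put(node, new ArrayList<>(allBlocks));
  }

  /** One monitor iteration: re-check only the still-pending blocks. */
  public void pruneSufficientlyReplicated(Node node,
      java.util.function.Predicate<Block> isSufficientlyReplicated) {
    List<Block> pending = decomNodeBlocks.get(node);
    if (pending == null) {
      return;
    }
    for (Iterator<Block> it = pending.iterator(); it.hasNext();) {
      if (isSufficientlyReplicated.test(it.next())) {
        it.remove();  // never re-scanned on later iterations
      }
    }
    if (pending.isEmpty()) {
      decomNodeBlocks.remove(node);  // node is ready to finish decom
    }
  }
}
{code}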
bq. have a configuration key like dfs.namenode.decommission.blocks.per.minute
that expresses directly what we want.
On thinking about it, I agree that just using a new config option is fine, but
I'd prefer to define the DecomManager in terms of both an interval and an
amount of work, rather than a rate. This is more powerful and more in line
with the existing config. Are you okay with a new {{blocks.per.interval}}
config?
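Something along these lines; the key names and defaults below are just for
illustration (the interval key and its default are from memory), and only
{{blocks.per.interval}} is the new piece I'm proposing:
{code:java}
// Sketch of the "interval + amount of work" shape, rather than a direct rate.
public class DecomMonitorConfigSketch {
  static final String INTERVAL_KEY =
      "dfs.namenode.decommission.interval";            // seconds between runs
  static final String BLOCKS_PER_INTERVAL_KEY =
      "dfs.namenode.decommission.blocks.per.interval"; // work budget per run

  final long intervalSecs;
  final int blocksPerInterval;

  DecomMonitorConfigSketch(java.util.Properties conf) {
    intervalSecs = Long.parseLong(conf.getProperty(INTERVAL_KEY, "30"));
    blocksPerInterval =
        Integer.parseInt(conf.getProperty(BLOCKS_PER_INTERVAL_KEY, "500000"));
  }

  // The effective rate falls out of the two settings instead of being
  // configured directly, e.g. 500000 blocks / 30s.
  double blocksPerSecond() {
    return (double) blocksPerInterval / intervalSecs;
  }
}
{code}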
bq. dfs.namenode.decommission.max.concurrent.tracked.nodes
I agree that it can lead to hangs. At a minimum, I'll add a "0 means no limit"
option, and maybe we can make that the default. I think NNs should really have
enough heap headroom to handle the 10-100 MB of memory this needs; it's
peanuts compared to the 10s of GBs that are quite typical.
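For the limit check, something as simple as this (a sketch; the names are
illustrative):
{code:java}
// "0 means no limit" semantics for a max-concurrent-tracked-nodes setting.
public class TrackedNodeLimitSketch {
  private final int maxTrackedNodes; // 0 = unlimited

  public TrackedNodeLimitSketch(int maxTrackedNodes) {
    this.maxTrackedNodes = maxTrackedNodes;
  }

  /** Whether another decommissioning node may be tracked right now. */
  public boolean canTrackAnother(int currentlyTracked) {
    return maxTrackedNodes == 0 || currentlyTracked < maxTrackedNodes;
  }
}
{code}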
> Refactor and improve decommissioning logic into DecommissionManager
> -------------------------------------------------------------------
>
> Key: HDFS-7411
> URL: https://issues.apache.org/jira/browse/HDFS-7411
> Project: Hadoop HDFS
> Issue Type: Improvement
> Affects Versions: 2.5.1
> Reporter: Andrew Wang
> Assignee: Andrew Wang
> Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch,
> hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch,
> hdfs-7411.006.patch
>
>
> Would be nice to split out decommission logic from DatanodeManager to
> DecommissionManager.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)