[
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277677#comment-14277677
]
Hi Colin, thanks for reviewing. I'll rework the patch after we settle on the
details, few replies:
bq. Shouldn't we be using that here, rather than creating our own list in
decomNodeBlocks?
This is actually a feature, not a bug :) Having our own data structure lets us
speed up decom by only checking blocks that are still insufficiently
replicated; we prune out the sufficiently replicated ones each iteration. The
memory overhead should be pretty small since it's just an 8B reference per
block, so 1 million blocks costs about 8MB for a single node, or maybe 160MB
for a full rack. Nodes are typically smaller than this, so these are
conservative estimates, and large decoms aren't that common.
One nice improvement would be to skip the final full scan at the end of decom
by immediately propagating block map changes to decomNodeBlocks, but that
seems like more trouble than it's worth.
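Roughly the shape of the tracking structure I mean (a sketch only; the class
and method names are illustrative, not the actual patch code):
{code:java}
// Sketch of the per-node tracking structure described above: each
// decommissioning node maps to the blocks still pending sufficient
// replication, and each monitor tick prunes the entries that have caught up.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class DecomTrackerSketch<Node, Block> {
  // One pending-block list per decommissioning node; entries are just
  // references (~8 bytes each), so 1M blocks is on the order of 8MB.
  private final Map<Node, List<Block>> decomNodeBlocks = new HashMap<>();

  public void startTracking(Node node, List<Block> allBlocks) {
    decomNodeBlocks.put(node, new ArrayList<>(allBlocks));
  }

  /** One monitor iteration: re-check only the still-pending blocks. */
  public void pruneSufficientlyReplicated(Node node,
      java.util.function.Predicate<Block> isSufficientlyReplicated) {
    List<Block> pending = decomNodeBlocks.get(node);
    if (pending == null) {
      return;
    }
    for (Iterator<Block> it = pending.iterator(); it.hasNext();) {
      if (isSufficientlyReplicated.test(it.next())) {
        it.remove();  // never re-scanned on later iterations
      }
    }
    if (pending.isEmpty()) {
      decomNodeBlocks.remove(node);  // node is ready to finish decom
    }
  }
}
{code}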
bq. have a configuration key like dfs.namenode.decommission.blocks.per.minute
that expresses directly what we want.
On thinking about it, I agree that just using a new config option is fine, but
I'd prefer to define the DecomManager in terms of both an interval and an
amount of work, rather than a rate. This is more powerful and more in line
with the existing config. Are you okay with a new {{blocks.per.interval}}
config?
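Something along these lines; the key names and defaults below are just for
illustration (the interval key and its default are from memory), and only
{{blocks.per.interval}} is the new piece I'm proposing:
{code:java}
// Sketch of the "interval + amount of work" shape, rather than a direct rate.
public class DecomMonitorConfigSketch {
  static final String INTERVAL_KEY =
      "dfs.namenode.decommission.interval";            // seconds between runs
  static final String BLOCKS_PER_INTERVAL_KEY =
      "dfs.namenode.decommission.blocks.per.interval"; // work budget per run

  final long intervalSecs;
  final int blocksPerInterval;

  DecomMonitorConfigSketch(java.util.Properties conf) {
    intervalSecs = Long.parseLong(conf.getProperty(INTERVAL_KEY, "30"));
    blocksPerInterval =
        Integer.parseInt(conf.getProperty(BLOCKS_PER_INTERVAL_KEY, "500000"));
  }

  // The effective rate falls out of the two settings instead of being
  // configured directly, e.g. 500000 blocks / 30s.
  double blocksPerSecond() {
    return (double) blocksPerInterval / intervalSecs;
  }
}
{code}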
bq. dfs.namenode.decommission.max.concurrent.tracked.nodes
I agree that it can lead to hangs. At a minimum, I'll add a "0 means no limit"
option, and maybe we can make that the default. I think NNs should really have
enough heap headroom to handle the 10-100 MB of memory this needs; it's
peanuts compared to the 10s of GBs that are quite typical.
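For the limit check, something as simple as this (a sketch; the names are
illustrative):
{code:java}
// "0 means no limit" semantics for a max-concurrent-tracked-nodes setting.
public class TrackedNodeLimitSketch {
  private final int maxTrackedNodes; // 0 = unlimited

  public TrackedNodeLimitSketch(int maxTrackedNodes) {
    this.maxTrackedNodes = maxTrackedNodes;
  }

  /** Whether another decommissioning node may be tracked right now. */
  public boolean canTrackAnother(int currentlyTracked) {
    return maxTrackedNodes == 0 || currentlyTracked < maxTrackedNodes;
  }
}
{code}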
> Refactor and improve decommissioning logic into DecommissionManager
> -------------------------------------------------------------------
>
> Key: HDFS-7411
> URL: https://issues.apache.org/jira/browse/HDFS-7411
> Project: Hadoop HDFS
> Issue Type: Improvement
> Affects Versions: 2.5.1
> Reporter: Andrew Wang
> Assignee: Andrew Wang
> Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch,
> hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch,
> hdfs-7411.006.patch
>
>
> Would be nice to split out decommission logic from DatanodeManager to
> DecommissionManager.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)