[
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277593#comment-14277593
]
Colin Patrick McCabe commented on HDFS-7411:
--------------------------------------------
DecommissionManager:
Let's document the locking here. I believe all of these functions are designed
to be run with the FSNamesystem write lock held, correct?
A question about the data structures here. We already have a way of iterating
through all the blocks on a given {{DataNode}} via the implicit linked lists in
the {{BlockInfo}} objects. Shouldn't we be using that here, rather than
creating our own list in {{decomNodeBlocks}}? This could save a lot of memory,
since datanodes might have anywhere between 200k and a million blocks in the
next few years. (I am fine with doing this in a follow-on, of
course.)
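To sketch the alternative (the method names here are from memory / placeholders,
not what's in the patch):
{code}
// Sketch: walk the datanode's own block list via the implicit linked list
// instead of keeping a parallel AbstractList<BlockInfo> in decomNodeBlocks.
// getBlockIterator() is the DatanodeDescriptor accessor as I remember it, and
// isSufficientlyReplicated() stands in for the existing replication check.
private int countInsufficientlyReplicated(DatanodeDescriptor dn) {
  int insufficient = 0;
  Iterator<BlockInfo> it = dn.getBlockIterator();
  while (it.hasNext()) {
    if (!isSufficientlyReplicated(it.next())) {
      insufficient++;
    }
  }
  return insufficient;
}
{code}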
{{DecommissionManager#startDecommission}}: let's have a debug log message here
for the case where the node is already decommissioned... it might help with
debugging. Similarly in {{stopDecommission}}... if we're not doing anything,
let's log why we're not doing anything at debug or trace level.
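Something along these lines is what I'm picturing (just a sketch, the exact
wording and accessor names are from memory, not from the patch):
{code}
// Sketch only: say why we bail out instead of silently returning.
// Assumes the slf4j-style logger and the existing DatanodeInfo accessors
// isDecommissionInProgress() / isDecommissioned() / getAdminState().
public void startDecommission(DatanodeDescriptor node) {
  if (node.isDecommissionInProgress() || node.isDecommissioned()) {
    LOG.debug("startDecommission: node {} is already {}, nothing to do.",
        node, node.getAdminState());
    return;
  }
  // ... existing decommission-start logic goes here ...
}
{code}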
Config Keys:
We talked about this a bit earlier but didn't really come to any consensus. I
think we should just get rid of
{{dfs.namenode.decommission.nodes.per.interval}} and
{{dfs.namenode.decommission.blocks.per.node}}, and just have a configuration
key like {{dfs.namenode.decommission.blocks.per.minute}} that expresses
directly what we want.
If we set a reasonable default for
{{dfs.namenode.decommission.blocks.per.minute}}, the impact on users will be
minimal. The old rate-limiting process was broken anyway... that's a big part
of what this patch is designed to fix, as I understand it. So we shouldn't need
to lug around these old config keys that don't really express what we're trying
to configure. Let's just add a new configuration key that is easy to
understand and maintain in the future. Decom is a manual process where admins
get involved anyway.
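To be concrete about "expresses directly what we want" (the blocks-per-minute
key is just my proposal, and the default below is made up for illustration):
{code}
// Sketch: turn a blocks-per-minute budget into a per-scan-interval budget,
// using the existing dfs.namenode.decommission.interval (seconds) setting.
// The new key name and the 500000 default are hypothetical.
int blocksPerMinute = conf.getInt(
    "dfs.namenode.decommission.blocks.per.minute", 500000);
int intervalSecs = conf.getInt(
    DFSConfigKeys.DFS_NAMENODE_DECOMMISSION_INTERVAL_KEY,
    DFSConfigKeys.DFS_NAMENODE_DECOMMISSION_INTERVAL_DEFAULT);
int blocksPerInterval =
    (int) Math.max(1L, (long) blocksPerMinute * intervalSecs / 60);
{code}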
{{dfs.namenode.decommission.max.concurrent.tracked.nodes}}: I have mixed
feelings about this configuration key. It seems like the reason you want to
add it is to limit NN memory consumption (but you can do that by not
duplicating data structures-- see above). However, it may have some value for
users who would rather finish decommissioning 100 nodes in an hour than have
1000 nodes 10% decommissioned in that time. So I guess it is a good addition,
maybe? The only problem is that if some of those first 100 nodes get stuck
because there is an open-for-write file or something, then the decom process
will start to slow down and perhaps eventually hang.
On a related note, I feel like we should have a follow-up change to make the
decom parameters reconfigurable via the configuration reloading interface we
added recently. I will file a follow-on JIRA for that.
{code}
if (LOG.isDebugEnabled()) {
  StringBuilder b = new StringBuilder("Node {} ");
  if (isHealthy) {
    b.append("is ");
  } else {
    b.append("isn't ");
  }
  b.append("and still needs to replicate {} more blocks, " +
      "decommissioning is still in progress.");
{code}
Missing "healthy " in the printout.
Can we log this at info level like every half hour or something, so that people
can see issues with nodes getting "stuck"? As it is, it seems like they'll get
no output unless they monkey with log4j manually.
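Something like this is what I have in mind (just a sketch; the half-hour
threshold and the {{lastLogTime}} field are made up, not from the patch):
{code}
// Sketch: include the missing "healthy " and promote to info every so often
// so stuck nodes are visible without touching log4j. LOG_INTERVAL_MS and
// lastLogTime are hypothetical fields for illustration only.
private static final long LOG_INTERVAL_MS = 30 * 60 * 1000; // half an hour
private long lastLogTime = 0;

private void logProgress(DatanodeDescriptor node, boolean isHealthy,
    int underReplicatedBlocks) {
  String msg = "Node " + node + " " + (isHealthy ? "is " : "isn't ")
      + "healthy and still needs to replicate " + underReplicatedBlocks
      + " more blocks, decommissioning is still in progress.";
  long now = Time.monotonicNow();
  if (now - lastLogTime > LOG_INTERVAL_MS) {
    LOG.info(msg);
    lastLogTime = now;
  } else {
    LOG.debug(msg);
  }
}
{code}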
{code}
* Note also that the reference to the list of under-replicated blocks
* will be null on initial addx
{code}
Spelling?
{code}
final Iterator<Map.Entry<DatanodeDescriptor, AbstractList<BlockInfo>>>
it =
new CyclicIteration<DatanodeDescriptor, AbstractList<BlockInfo>>(
decomNodeBlocks, iterkey).iterator();
{code}
Well, this is creative, but I think I'd rather have the standard indentation :)
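i.e., something like this (same code, just normal continuation indents):
{code}
final Iterator<Map.Entry<DatanodeDescriptor, AbstractList<BlockInfo>>> it =
    new CyclicIteration<DatanodeDescriptor, AbstractList<BlockInfo>>(
        decomNodeBlocks, iterkey).iterator();
{code}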
> Refactor and improve decommissioning logic into DecommissionManager
> -------------------------------------------------------------------
>
> Key: HDFS-7411
> URL: https://issues.apache.org/jira/browse/HDFS-7411
> Project: Hadoop HDFS
> Issue Type: Improvement
> Affects Versions: 2.5.1
> Reporter: Andrew Wang
> Assignee: Andrew Wang
> Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch,
> hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch,
> hdfs-7411.006.patch
>
>
> Would be nice to split out decommission logic from DatanodeManager to
> DecommissionManager.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)