[
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277694#comment-14277694
]
Colin Patrick McCabe commented on HDFS-7411:
--------------------------------------------
bq. This is actually a feature, not a bug. Having our own data structure lets
us speed up decom by only checking blocks that are still insufficiently
replicated. We prune out the sufficient ones each iteration. The memory
overhead here should be pretty small since it's just an 8B reference per block,
so with 1 million blocks this will be 8MB for a single node, or maybe 160MB for
a full rack. Nodes are typically smaller than this, so these are conservative
estimates, and large decoms aren't that common.
That's a fair point. It's too bad we can't use the existing list for this, but
it's already being re-ordered in the block report processing code, for a
different purpose. I agree that it's fine as-is.
bq. On thinking about it I agree that just using a new config option is fine,
but I'd prefer to define the DecomManager in terms of both an interval and an
amount of work, rather than a rate. This is more powerful, and more in-line
with the existing config. Are you okay with a new blocks.per.interval config?
Why is blocks.per.interval "more powerful" than blocks per minute? It just
seems annoying to have to do the mental math to figure out what to configure
to get a given blocks-per-minute rate. Also, the fact that "intervals" even
exist is an implementation detail... you could easily imagine an
event-triggered version that didn't do periodic polling. I guess I don't feel
strongly about this, but I'd like to understand the rationale more.
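The "mental math" in question is just rate times interval length. A minimal
sketch of the conversion, assuming a per-minute target rate and an interval
expressed in seconds (the function name and parameters are illustrative, not
actual HDFS config keys):

```python
def blocks_per_interval(target_blocks_per_minute: float,
                        interval_seconds: float) -> float:
    """Convert a desired blocks-per-minute rate into the per-interval
    work amount a blocks.per.interval-style config would need."""
    return target_blocks_per_minute * interval_seconds / 60
```

For example, to scan 1000 blocks per minute with a 30-second interval, the
config would need to be set to blocks_per_interval(1000, 30), i.e. 500.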
bq. I agree that it can lead to hangs. At a minimum, I'll add a "0 means no
limit" config, and maybe we can set that by default. I think that NNs should
really have enough heap headroom to handle the 10s to 100s of MBs of memory
for this; it's peanuts compared to the 10s of GBs that are quite typical.
Makes sense.
> Refactor and improve decommissioning logic into DecommissionManager
> -------------------------------------------------------------------
>
> Key: HDFS-7411
> URL: https://issues.apache.org/jira/browse/HDFS-7411
> Project: Hadoop HDFS
> Issue Type: Improvement
> Affects Versions: 2.5.1
> Reporter: Andrew Wang
> Assignee: Andrew Wang
> Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch,
> hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch,
> hdfs-7411.006.patch
>
>
> Would be nice to split out decommission logic from DatanodeManager to
> DecommissionManager.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)