[ https://issues.apache.org/jira/browse/HDFS-9279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085577#comment-15085577 ]
Kihwal Lee commented on HDFS-9279:
----------------------------------
bq. Because the data present in the decommissioning nodes would eventually be
transferred over to the live nodes. Is this understanding correct?
The replicas are not invalidated on decommissioning nodes even after they are
replicated, so the capacity tracking was not accurate either. It ended up
double counting the used space toward the end of decommissioning, which is
where the process seems to stall more frequently nowadays (that is another
topic). If a significant portion of a cluster is decommissioned, the stat will
look very strange and confuse people. That actually happened to us multiple
times: the free/total ratio looks considerably smaller than the actual value,
and monitoring tools cannot easily dismiss it as 'Nah... it's a temporary
discrepancy caused by decommissioning.'
With this change, the storage capacity stat behaves more like the regular
under-replication scenario caused by node/disk outages. Additional space will
be used for re-replicating those blocks, but it is not yet allocated to them.
That is the actual state of used/usable storage, and the stat now reflects it.
If we want the stat to reflect what will be used in the future, we are talking
about a space reservation feature.
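To make the double counting concrete, here is a toy calculation. It is a hypothetical, simplified model, not the NameNode's actual statistics code: the `Node` class, `AdminState` enum, `cluster_stats` function and all numbers are made up for illustration, and free space is assumed to be simply capacity minus used (no non-DFS usage). "Old" counts every node; "new" treats decommissioning/decommissioned nodes like dead nodes.

```python
from dataclasses import dataclass
from enum import Enum


class AdminState(Enum):
    IN_SERVICE = "in_service"
    DECOMMISSIONING = "decommissioning"
    DECOMMISSIONED = "decommissioned"


@dataclass
class Node:
    """Hypothetical, simplified view of one DataNode's storage (units: GB)."""
    capacity: int
    used: int
    state: AdminState = AdminState.IN_SERVICE


def cluster_stats(nodes, exclude_decommissioning):
    """Return (total, used, free) summed over the nodes that are counted.

    exclude_decommissioning=False models the old behaviour (every node is
    counted); True counts only in-service nodes, as if the others were dead.
    """
    counted = [
        n for n in nodes
        if not exclude_decommissioning or n.state is AdminState.IN_SERVICE
    ]
    total = sum(n.capacity for n in counted)
    used = sum(n.used for n in counted)
    # Assumption: free space is capacity minus used (no non-DFS usage).
    return total, used, total - used


# Three in-service nodes, each holding 20 GB of its own data plus 30 GB
# re-replicated from the decommissioning node. The decommissioning node still
# holds its 90 GB of replicas because they are not invalidated.
nodes = [Node(100, 50) for _ in range(3)] + [
    Node(100, 90, AdminState.DECOMMISSIONING)
]

for exclude in (False, True):
    total, used, free = cluster_stats(nodes, exclude)
    print(f"exclude={exclude}: total={total} used={used} "
          f"free/total={free / total:.2f}")
# exclude=False: total=400 used=240 free/total=0.40
# exclude=True: total=300 used=150 free/total=0.50
```

In the old model the same 90 GB shows up twice in "used", so the free/total ratio drops from 0.50 to 0.40 even though no usable space was lost.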
> Decommissioned capacity should not be considered for configured/used capacity
> -----------------------------------------------------------------------------
>
> Key: HDFS-9279
> URL: https://issues.apache.org/jira/browse/HDFS-9279
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 2.6.1
> Reporter: Kuhu Shukla
> Assignee: Kuhu Shukla
> Fix For: 3.0.0, 2.8.0
>
> Attachments: HDFS-9279-v1.patch, HDFS-9279-v2.patch,
> HDFS-9279-v3.patch, HDFS-9279-v4.patch
>
>
> Capacity of a decommissioned node is being counted in the configured and used
> capacity metrics. This gives an incorrect perception of cluster usage.
> Once a node is decommissioned, its capacity should be treated like that of a
> dead node.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)