[ https://issues.apache.org/jira/browse/HDFS-8041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393437#comment-14393437 ]
Nathan Roberts commented on HDFS-8041:
--------------------------------------
Hi [~kihwal]. Some minor comments on the patch:
+ Can we bounds-check the new config? I think it works fine without the check,
but it would guard against a future change to the algorithm.
+ I wish there was a way to make this config refreshable. Unfortunately I don't
think that's possible today.
+ Should we protect against stats.getNumDatanodesInService being 0? Again, it is
probably OK as it is today, but a guard would keep a future patch from breaking
the assumption.
+ Node local writes are not impacted by the change. Maybe we should also have
rack-local writes avoid this check so that the 2nd and 3rd replicas remain in
the same rack. I think just having this impact the completely random target
selections might be enough to avoid the problem while minimizing the effects on
block placement.
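
The bounds check and the zero-node guard suggested above could look roughly like the sketch below. This is only an illustration: the class and method names are hypothetical and do not come from the patch, and it assumes the threshold is a fraction where 1.0 effectively disables the check.

```java
// Hypothetical sketch; none of these names come from the HDFS-8041 patch.
final class PlacementGuards {
    private PlacementGuards() {}

    /**
     * Clamp a configured utilization threshold into [0.0, 1.0].
     * Assumption: the threshold is a fraction, and 1.0 means the
     * remaining-space check never triggers.
     */
    static double boundedThreshold(double configured) {
        if (Double.isNaN(configured)) {
            return 1.0; // treat an unusable value as "check disabled"
        }
        return Math.max(0.0, Math.min(1.0, configured));
    }

    /**
     * Average used bytes per in-service datanode.
     * Returns 0 when no node is in service instead of dividing by zero.
     */
    static long averageUsedPerNode(long totalUsedBytes, int numDatanodesInService) {
        if (numDatanodesInService <= 0) {
            return 0L;
        }
        return totalUsedBytes / numDatanodesInService;
    }
}
```

Clamping rather than throwing keeps a misconfigured value from failing startup; rejecting the value outright would be an equally reasonable choice.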
> Consider remaining space during block placement if dfs space is highly
> utilized
> ------------------------------------------------------------------------------------
>
> Key: HDFS-8041
> URL: https://issues.apache.org/jira/browse/HDFS-8041
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Kihwal Lee
> Assignee: Kihwal Lee
> Attachments: HDFS-8041.v1.patch, HDFS-8041.v2.patch
>
>
> This feature helps keep smaller nodes (i.e. in a heterogeneous environment)
> from constantly filling up when the overall space utilization is over a
> certain threshold. When the utilization is low, the balancer can keep up, but
> once the average per-node usage in bytes exceeds the capacity of the smaller
> nodes, they fill up quickly even after perfect balancing.
> This jira proposes an improvement that can be optionally enabled in order to
> slow down the rate of space usage growth of smaller nodes if the overall
> storage utilization is over a configured threshold. It will not replace the
> balancer; rather, it will help the balancer keep up. Also, the primary replica
> placement will not be affected. Only the replicas typically placed in a
> remote rack will be subject to this check.
> The appropriate threshold is specific to the cluster configuration. There is no
> generally good value to set, so the feature is disabled by default. We have seen
> cases where a threshold of 85% - 90% would help. Figuring out when
> {{totalSpaceUsed / numNodes}} becomes close to the capacity of a smaller node
> is helpful in determining the threshold.
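
As a rough worked example of the rule of thumb in the description: if the average per-node usage equals the smallest node's capacity, then totalSpaceUsed = minCapacity * numNodes, so the matching cluster-wide utilization is minCapacity * numNodes / totalCapacity. The sketch below computes that value. The class and method names are hypothetical, and the result is only a starting point for choosing a threshold, not a value recommended by the issue.

```java
// Hypothetical helper for estimating a starting threshold; not part of HDFS.
final class ThresholdEstimate {
    private ThresholdEstimate() {}

    /**
     * Cluster utilization (as a fraction) at which the average per-node
     * usage, totalSpaceUsed / numNodes, equals the smallest node's capacity.
     */
    static double utilizationWhereAverageHitsSmallest(long[] capacities) {
        if (capacities == null || capacities.length == 0) {
            throw new IllegalArgumentException("need at least one node capacity");
        }
        long min = Long.MAX_VALUE;
        double total = 0.0;
        for (long c : capacities) {
            if (c <= 0) {
                throw new IllegalArgumentException("capacity must be positive");
            }
            min = Math.min(min, c);
            total += c;
        }
        // min <= mean, so the ratio is at most 1.0; the cap only absorbs rounding.
        return Math.min(1.0, (double) min * capacities.length / total);
    }
}
```

For node capacities of 4, 4, and 12 (in any common unit), this gives 4 * 3 / 20 = 0.6, meaning the smaller nodes would be full at 60% overall utilization even with perfect balance. A cluster of identical nodes yields 1.0.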