[
https://issues.apache.org/jira/browse/HDFS-3570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13895208#comment-13895208
]
Andrew Ash commented on HDFS-3570:
----------------------------------
Confirmed that this does what I thought it would: non-DFS used space is now
being taken into account. Here are my before and after stats from a run with
the default threshold (10%). The delta between the overloaded and underloaded
nodes isn't exactly 10%, since there has been more activity since the balancer
finished, but I'm good to go on this.
Before:
IP    Capacity  Used  Non DFS used  Used %   Actual Use %
.33   3.22      0.51  1.39          15.84%   27.87%
.35   3.22      1.87  0.20          58.07%   61.92%
.37   3.22      1.79  0.36          55.59%   62.59%
.39   3.22      1.59  0.33          49.38%   55.02%
.41   3.22      0.18  1.91          5.59%    13.74%

After:
IP    Capacity  Used  Non DFS used  Used %   Actual Use %
.33   3.22      0.75  1.32          23.29%   39.47%
.35   3.22      1.64  0.17          50.93%   53.77%
.37   3.22      1.55  0.33          48.14%   53.63%
.39   3.22      1.47  0.31          45.65%   50.52%
.41   3.22      0.52  1.90          16.15%   39.39%
Ready for merging!
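The comment does not define the two percentage columns, but every row in both tables is consistent with "Used %" = Used / Capacity (the old "DFS Space Used %") and "Actual Use %" = Used / (Capacity - Non DFS used), i.e. DFS usage measured against the space actually left for DFS. The second formula is inferred from the numbers, not taken from the patch, and the helper name `utilizations` is hypothetical. A minimal Python sketch that recomputes both columns from the rounded figures:

```python
# Rows copied from the tables: (ip, capacity, dfs_used, non_dfs_used).
# Units are whatever the comment used; only the ratios matter here.
BEFORE = [
    (".33", 3.22, 0.51, 1.39),
    (".35", 3.22, 1.87, 0.20),
    (".37", 3.22, 1.79, 0.36),
    (".39", 3.22, 1.59, 0.33),
    (".41", 3.22, 0.18, 1.91),
]
AFTER = [
    (".33", 3.22, 0.75, 1.32),
    (".35", 3.22, 1.64, 0.17),
    (".37", 3.22, 1.55, 0.33),
    (".39", 3.22, 1.47, 0.31),
    (".41", 3.22, 0.52, 1.90),
]


def utilizations(capacity, dfs_used, non_dfs_used):
    """Return (used_pct, actual_use_pct) for one node, rounded to 2 decimals.

    used_pct:       dfs_used / capacity (ignores non-DFS usage)
    actual_use_pct: dfs_used / (capacity - non_dfs_used); this definition is
                    inferred from the table, hypothetical helper only.
    """
    used_pct = 100.0 * dfs_used / capacity
    actual_use_pct = 100.0 * dfs_used / (capacity - non_dfs_used)
    return round(used_pct, 2), round(actual_use_pct, 2)


for label, rows in (("before", BEFORE), ("after", AFTER)):
    for ip, cap, used, non_dfs in rows:
        print(label, ip, *utilizations(cap, used, non_dfs))
```

Running it reproduces the "Used %" and "Actual Use %" columns of both tables to two decimals.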
> Balancer shouldn't rely on "DFS Space Used %" as that ignores non-DFS used
> space
> --------------------------------------------------------------------------------
>
> Key: HDFS-3570
> URL: https://issues.apache.org/jira/browse/HDFS-3570
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: balancer
> Affects Versions: 2.0.0-alpha
> Reporter: Harsh J
> Assignee: Akira AJISAKA
> Priority: Minor
> Attachments: HDFS-3570.2.patch, HDFS-3570.aash.1.patch
>
>
> Report from a user here:
> https://groups.google.com/a/cloudera.org/d/msg/cdh-user/pIhNyDVxdVY/b7ENZmEvBjIJ,
> post archived at http://pastebin.com/eVFkk0A0
> This user had a specific DN with large non-DFS usage on its dfs.data.dirs
> and very little DFS usage (which is computed against total possible
> capacity).
> Balancer apparently only looks at DFS usage and fails to consider that
> non-DFS usage may also be high on a DN or across the cluster. Hence, if a DN
> reports DFS usage of only 8%, the Balancer assumes it has plenty of free space
> for more blocks, which isn't true, as this user's case shows. It kept
> scheduling writes to the DN to balance it out, but the DN simply can't accept
> any more blocks because of the state of its disks.
> I think it would be better if we _computed_ the actual utilization based on
> {{(100-(actual remaining space))/(capacity)}}, as opposed to the current
> {{(dfs used)/(capacity)}}. Thoughts?
> This isn't very critical, however, because it is very rare to see DN space
> being used for non-DN data, but it does expose a valid bug.
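To illustrate why a DFS-only metric keeps targeting nodes with heavy non-DFS usage, here is a simplified sketch of threshold-based classification. It assumes a node is over-utilized when its utilization exceeds the capacity-weighted cluster average by more than the threshold, and under-utilized when it falls below the average by more than the threshold; the real Balancer's categories and bookkeeping are richer than this, and `classify` and `count_non_dfs` are hypothetical names. The input is the "after" snapshot from the comment above, with the default 10% threshold.

```python
THRESHOLD = 10.0  # percentage points, the default threshold cited in the comment

# (ip, capacity, dfs_used, non_dfs_used), copied from the "after" table.
NODES = [
    (".33", 3.22, 0.75, 1.32),
    (".35", 3.22, 1.64, 0.17),
    (".37", 3.22, 1.55, 0.33),
    (".39", 3.22, 1.47, 0.31),
    (".41", 3.22, 0.52, 1.90),
]


def classify(nodes, count_non_dfs):
    """Label each node 'over', 'under' or 'ok' relative to the cluster average.

    count_non_dfs=False: utilization = dfs_used / capacity
    count_non_dfs=True:  utilization = dfs_used / (capacity - non_dfs_used)
    Simplified, hypothetical model of the Balancer's classification.
    """
    def usable(cap, non_dfs):
        return cap - non_dfs if count_non_dfs else cap

    total_used = sum(used for _, _, used, _ in nodes)
    total_usable = sum(usable(cap, nd) for _, cap, _, nd in nodes)
    avg = 100.0 * total_used / total_usable  # capacity-weighted average (assumed)

    labels = {}
    for ip, cap, used, nd in nodes:
        util = 100.0 * used / usable(cap, nd)
        if util > avg + THRESHOLD:
            labels[ip] = "over"
        elif util < avg - THRESHOLD:
            labels[ip] = "under"
        else:
            labels[ip] = "ok"
    return labels


print(classify(NODES, count_non_dfs=False))
print(classify(NODES, count_non_dfs=True))
```

On this snapshot the DFS-only metric (average about 36.8%) still labels .33 and .41, the two nodes with the most non-DFS usage, as under-utilized and .35 and .37 as over-utilized, so they would remain balancing candidates. Excluding non-DFS usage from the denominator (average about 49.1%) puts all five nodes within the threshold.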
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)