[
https://issues.apache.org/jira/browse/HDFS-10534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15348563#comment-15348563
]
Zhe Zhang commented on HDFS-10534:
----------------------------------
Thanks Andrew. I just reverted the change.
bq. Why not present a histogram rather than a single threshold like this? That
way we don't add a new config, present more info, and don't require a restart
to change this threshold.
In our case we are mostly interested in the 95th percentile because it serves
as an alarm that 5% of DNs are becoming hot nodes and will likely cause job
failures. A histogram is a nice idea actually. We can think about an
appropriate granularity (e.g. every 5%?) for it. The only drawback is that it
will add more content to the NN web UI and make it busier -- I imagine it will
be a table.
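To make the granularity question concrete, here is a rough sketch of the
bucketing. It is not taken from any of the attached patches; the class name,
method name, and the assumption that usage arrives as one percentage per DN
are all hypothetical.

```java
public class DnUsageHistogram {
    /**
     * Counts DataNodes per usage bucket.
     * Hypothetical helper: usagePercents holds one value in [0, 100] per DN,
     * and bucketWidthPercent must be positive and divide 100 evenly.
     */
    public static int[] histogram(double[] usagePercents, int bucketWidthPercent) {
        if (bucketWidthPercent <= 0 || 100 % bucketWidthPercent != 0) {
            throw new IllegalArgumentException("bucket width must divide 100");
        }
        int buckets = 100 / bucketWidthPercent;
        int[] counts = new int[buckets];
        for (double u : usagePercents) {
            if (Double.isNaN(u) || u < 0.0 || u > 100.0) {
                throw new IllegalArgumentException("usage out of range: " + u);
            }
            int idx = (int) (u / bucketWidthPercent);
            if (idx == buckets) {
                idx = buckets - 1; // exactly 100% falls into the last bucket
            }
            counts[idx]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        double[] usage = {12.5, 48.0, 51.3, 52.0, 97.9, 100.0};
        int width = 5; // the "every 5%" granularity floated above
        int[] counts = histogram(usage, width);
        // Print only non-empty buckets to keep the table short.
        for (int i = 0; i < counts.length; i++) {
            if (counts[i] > 0) {
                System.out.println(i * width + "-" + (i + 1) * width + "%: " + counts[i]);
            }
        }
    }
}
```

With a 5% width this yields 20 rows at most; skipping empty buckets, as the
loop in main does, is one way to keep the table from crowding the UI.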
bq. This is also a metric that could be calculated in client-side JS from
existing information.
True. But I think showing it on the NN web UI is more convenient for admins. We
proposed the change because the median (50th percentile) is actually a poor
metric for illustrating the imbalance level, especially in a busy cluster with
say > 70% overall utilization. We therefore wanted a "better median".
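A small illustration of why the median hides hot nodes, as a sketch only. It
uses the nearest-rank definition of a percentile, which is my assumption and
not necessarily what the patch computes; the class and method names are
hypothetical.

```java
import java.util.Arrays;

public class DnUsagePercentile {
    /**
     * Nearest-rank percentile: the smallest value such that at least p percent
     * of the samples are less than or equal to it. p must be in (0, 100].
     */
    public static double percentile(double[] usagePercents, double p) {
        if (usagePercents.length == 0) {
            throw new IllegalArgumentException("no DataNodes");
        }
        if (!(p > 0.0 && p <= 100.0)) {
            throw new IllegalArgumentException("p must be in (0, 100]: " + p);
        }
        double[] sorted = usagePercents.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0); // 1-based rank
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        // Made-up cluster: 18 DNs at 72% usage and 2 hot DNs at 98%.
        double[] usage = new double[20];
        Arrays.fill(usage, 72.0);
        usage[18] = 98.0;
        usage[19] = 98.0;
        System.out.println("median = " + percentile(usage, 50)); // 72.0
        System.out.println("p95    = " + percentile(usage, 95)); // 98.0
    }
}
```

In this made-up cluster the median stays at 72% while the 95th percentile
jumps to 98%, which is the kind of alarm described above.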
bq. the config says it's a percentile, but it's really a quantile.
Good catch. We could change the config to take a real percentile, i.e. a value
between 0 and 100. Per above, we could also show a histogram instead.
So overall I like the histogram idea. [~lewuathe] What are your thoughts?
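For the percentile vs. quantile distinction, a minimal sketch of the
conversion, assuming the config value is read as a number in [0, 100]. The
class and method names here are hypothetical and are not existing HDFS APIs.

```java
public class PercentileConfig {
    /**
     * Converts a percentile in [0, 100] to the equivalent quantile in [0, 1].
     * Rejects out-of-range values so a quantile such as 0.95 passed by mistake
     * is at least treated consistently as the 0.95th percentile, not the 95th.
     */
    public static double toQuantile(double percentile) {
        if (Double.isNaN(percentile) || percentile < 0.0 || percentile > 100.0) {
            throw new IllegalArgumentException(
                "percentile must be between 0 and 100: " + percentile);
        }
        return percentile / 100.0;
    }

    public static void main(String[] args) {
        System.out.println(toQuantile(95.0));
    }
}
```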
> NameNode WebUI should display DataNode usage rate with a certain percentile
> ---------------------------------------------------------------------------
>
> Key: HDFS-10534
> URL: https://issues.apache.org/jira/browse/HDFS-10534
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: namenode, ui
> Reporter: Zhe Zhang
> Assignee: Kai Sasaki
> Attachments: HDFS-10534.01.patch, HDFS-10534.02.patch,
> HDFS-10534.03.patch, HDFS-10534.04.patch, HDFS-10534.05.patch, Screen Shot
> 2016-06-23 at 6.25.50 AM.png
>
>
> In addition to *Min/Median/Max*, another meaningful metric for cluster
> balance is DN usage rate at a certain percentile (e.g. 90 or 95). We should
> add a config option, and another field on NN WebUI, to display this.