[ https://issues.apache.org/jira/browse/HDFS-2510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Aaron T. Myers updated HDFS-2510:
---------------------------------
Attachment: HDFS-2510.HDFS-1623.patch
Here's a patch which addresses the issue. In addition to the provided test, I
also tested this manually on a cluster by hitting the /jmx URL and observing
the values shown there for the new metrics.
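As a rough illustration of the manual check described above (not part of the patch), the /jmx servlet returns a JSON document with a top-level "beans" array, so the new values can be pulled out with a few lines of script. In this sketch the host, port, helper names, and the attribute names in the sample data are all assumptions made for the example; the patch's actual metric names are not stated in this message.

```python
import json
import urllib.request


def fetch_jmx(host, port):
    """Fetch the raw JSON text served at http://host:port/jmx.

    host and port are placeholders; use your NameNode's HTTP address.
    """
    with urllib.request.urlopen("http://%s:%d/jmx" % (host, port)) as resp:
        return resp.read().decode("utf-8")


def find_bean_attributes(jmx_json, name_substring):
    """Return {bean name: attributes} for beans whose name contains name_substring."""
    result = {}
    for bean in json.loads(jmx_json).get("beans", []):
        name = bean.get("name", "")
        if name_substring in name:
            # Everything except the bean's own name is treated as an attribute.
            result[name] = {k: v for k, v in bean.items() if k != "name"}
    return result


# Hypothetical sample response. The attribute names are illustrative only.
SAMPLE = """
{"beans": [
  {"name": "Hadoop:service=NameNode,name=FSNamesystem",
   "HAState": "standby",
   "PendingDataNodeMessageCount": 0},
  {"name": "java.lang:type=Runtime", "Uptime": 12345}
]}
"""

if __name__ == "__main__":
    for name, attrs in find_bean_attributes(SAMPLE, "FSNamesystem").items():
        print(name, attrs)
```

Against a live cluster, the SAMPLE string would be replaced by the result of fetch_jmx() for the NN in question.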
I implemented all the metrics proposed in the issue description, except for the
following two:
bq. The difference between highest generation stamp seen from the shared edit
log and the highest generation stamp seen from any DN
I couldn't think of any legitimate use for this. It seems to serve only as a
proxy for the size of the pending DN message queues, which is already covered
by its own metric.
bq. It would probably also be useful to have a DN metric which somehow
describes which active/standby NNs it's talking to, e.g. "time since last
communicated with standby/active NNs."
Similarly, I couldn't think of anything useful an operator could learn from
this. It also doesn't help that all DN metrics are currently per-DN-daemon
rather than per-BP offer service, so it's not obvious how to get meaningful
DN-side metrics for just a single namespace.
I'm certainly open to suggestions for other metrics that people think might be
useful.
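To make the three implemented NN metrics concrete, here is a minimal sketch of the values involved: a binary active/standby gauge, the size of the pending DN message queue, and a timestamp for the last read from the shared edit log. This is not the patch's code. The class, method, and key names are hypothetical, and the elapsed-time helper is an extra added for the example.

```python
import time


class HaMetricsSketch:
    """Toy holder for the three NN-side HA metrics (hypothetical names)."""

    def __init__(self, clock=time.time):
        self._clock = clock              # injectable so tests can fake time
        self._active = False             # a new NN starts out in standby here
        self._pending_dn_messages = 0
        self._last_edits_read = None     # seconds since epoch, None if never

    def set_active(self, active):
        self._active = bool(active)

    def dn_message_queued(self):
        self._pending_dn_messages += 1

    def dn_messages_processed(self, count):
        # Never let the gauge go negative if callers over-report.
        self._pending_dn_messages = max(0, self._pending_dn_messages - count)

    def edits_read(self):
        """Record that the standby just read from the shared edit log."""
        self._last_edits_read = self._clock()

    def millis_since_last_edits_read(self):
        if self._last_edits_read is None:
            return None
        return int((self._clock() - self._last_edits_read) * 1000)

    def snapshot(self):
        """Return the current metric values as a plain dict."""
        return {
            "is_active": 1 if self._active else 0,   # binary metric
            "pending_dn_messages": self._pending_dn_messages,
            "last_edits_read": self._last_edits_read,
        }


if __name__ == "__main__":
    metrics = HaMetricsSketch()
    metrics.dn_message_queued()
    metrics.edits_read()
    print(metrics.snapshot())
```

Passing the clock in as a parameter is only there so the timestamp logic can be exercised without sleeping.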
> Add HA-related metrics
> ----------------------
>
> Key: HDFS-2510
> URL: https://issues.apache.org/jira/browse/HDFS-2510
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: data-node, ha, name-node
> Affects Versions: HA branch (HDFS-1623)
> Reporter: Aaron T. Myers
> Assignee: Aaron T. Myers
> Attachments: HDFS-2510.HDFS-1623.patch
>
>
> Off the top of my head, I can think of:
> NN metrics:
> * A binary metric for active or standby
> * The size of the pending DN message queues
> * A timestamp for when the standby NN last read from shared edit log
> * The difference between highest generation stamp seen from the shared edit
> log and the highest generation stamp seen from any DN
> It would probably also be useful to have a DN metric which somehow describes
> which active/standby NNs it's talking to, e.g. "time since last communicated
> with standby/active NNs."
> I'm sure there are others as well. Comments strongly encouraged.