[ https://issues.apache.org/jira/browse/HDFS-2510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13204308#comment-13204308 ]
Todd Lipcon commented on HDFS-2510:
-----------------------------------
Sorry, missed the comment above:
{quote}
Similarly, I couldn't think of anything useful an operator could get from this.
It also doesn't help the situation that currently all DN metrics are
per-DN-daemon, not per BP offer service. Thus, it's not obvious how to get
meaningful DN-side metrics for just a single namespace.
{quote}
I think a useful metric which could be exposed is {{max(time since last
successful communication)}}. This would help diagnose cases where, for example,
one of the racks gets partitioned off from one of the NNs: the metric would
start to rise on all of the DNs in that rack.
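To make the proposed metric concrete, here is a rough sketch of how a DN might compute it. It is not part of the attached patch. It assumes one reading of the {{max(...)}}: the maximum is taken over the NNs a single DN talks to. The class name, the method names and the string NN identifier are all hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical sketch, not code from the HDFS-2510 patch.
 * Tracks the last successful contact with each NN and reports the
 * largest elapsed time across all of them.
 */
final class LastContactTracker {
    // Key: hypothetical NN identifier. Value: timestamp (ms) of the last success.
    private final Map<String, Long> lastSuccessMillis = new ConcurrentHashMap<>();

    /** Call after each successful RPC (e.g. a heartbeat) to the given NN. */
    void recordSuccess(String nnId, long nowMillis) {
        lastSuccessMillis.put(nnId, nowMillis);
    }

    /** Returns max over all known NNs of (now - last success), or 0 if none are known. */
    long maxMillisSinceLastSuccess(long nowMillis) {
        long max = 0L;
        for (long last : lastSuccessMillis.values()) {
            max = Math.max(max, nowMillis - last);
        }
        return max;
    }
}
```

The caller passes the current time in, so the sketch does not depend on any particular clock. A monotonic source would probably be preferable to wall-clock time, to avoid spurious jumps if the system clock is adjusted.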
That said, the ones you've implemented here are fine and the most crucial, so
+1 to the current patch and we can discuss adding some more DN-side metrics
separately.
> Add HA-related metrics
> ----------------------
>
> Key: HDFS-2510
> URL: https://issues.apache.org/jira/browse/HDFS-2510
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: data-node, ha, name-node
> Affects Versions: HA branch (HDFS-1623)
> Reporter: Aaron T. Myers
> Assignee: Aaron T. Myers
> Attachments: HDFS-2510-HDFS-1623.patch, HDFS-2510.HDFS-1623.patch
>
>
> Off the top of my head, I can think of:
> NN metrics:
> * A binary metric for active or standby
> * The size of the pending DN message queues
> * A timestamp for when the standby NN last read from shared edit log
> * The difference between highest generation stamp seen from the shared edit
> log and the highest generation stamp seen from any DN
> It would probably also be useful to have a DN metric which somehow describes
> which active/standby NNs it's talking to, e.g. "times since last communicated
> with standby/active NNs."
> I'm sure there are others as well. Comments strongly encouraged.