[
https://issues.apache.org/jira/browse/HDFS-14045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16678463#comment-16678463
]
Erik Krogen commented on HDFS-14045:
------------------------------------
{quote}
but I think grouping metrics by role of NN is better than by NameNode ID,
because we cannot know which metric is Active/Standby/Observer from the name
of the metric.
{quote}
Fair point. I see this as a tradeoff of ease of use vs. the amount of
information available; personally I prefer having more information available
even if it is a bit harder to interpret, but I definitely respect your
disagreement. I would be interested to see how others feel.
{quote}
Same feeling. But I doubt we can achieve that under the current metrics
framework.
{quote}
If you're worried about whether or not it's possible to create metric names
with dynamic names, it is definitely possible. For example, see the work I did
in HDFS-10872. Though in this case it is probably better to use
{{MetricTag}}, as the IPC server metrics ({{RpcMetrics}}) do, for example, to
differentiate metrics by the port the server is running on.
This makes me think: one possibility would be to add both a {{MetricTag}} with
the NameNode ID and one with the state, so that an operator can decide how to
interpret / separate out the tags on the metrics at a later time. Not sure if
this would be overkill.
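To make the two-tag idea concrete, here is a minimal sketch in plain Java. It
is not Hadoop's metrics2 API; the class {{TaggedLatencySketch}}, the tag values
("nn1", "active", ...) and the method names are all hypothetical, chosen only
to show that samples carrying both tags can be grouped either way afterwards.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Hypothetical stand-in for a tagged metrics record; NOT Hadoop's metrics2 API.
class TaggedLatencySketch {

  /** One latency sample carrying both tags: NameNode ID and HA state. */
  static final class Sample {
    final String nnId;     // assumed naming, e.g. "nn1"
    final String state;    // assumed values, e.g. "active", "standby"
    final long latencyMs;

    Sample(String nnId, String state, long latencyMs) {
      this.nnId = nnId;
      this.state = state;
      this.latencyMs = latencyMs;
    }
  }

  private final List<Sample> samples = new ArrayList<>();

  void record(String nnId, String state, long latencyMs) {
    samples.add(new Sample(nnId, state, latencyMs));
  }

  /** Average latency grouped by the NameNode ID tag. */
  Map<String, Double> averageByNnId() {
    return average(s -> s.nnId);
  }

  /** Average latency grouped by the state tag. */
  Map<String, Double> averageByState() {
    return average(s -> s.state);
  }

  // Groups samples by whichever tag the caller selects and averages each group.
  private Map<String, Double> average(Function<Sample, String> tag) {
    Map<String, long[]> acc = new TreeMap<>(); // value is {sum, count}
    for (Sample s : samples) {
      long[] a = acc.computeIfAbsent(tag.apply(s), k -> new long[2]);
      a[0] += s.latencyMs;
      a[1]++;
    }
    Map<String, Double> out = new TreeMap<>();
    for (Map.Entry<String, long[]> e : acc.entrySet()) {
      out.put(e.getKey(), (double) e.getValue()[0] / e.getValue()[1]);
    }
    return out;
  }

  public static void main(String[] args) {
    TaggedLatencySketch m = new TaggedLatencySketch();
    m.record("nn1", "active", 10);
    m.record("nn2", "standby", 200);
    m.record("nn2", "active", 20); // same NameNode after a failover
    System.out.println(m.averageByNnId());  // {nn1=10.0, nn2=110.0}
    System.out.println(m.averageByState()); // {active=15.0, standby=200.0}
  }
}
```

Grouping by ID mixes nn2's standby and active samples, while grouping by state
hides which host was slow; keeping both tags leaves that choice to the
operator.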
{quote}
On the other side, can we add similar metrics at NN side to measure the latency
of these RPC calls?
{quote}
They'll be covered by the {{RpcMetrics}} for the NameNode service port, but it
can also be really useful to have metrics measured on the client side rather
than the server side. For example, this can be used to detect issues such as
those discussed in HADOOP-14031.
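As a rough illustration of what "measured on the client side" means, the
sketch below times a call from the caller's point of view, so whatever happens
between sending the request and receiving the response is included in the
number. The class {{ClientSideTimerSketch}} and its methods are hypothetical
and not part of the DataNode code; the clock is injected only so the behavior
is easy to check.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical helper; NOT an existing Hadoop class.
class ClientSideTimerSketch {
  private final LongSupplier nanoClock; // e.g. System::nanoTime
  private long totalNanos;
  private long count;

  ClientSideTimerSketch(LongSupplier nanoClock) {
    this.nanoClock = nanoClock;
  }

  /** Runs the call and records the elapsed time as seen by the caller. */
  <T> T time(Supplier<T> call) {
    long start = nanoClock.getAsLong();
    try {
      return call.get();
    } finally {
      // Recorded even if the call throws, so failed calls are not invisible.
      totalNanos += nanoClock.getAsLong() - start;
      count++;
    }
  }

  /** Average caller-observed latency in milliseconds, or 0.0 with no samples. */
  double averageMillis() {
    return count == 0 ? 0.0 : totalNanos / 1e6 / count;
  }

  public static void main(String[] args) {
    ClientSideTimerSketch timer = new ClientSideTimerSketch(System::nanoTime);
    // Stand-in for an RPC such as a heartbeat; the real call is not shown here.
    String response = timer.time(() -> "heartbeat-response");
    System.out.println(response + " avg ms: " + timer.averageMillis());
  }
}
```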
> Use different metrics in DataNode to better measure latency of
> heartbeat/blockReports/incrementalBlockReports of Active/Standby NN
> ----------------------------------------------------------------------------------------------------------------------------------
>
> Key: HDFS-14045
> URL: https://issues.apache.org/jira/browse/HDFS-14045
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Reporter: Jiandan Yang
> Assignee: Jiandan Yang
> Priority: Major
> Attachments: HDFS-14045.001.patch, HDFS-14045.002.patch,
> HDFS-14045.003.patch, HDFS-14045.004.patch
>
>
> Currently DataNode uses the same metrics to measure the RPC latency of every
> NameNode, but Active and Standby usually perform differently at the same
> time, especially in a large cluster. For example, the RPC latency of the
> Standby is very long when it is catching up on the edit log, so we may
> misjudge the state of HDFS. Using different metrics for Active and Standby
> can help us obtain more precise metric data.