[ 
https://issues.apache.org/jira/browse/HDFS-14045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16678463#comment-16678463
 ] 

Erik Krogen commented on HDFS-14045:
------------------------------------

{quote}
but I think grouping metrics by role of NN is better than by NameNode ID, 
because we cannot know which metric is Active/Standby/Observer from the name of 
the metric.
{quote}
Fair point. I see this as a tradeoff of ease of use vs. the amount of 
information available; personally I prefer having more information available 
even if it is a bit harder to interpret, but I definitely respect your 
disagreement. I would be interested to see how others feel.

{quote}
Same feeling. But I doubt if we achieve that under current metrics framework.
{quote}
If you're worried about whether or not it's possible to create metrics with 
dynamic names, it is definitely possible. For example, see the work I did 
in HDFS-10872. Though in this case it is probably better to use 
{{MetricTag}}, as is done, for example, by the IPC server metrics 
({{RpcMetrics}}) to differentiate metrics by the port the server is 
running on.

This makes me think: one possibility would be to add both a {{MetricTag}} with 
the NameNode ID and one with the state, and an operator can decide how to 
interpret / separate out the tags on the metrics at a later time. Not sure if 
this would be overkill.
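
To make the two-tag idea a bit more concrete, here is a rough sketch in plain Java. It deliberately does not use the Hadoop metrics2 classes, so everything in it ({{TaggedLatencySketch}}, {{Sample}}, {{record}}, {{meanBy}}) is a made-up name for illustration only, and the latency values are invented. It only shows that once each sample carries both a NameNode ID and a state, the grouping can be chosen afterwards.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch of two-tag metrics; NOT the Hadoop metrics2 API. */
public class TaggedLatencySketch {

  /** One latency sample carrying both tags. */
  public static final class Sample {
    public final String nnId;    // NameNode ID tag, e.g. "nn1"
    public final String state;   // HA state tag, e.g. "active" or "standby"
    public final long latencyMs; // observed RPC latency in milliseconds

    public Sample(String nnId, String state, long latencyMs) {
      this.nnId = nnId;
      this.state = state;
      this.latencyMs = latencyMs;
    }
  }

  private final List<Sample> samples = new ArrayList<>();

  /** Record a latency together with both tags. */
  public void record(String nnId, String state, long latencyMs) {
    samples.add(new Sample(nnId, state, latencyMs));
  }

  /** Mean latency grouped by whichever tag the caller selects. */
  public Map<String, Double> meanBy(Function<Sample, String> tag) {
    // tag value -> {sum of latencies, number of samples}
    Map<String, long[]> acc = new LinkedHashMap<>();
    for (Sample s : samples) {
      long[] a = acc.computeIfAbsent(tag.apply(s), k -> new long[2]);
      a[0] += s.latencyMs;
      a[1]++;
    }
    Map<String, Double> out = new LinkedHashMap<>();
    acc.forEach((k, a) -> out.put(k, (double) a[0] / a[1]));
    return out;
  }

  public static void main(String[] args) {
    TaggedLatencySketch m = new TaggedLatencySketch();
    m.record("nn1", "active", 10);
    m.record("nn2", "standby", 200);
    // After a failover the same NameNode IDs report under new states.
    m.record("nn2", "active", 20);
    m.record("nn1", "standby", 300);

    System.out.println(m.meanBy(s -> s.state)); // {active=15.0, standby=250.0}
    System.out.println(m.meanBy(s -> s.nnId));  // {nn1=155.0, nn2=110.0}
  }
}
```

In this toy data, grouping by state keeps Active and Standby latencies apart across the failover, while grouping by NameNode ID mixes them, which is the tradeoff being discussed above.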

{quote}
On the other side, can we add similar metrics at NN side to measure the latency 
of these RPC calls? 
{quote}
They'll be covered by the {{RpcMetrics}} for the NameNode service port, but it 
can also be really useful to have metrics measured on the client side rather 
than the server side. For example, this can be used to detect issues such as 
those discussed in HADOOP-14031.

> Use different metrics in DataNode to better measure latency of 
> heartbeat/blockReports/incrementalBlockReports of Active/Standby NN
> ----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-14045
>                 URL: https://issues.apache.org/jira/browse/HDFS-14045
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>            Reporter: Jiandan Yang 
>            Assignee: Jiandan Yang 
>            Priority: Major
>         Attachments: HDFS-14045.001.patch, HDFS-14045.002.patch, 
> HDFS-14045.003.patch, HDFS-14045.004.patch
>
>
> Currently DataNode uses the same metrics to measure RPC latency for every 
> NameNode, but Active and Standby usually perform differently at the same 
> time, especially in large clusters. For example, the RPC latency of the 
> Standby is very long when it is catching up on the edit log, so we may 
> misunderstand the state of HDFS. Using different metrics for Active and 
> Standby can help us obtain more precise metric data.



