[
https://issues.apache.org/jira/browse/HDFS-14045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16681592#comment-16681592
]
Erik Krogen commented on HDFS-14045:
------------------------------------
Cool, the new changes LGTM. A few additional comments:
* Can we change the name of the method/parameter to something indicating it is
for metrics only, maybe like {{nnLatencyMetricsSuffix}}? It looks particularly
odd to me in {{IncrementalBlockReportManager}} right now.
* I think I would prefer to see the existing methods in {{DataNodeMetrics}}
changed to update both metrics, rather than the caller having to remember to
call both methods. That leaves less room for the two metrics to get out of
sync later.
* I'm not sure if you should re-use the same {{MutableRatesWithAggregation}}
for all of the metrics. It seems cleaner to me to have one per metric type,
e.g. one for heartbeats, one for lifeline, and so on, but let me know if you
disagree. I think this may even make it so that, if you set up the names
correctly, the {{MutableRatesWithAggregation}} can replace the existing
{{MutableRate}} while maintaining the name of the metric. Not 100% sure on this.
* You should update {{Metrics.md}} to document these new metrics.
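
To make the second and third points a bit more concrete, here is a rough sketch of the shape I have in mind. It is only an illustration: {{DataNodeMetricsSketch}}, its {{Rate}} helper, and the accessor names are hypothetical stand-ins rather than the real {{DataNodeMetrics}} or metrics2 classes, and only heartbeats are shown. The single {{addHeartbeat}} call updates both the aggregate metric and the per-NameNode one, so callers cannot forget one of them.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical stand-in for DataNodeMetrics; it does not use the Hadoop metrics2 classes.
class DataNodeMetricsSketch {

  // Minimal op-count/total-time holder standing in for a rate metric.
  static final class Rate {
    private final LongAdder numOps = new LongAdder();
    private final LongAdder totalMillis = new LongAdder();

    void add(long millis) {
      numOps.increment();
      totalMillis.add(millis);
    }

    long numOps() { return numOps.sum(); }
    long totalMillis() { return totalMillis.sum(); }
  }

  // Existing aggregate heartbeat metric, covering all NameNodes.
  private final Rate heartbeats = new Rate();
  // Separate per-NameNode group for this metric type, keyed by the metrics suffix.
  private final Map<String, Rate> heartbeatsPerNN = new ConcurrentHashMap<>();

  // One entry point updates both metrics so they cannot drift apart.
  void addHeartbeat(long latencyMillis, String nnLatencyMetricsSuffix) {
    heartbeats.add(latencyMillis);
    if (nnLatencyMetricsSuffix != null) {
      heartbeatsPerNN
          .computeIfAbsent(nnLatencyMetricsSuffix, k -> new Rate())
          .add(latencyMillis);
    }
  }

  Rate heartbeats() { return heartbeats; }

  // Returns an empty Rate if nothing was recorded for the given suffix.
  Rate heartbeatsFor(String nnLatencyMetricsSuffix) {
    return heartbeatsPerNN.getOrDefault(nnLatencyMetricsSuffix, new Rate());
  }
}
```

A caller would then just do something like {{metrics.addHeartbeat(elapsedMillis, "Standby")}}; block reports, incremental block reports, and lifelines would each get their own pair of fields in the same way.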
> Use different metrics in DataNode to better measure latency of
> heartbeat/blockReports/incrementalBlockReports of Active/Standby NN
> ----------------------------------------------------------------------------------------------------------------------------------
>
> Key: HDFS-14045
> URL: https://issues.apache.org/jira/browse/HDFS-14045
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Reporter: Jiandan Yang
> Assignee: Jiandan Yang
> Priority: Major
> Attachments: HDFS-14045.001.patch, HDFS-14045.002.patch,
> HDFS-14045.003.patch, HDFS-14045.004.patch, HDFS-14045.005.patch,
> HDFS-14045.006.patch, HDFS-14045.007.patch
>
>
> Currently the DataNode uses the same metrics to measure RPC latency for every
> NameNode, but the Active and Standby often perform differently at the same
> time, especially in large clusters. For example, RPC latency to the Standby is
> very high while the Standby is catching up on the edit log, so the combined
> metric can give a misleading picture of the state of HDFS. Using separate
> metrics for the Active and Standby would give us more precise data.