
Erik Krogen (JIRA) Fri, 09 Nov 2018 07:30:26 -0800


    [ https://issues.apache.org/jira/browse/HDFS-14045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16681592#comment-16681592 ]


Erik Krogen commented on HDFS-14045:
------------------------------------

Cool, the new changes LGTM. A few additional comments:

* Can we change the name of the method/parameter to something indicating it is 
for metrics only, maybe {{nnLatencyMetricsSuffix}}? It looks particularly 
odd to me in {{IncrementalBlockReportManager}} right now.
* I think I would prefer to see the existing methods in {{DataNodeMetrics}} 
changed to update both metrics, rather than the caller having to remember to 
call both methods. It introduces less possibility for the two metrics to get 
out of sync later.
* I'm not sure if you should re-use the same {{MutableRatesWithAggregation}} 
for all of the metrics. It seems cleaner to me to have one per metric type, 
e.g. one for heartbeats, one for lifeline, and so on, but let me know if you 
disagree. I think this may even make it so that, if you set up the names 
correctly, the {{MutableRatesWithAggregation}} can replace the existing 
{{MutableRate}} while maintaining the name of the metric. Not 100% sure on this.
* You should update {{Metrics.md}} to document these new metrics.
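The second and third suggestions can be sketched in plain Java. This is a hypothetical, self-contained illustration, not the actual {{DataNodeMetrics}} or Hadoop metrics2 API: {{Rate}} and {{RatesByName}} are simplified stand-ins for {{MutableRate}} and {{MutableRatesWithAggregation}}, and the method names and suffix format ("ForNN1", "ForNN2") are assumptions. It shows one per-NameNode collection per metric type, and a single update method that records each sample in both the aggregate and the per-NameNode metric.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch, NOT Hadoop's DataNodeMetrics. Each public update
 * method writes the same sample to the aggregate rate and to the
 * per-NameNode rate, so callers cannot update one and forget the other.
 */
public class DataNodeMetricsSketch {

  /** Simplified stand-in for a rate metric: sample count plus running total. */
  static final class Rate {
    long count;
    long totalMillis;

    void add(long elapsedMillis) {
      count++;
      totalMillis += elapsedMillis;
    }

    double mean() {
      return count == 0 ? 0.0 : (double) totalMillis / count;
    }
  }

  /** Simplified stand-in for a collection of rates keyed by metric name. */
  static final class RatesByName {
    private final Map<String, Rate> rates = new HashMap<>();

    void add(String name, long elapsedMillis) {
      rates.computeIfAbsent(name, k -> new Rate()).add(elapsedMillis);
    }

    Rate get(String name) {
      return rates.get(name);
    }
  }

  // Aggregate rates across all NameNodes (the pre-existing behavior).
  final Rate heartbeats = new Rate();
  final Rate blockReports = new Rate();

  // One per-NameNode collection per metric type, rather than one shared collection.
  final RatesByName heartbeatsByNn = new RatesByName();
  final RatesByName blockReportsByNn = new RatesByName();

  /** Records one heartbeat latency in both the aggregate and the per-NN metric. */
  public void addHeartbeat(long latencyMillis, String nnLatencyMetricsSuffix) {
    heartbeats.add(latencyMillis);
    heartbeatsByNn.add("Heartbeats" + nnLatencyMetricsSuffix, latencyMillis);
  }

  /** Records one block report latency in both the aggregate and the per-NN metric. */
  public void addBlockReport(long latencyMillis, String nnLatencyMetricsSuffix) {
    blockReports.add(latencyMillis);
    blockReportsByNn.add("BlockReports" + nnLatencyMetricsSuffix, latencyMillis);
  }

  public static void main(String[] args) {
    DataNodeMetricsSketch m = new DataNodeMetricsSketch();
    m.addHeartbeat(2, "ForNN1");   // e.g. the Active NameNode
    m.addHeartbeat(40, "ForNN2");  // e.g. a Standby catching up on the edit log
    System.out.println("aggregate mean: " + m.heartbeats.mean());
    System.out.println("NN1 mean: " + m.heartbeatsByNn.get("HeartbeatsForNN1").mean());
    System.out.println("NN2 mean: " + m.heartbeatsByNn.get("HeartbeatsForNN2").mean());
  }
}
```

Here the aggregate heartbeat mean (21 ms) hides the gap between the two NameNodes (2 ms vs. 40 ms), which the per-NameNode rates expose. Because {{addHeartbeat}} is the only entry point, the two views stay in sync.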

> Use different metrics in DataNode to better measure latency of 
> heartbeat/blockReports/incrementalBlockReports of Active/Standby NN
> ----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-14045
>                 URL: https://issues.apache.org/jira/browse/HDFS-14045
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>            Reporter: Jiandan Yang 
>            Assignee: Jiandan Yang 
>            Priority: Major
>         Attachments: HDFS-14045.001.patch, HDFS-14045.002.patch, 
> HDFS-14045.003.patch, HDFS-14045.004.patch, HDFS-14045.005.patch, 
> HDFS-14045.006.patch, HDFS-14045.007.patch
>
>
> Currently the DataNode uses the same metrics to measure the RPC latency of 
> both NameNodes, but the Active and Standby usually perform differently at the 
> same time, especially in large clusters. For example, the RPC latency of the 
> Standby is very long while it is catching up on the edit log, which can make 
> us misread the state of HDFS. Using separate metrics for the Active and the 
> Standby gives more precise metric data.




[jira] [Commented] (HDFS-14045) Use different metrics in DataNode to better measure latency of heartbeat/blockReports/incrementalBlockReports of Active/Standby NN
