[ 
https://issues.apache.org/jira/browse/HDFS-14045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16682395#comment-16682395
 ] 

Jiandan Yang  commented on HDFS-14045:
--------------------------------------

Thanks very much for your comments, [~xkrogen].
{quote}
Can we change the name of the method/parameter to something indicating it is 
for metrics only, maybe like nnLatencyMetricsSuffix? It looks particularly odd 
to me in IncrementalBlockReportManager right now.
{quote}
I have renamed {{nnLatencyMetricsSuffix}} to {{rpcMetricSuffix}}. What do you 
think of this name?
{quote}
I think I would prefer to see the existing methods in DataNodeMetrics changed 
to update both metrics, rather than the caller having to remember to call both 
methods. It introduces less possibility for the two metrics to get out of sync 
later.
{quote}
Very good suggestion. In patch008 I have changed the existing methods so that 
each one updates both metrics. However, the serviceId-nnId is needed when 
updating the metric, so the existing methods need an extra parameter that 
supplies the metric suffix.
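As a rough illustration of that shape (not the code in the patch), the sketch 
below shows one method updating both the aggregate metric and a per-NameNode 
metric selected by a suffix. The class {{LatencyMetricsSketch}}, the {{Rate}} 
helper, and the {{"HeartbeatsFor_" + suffix}} naming are all hypothetical 
stand-ins; the real DataNodeMetrics and MutableRatesWithAggregation APIs are 
not used here.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical stand-in for a metrics class; NOT the real DataNodeMetrics. */
class LatencyMetricsSketch {

    /** Sample count and total latency for one named rate (assumed helper). */
    static final class Rate {
        private final LongAdder samples = new LongAdder();
        private final LongAdder totalMillis = new LongAdder();

        void add(long millis) {
            samples.increment();
            totalMillis.add(millis);
        }

        long count() {
            return samples.sum();
        }

        double mean() {
            long n = samples.sum();
            return n == 0 ? 0.0 : (double) totalMillis.sum() / n;
        }
    }

    // Aggregate heartbeat latency across all NameNodes.
    private final Rate heartbeats = new Rate();

    // Single registry shared by every per-NameNode metric, keyed by metric
    // name, so adding a new metric type does not require a new field.
    private final Map<String, Rate> perNameNode = new ConcurrentHashMap<>();

    /** Updates both metrics in one call so they cannot drift apart. */
    void addHeartbeat(long latencyMillis, String rpcMetricSuffix) {
        heartbeats.add(latencyMillis);
        if (rpcMetricSuffix != null) {
            // Metric name format is an assumption made for this sketch.
            perNameNode
                .computeIfAbsent("HeartbeatsFor_" + rpcMetricSuffix, k -> new Rate())
                .add(latencyMillis);
        }
    }

    Rate aggregate() {
        return heartbeats;
    }

    Rate heartbeatsFor(String rpcMetricSuffix) {
        return perNameNode.getOrDefault("HeartbeatsFor_" + rpcMetricSuffix, new Rate());
    }

    public static void main(String[] args) {
        LatencyMetricsSketch metrics = new LatencyMetricsSketch();
        metrics.addHeartbeat(5, "ns1-nn1");    // e.g. the Active NameNode
        metrics.addHeartbeat(250, "ns1-nn2");  // e.g. a Standby catching up
        metrics.addHeartbeat(7, "ns1-nn1");
        System.out.println("aggregate mean: " + metrics.aggregate().mean());
        System.out.println("ns1-nn1 mean:   " + metrics.heartbeatsFor("ns1-nn1").mean());
        System.out.println("ns1-nn2 mean:   " + metrics.heartbeatsFor("ns1-nn2").mean());
    }
}
```

Because the caller passes the suffix into the same method that updates the 
aggregate, it never has to remember a second call.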
{quote}
I'm not sure if you should re-use the same MutableRatesWithAggregation for all 
of the metrics. It seems cleaner to me to have one per metric type, e.g. one 
for heartbeats, one for lifeline, and so on, but let me know if you disagree. I 
think this may even make it so that, if you set up the names correctly, the 
MutableRatesWithAggregation can replace the existing MutableRate while 
maintaining the name of the metric. Not 100% sure on this.
{quote}
I prefer to re-use a single MutableRatesWithAggregation for simplicity: no new 
fields are needed when new metrics are added.
{quote}
You should update Metrics.md documenting these new metrics
{quote}
Thanks for the reminder to update Metrics.md. The newly added metrics are 
documented in Metrics.md in patch008.

> Use different metrics in DataNode to better measure latency of 
> heartbeat/blockReports/incrementalBlockReports of Active/Standby NN
> ----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-14045
>                 URL: https://issues.apache.org/jira/browse/HDFS-14045
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>            Reporter: Jiandan Yang 
>            Assignee: Jiandan Yang 
>            Priority: Major
>         Attachments: HDFS-14045.001.patch, HDFS-14045.002.patch, 
> HDFS-14045.003.patch, HDFS-14045.004.patch, HDFS-14045.005.patch, 
> HDFS-14045.006.patch, HDFS-14045.007.patch
>
>
> Currently the DataNode uses the same metrics to measure the RPC latency of 
> every NameNode, but the Active and Standby usually perform differently at 
> the same time, especially in large clusters. For example, the Standby's RPC 
> latency is very long while it is catching up on the edit log. As a result, 
> we may misjudge the state of HDFS. Using different metrics for Active and 
> Standby can give us more precise metric data.



