[ https://issues.apache.org/jira/browse/FLINK-13732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16907856#comment-16907856 ]

Xiaogang Shi commented on FLINK-13732:
--------------------------------------

[~SleePy] Thanks for bringing up this issue. 

We are also suffering from the confusing "job manager metrics". It would be nice
if we could separate the legacy {{JobManagerMetricGroup}} into
{{DispatcherMetricGroup}}, {{ResourceManagerMetricGroup}}, and
{{JobManagerMetricGroup}}, and distinguish them with cluster ids.
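To make the collision concrete, here is a minimal sketch that expands scope formats for two clusters whose JobManagers run on the same host. It assumes Flink's default JobManager scope format {{<host>.jobmanager}} (configured via {{metrics.scope.jm}}). The {{ScopeDemo}} class, the {{format}} helper, the {{<cluster_id>}} variable, the per-component formats, and the example values are all hypothetical illustrations of the proposal, not Flink APIs.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of metric scope formats; NOT Flink's actual classes.
public class ScopeDemo {

    // Replaces each "<variable>" placeholder in the format with its value.
    static String format(String scopeFormat, Map<String, String> variables) {
        String result = scopeFormat;
        for (Map.Entry<String, String> e : variables.entrySet()) {
            result = result.replace(e.getKey(), e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // Two clusters (e.g. two YARN applications) whose JM/RM share a host.
        // The <cluster_id> variable and its values are hypothetical.
        Map<String, String> clusterA = new LinkedHashMap<>();
        clusterA.put("<host>", "node-1");
        clusterA.put("<cluster_id>", "cluster-a");
        Map<String, String> clusterB = new LinkedHashMap<>();
        clusterB.put("<host>", "node-1");
        clusterB.put("<cluster_id>", "cluster-b");

        // Legacy format: only <host> is available, so both scopes collide.
        String legacy = "<host>.jobmanager";
        System.out.println(format(legacy, clusterA)); // node-1.jobmanager
        System.out.println(format(legacy, clusterB)); // node-1.jobmanager

        // Proposed: one group per FLIP-6 component, each carrying a cluster id.
        String[] components = {"dispatcher", "resourcemanager", "jobmanager"};
        for (String component : components) {
            String scope = "<host>.<cluster_id>." + component;
            System.out.println(format(scope, clusterA)); // e.g. node-1.cluster-a.dispatcher
            System.out.println(format(scope, clusterB)); // e.g. node-1.cluster-b.dispatcher
        }
    }
}
```

Because the cluster id would be a scope variable rather than a fixed part of the metric name, users could reorder or drop it in their own scope formats, the same way {{tm_id}} can be used for TaskManagers.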

But an interesting question here is how to collect process metrics (e.g., CPU,
memory, I/O, and threads). Currently this is not a problem, as Flink does not
collect any process metrics. But in our experience, these process metrics are
very helpful for monitoring and troubleshooting.
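As an illustration of the kind of process metrics mentioned above, here is a minimal sketch that samples memory, thread, and CPU figures from the JVM's standard {{java.lang.management}} beans. The {{ProcessMetricsSample}} class and the metric names are hypothetical, and the sketch does not reflect how Flink registers metrics. Per-process CPU load and I/O counters are not part of the standard {{OperatingSystemMXBean}}; HotSpot exposes CPU figures through {{com.sun.management.OperatingSystemMXBean}}, and I/O counters usually have to come from the OS (e.g. {{/proc}} on Linux).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.OperatingSystemMXBean;
import java.lang.management.ThreadMXBean;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sampler using only the standard JMX platform beans.
public class ProcessMetricsSample {

    static Map<String, Number> sample() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();

        Map<String, Number> metrics = new LinkedHashMap<>();
        // Bytes currently used on the heap and outside it (metaspace, code cache, ...).
        metrics.put("memory.heap.used", memory.getHeapMemoryUsage().getUsed());
        metrics.put("memory.nonheap.used", memory.getNonHeapMemoryUsage().getUsed());
        // Live and peak thread counts of this JVM process.
        metrics.put("threads.count", threads.getThreadCount());
        metrics.put("threads.peak", threads.getPeakThreadCount());
        // Processors available to the JVM.
        metrics.put("cpu.processors", os.getAvailableProcessors());
        // System-wide (not per-process) load average; negative if unavailable on this platform.
        metrics.put("cpu.loadAverage", os.getSystemLoadAverage());
        return metrics;
    }

    public static void main(String[] args) {
        sample().forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```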

Whether we should collect process metrics at all is a separate question. But if
we do, we will need to decide in which metric group the process metrics of job
managers are collected.

> Enhance JobManagerMetricGroup with FLIP-6 architecture
> ------------------------------------------------------
>
>                 Key: FLINK-13732
>                 URL: https://issues.apache.org/jira/browse/FLINK-13732
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Metrics
>            Reporter: Biao Liu
>            Priority: Major
>             Fix For: 1.10.0
>
>
> There is a requirement from the user mailing list [1]. I think it's reasonable
> enough to support.
> The scenario is that when deploying a Flink cluster on YARN, there might be
> several {{JM(RM)}}s running on the same host. IMO that's quite a common
> scenario. However, we can't distinguish the metrics from different
> {{JobManagerMetricGroup}}s, because "hostname" is the only variable we
> can use.
> I think there are some problems with the current implementation of
> {{JobManagerMetricGroup}}. It's still in the non-FLIP-6 style. We should split
> the metric group into {{RM}} and {{Dispatcher}} groups to match the FLIP-6
> architecture. There should also be an identification variable, just like {{tm_id}}.
> CC [~StephanEwen], [~till.rohrmann], [~Zentol]
> 1. [http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-metrics-scope-for-YARN-single-job-td29389.html]



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
