[ 
https://issues.apache.org/jira/browse/FLINK-13787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaibo Zhou updated FLINK-13787:
-------------------------------
    Description: 
I ran a Flink job on Kubernetes using PrometheusPushGatewayReporter. I can see 
the metrics from the Flink JobManager and TaskManager in the push gateway's UI.

When I cancel the job, the JobManager's metrics disappear, but the 
TaskManager's metrics remain, even though I have set _deleteOnShutdown_ to 
_true_.

The configuration is:
{code:java}
metrics.reporters: "prom"
metrics.reporter.prom.class: "org.apache.flink.metrics.prometheus.PrometheusPushGatewayReporter"
metrics.reporter.prom.jobName: "WordCount"
metrics.reporter.prom.host: "localhost"
metrics.reporter.prom.port: "9091"
metrics.reporter.prom.randomJobNameSuffix: "true"
metrics.reporter.prom.filterLabelValueCharacters: "true"
metrics.reporter.prom.deleteOnShutdown: "true"
{code}
 

Other people have also encountered this problem: 
[https://stackoverflow.com/questions/54420498/flink-prometheus-push-gateway-reporter-delete-metrics-on-job-shutdown].
  And another similar issue: FLINK-11457.

 

As Prometheus is a very important metrics system on Kubernetes, solving this 
problem would help users monitor their Flink jobs.

  was:
I ran a Flink job on Kubernetes using PrometheusPushGatewayReporter. I can see 
the metrics from the Flink JobManager and TaskManager in the push gateway's UI.

When I cancel the job, the JobManager's metrics disappear, but the 
TaskManager's metrics remain, even though I have set _deleteOnShutdown_ to 
_true_.

The configuration is:
{code:java}
metrics.reporters: "prom"
metrics.reporter.prom.class: "org.apache.flink.metrics.prometheus.PrometheusPushGatewayReporter"
metrics.reporter.prom.jobName: "WordCount"
metrics.reporter.prom.host: "localhost"
metrics.reporter.prom.port: "9091"
metrics.reporter.prom.randomJobNameSuffix: "true"
metrics.reporter.prom.filterLabelValueCharacters: "true"
metrics.reporter.prom.deleteOnShutdown: "true"
{code}
 

Other people have also encountered this problem: 
[link|[https://stackoverflow.com/questions/54420498/flink-prometheus-push-gateway-reporter-delete-metrics-on-job-shutdown]].

And another similar issue: 
[FLINK-11457|https://issues.apache.org/jira/browse/FLINK-11457].

 

As Prometheus is a very important metrics system on Kubernetes, solving this 
problem would help users monitor their Flink jobs.


> PrometheusPushGatewayReporter does not cleanup TM metrics when run on 
> kubernetes
> --------------------------------------------------------------------------------
>
>                 Key: FLINK-13787
>                 URL: https://issues.apache.org/jira/browse/FLINK-13787
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Metrics
>    Affects Versions: 1.7.2, 1.8.1, 1.9.0
>            Reporter: Kaibo Zhou
>            Priority: Major
>
> I ran a Flink job on Kubernetes using PrometheusPushGatewayReporter. I can 
> see the metrics from the Flink JobManager and TaskManager in the push 
> gateway's UI.
> When I cancel the job, the JobManager's metrics disappear, but the 
> TaskManager's metrics remain, even though I have set _deleteOnShutdown_ to 
> _true_.
> The configuration is:
> {code:java}
> metrics.reporters: "prom"
> metrics.reporter.prom.class: "org.apache.flink.metrics.prometheus.PrometheusPushGatewayReporter"
> metrics.reporter.prom.jobName: "WordCount"
> metrics.reporter.prom.host: "localhost"
> metrics.reporter.prom.port: "9091"
> metrics.reporter.prom.randomJobNameSuffix: "true"
> metrics.reporter.prom.filterLabelValueCharacters: "true"
> metrics.reporter.prom.deleteOnShutdown: "true"
> {code}
>  
> Other people have also encountered this problem: 
> [https://stackoverflow.com/questions/54420498/flink-prometheus-push-gateway-reporter-delete-metrics-on-job-shutdown].
>   And another similar issue: FLINK-11457.
>  
> As Prometheus is a very important metrics system on Kubernetes, solving this 
> problem would help users monitor their Flink jobs.
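
Until the reporter removes TaskManager metrics itself, one possible stopgap is 
to delete the leftover groups by hand through the push gateway's HTTP API, 
which accepts a DELETE request on /metrics/job/<job name>. The sketch below is 
not part of Flink and is only an illustration: the function names and the 
suffix "a1b2c3" are hypothetical, the host and port are taken from the 
configuration above, and it assumes the job name contains no "/" character. 
Because randomJobNameSuffix is enabled, the real job names would have to be 
read from the push gateway's UI first.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen


def pushgateway_group_url(host: str, port: int, job_name: str) -> str:
    """Build the push gateway URL for the metric group of one job name.

    Hypothetical helper. Assumes the job name has no "/" in it; the
    name is percent-encoded so it stays a single URL path segment.
    """
    return f"http://{host}:{port}/metrics/job/{quote(job_name, safe='')}"


def delete_group(host: str, port: int, job_name: str) -> int:
    """Send an HTTP DELETE for the group and return the HTTP status code.

    Hypothetical helper. Requires a reachable push gateway.
    """
    request = Request(pushgateway_group_url(host, port, job_name), method="DELETE")
    with urlopen(request, timeout=10) as response:
        return response.status


if __name__ == "__main__":
    # "WordCounta1b2c3" stands in for jobName plus a random suffix (made up here).
    print(pushgateway_group_url("localhost", 9091, "WordCounta1b2c3"))
```

Calling delete_group("localhost", 9091, name) for each leftover TaskManager 
job name would then remove that group, provided the push gateway is reachable.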



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
