[ https://issues.apache.org/jira/browse/FLINK-13787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16911995#comment-16911995 ]
Kaibo Zhou commented on FLINK-13787:
------------------------------------
Hi [~lamber-ken],
Thank you for the reminder; this information is very useful to me. An external
scheduling system would add complexity to the overall system, so I will
consider a Prometheus Pushgateway version that supports a TTL feature, such as
[1] or [2].
In addition, even if the Flink community fixes this issue, there is no
guarantee that the metrics on the Pushgateway will be completely removed. The
TM may exit because of various exceptions without ever calling the close
method, or the network may be unreachable at shutdown. Deleting metrics from
the Pushgateway is also somewhat complicated because both standalone and
session clusters have to be considered.
[1] [https://github.com/dinumathai/pushgateway]
[2] [https://github.com/pkcakeout/pushgateway]
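For reference, the cleanup that cannot be guaranteed at TM shutdown could also be done from outside the TM by sending an HTTP DELETE to the Pushgateway for the affected group. The following is only a minimal sketch, not what the reporter does internally: the class and method names are hypothetical, it assumes the metrics are grouped by job name only, and it assumes the job name contains only URL-safe characters. With randomJobNameSuffix enabled, the caller would need to know the full name including the suffix.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper, not part of Flink or the Prometheus client.
public class PushgatewayCleanup {

    /** Builds the URL of the Pushgateway group identified only by its job name. */
    public static String groupUrl(String host, int port, String jobName) {
        // Assumption: the job name needs no escaping. Names containing '/' or
        // other special characters would need encoding, which is omitted here.
        if (!jobName.matches("[A-Za-z0-9_.-]+")) {
            throw new IllegalArgumentException("unsupported job name: " + jobName);
        }
        return "http://" + host + ":" + port + "/metrics/job/" + jobName;
    }

    /** Sends an HTTP DELETE for the group and returns the response status code. */
    public static int deleteGroup(String host, int port, String jobName) throws IOException {
        HttpURLConnection conn = (HttpURLConnection)
                URI.create(groupUrl(host, port, jobName)).toURL().openConnection();
        try {
            conn.setRequestMethod("DELETE");
            conn.setConnectTimeout(5000);
            conn.setReadTimeout(5000);
            // A 2xx status means the Pushgateway accepted the delete request.
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }
}
```

With the configuration from the issue, PushgatewayCleanup.deleteGroup("localhost", 9091, jobName) would be called with the full job name. This still depends on something outside the TM noticing that the TM is gone, which is the extra complexity mentioned above.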
> PrometheusPushGatewayReporter does not cleanup TM metrics when run on
> kubernetes
> --------------------------------------------------------------------------------
>
> Key: FLINK-13787
> URL: https://issues.apache.org/jira/browse/FLINK-13787
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Metrics
> Affects Versions: 1.7.2, 1.8.1, 1.9.0
> Reporter: Kaibo Zhou
> Priority: Major
>
> I ran a Flink job on Kubernetes with the PrometheusPushGatewayReporter, and I
> can see the metrics of the Flink jobmanager and taskmanager in the push
> gateway's UI.
> When I cancel the job, the jobmanager's metrics disappear, but the
> taskmanager's metrics still exist, even though I have set
> _deleteOnShutdown_ to true.
> The configuration is:
> {code:java}
> metrics.reporters: "prom"
> metrics.reporter.prom.class: "org.apache.flink.metrics.prometheus.PrometheusPushGatewayReporter"
> metrics.reporter.prom.jobName: "WordCount"
> metrics.reporter.prom.host: "localhost"
> metrics.reporter.prom.port: "9091"
> metrics.reporter.prom.randomJobNameSuffix: "true"
> metrics.reporter.prom.filterLabelValueCharacters: "true"
> metrics.reporter.prom.deleteOnShutdown: "true"
> {code}
>
> Other people have also encountered this problem:
> [https://stackoverflow.com/questions/54420498/flink-prometheus-push-gateway-reporter-delete-metrics-on-job-shutdown].
> A similar issue is FLINK-11457.
>
> As Prometheus is a very important metrics system on Kubernetes, solving this
> problem would help users monitor their Flink jobs.
--
This message was sent by Atlassian Jira
(v8.3.2#803003)