[
https://issues.apache.org/jira/browse/FLINK-11457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Oscar Westra van Holthe - Kind updated FLINK-11457:
---------------------------------------------------
Summary: PrometheusPushGatewayReporter does not cleanup its metrics (was:
PrometheusPushGatewayReporter either overwrites its own metrics or creates too
many labels)
> PrometheusPushGatewayReporter does not cleanup its metrics
> ----------------------------------------------------------
>
> Key: FLINK-11457
> URL: https://issues.apache.org/jira/browse/FLINK-11457
> Project: Flink
> Issue Type: Bug
> Reporter: Oscar Westra van Holthe - Kind
> Priority: Major
>
> When using the PrometheusPushGatewayReporter, one has two options:
> * Use a fixed job name, which causes the jobmanager and taskmanager to
> overwrite each other's metrics (i.e. last write wins, and you lose a lot of
> metrics)
> * Use a random suffix for the job name, which creates a lot of labels that
> have to be cleaned up manually
> Manual cleanup should not be necessary, but it is nonetheless required when
> running on a YARN cluster.
> A fix could be to add a suffix to the job name, naming the nodes in a non-random
> manner like: {{my_job_jm0}}, {{my_job_tm1}}, {{my_job_tm2}}, {{my_job_tm3}},
> {{my_job_tm4}}, ..., using a counter (not sure if such a counter is
> available), or some other stable (!) suffix.
> Related discussion: FLINK-9187
>
> Any thoughts on a solution? I'm happy to implement it, but I'm not sure what
> the best solution would be.
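
To make the stable-suffix idea from the description concrete, here is a minimal sketch of how such a job name could be derived. It is not Flink code: the class StableJobName, its method of(), and the role tags "jm" and "tm" are hypothetical names chosen for illustration, and the sketch assumes the cluster can supply a stable per-role index, which the description itself leaves open.

```java
// Hypothetical helper, not part of Flink: builds a deterministic job name
// for the Pushgateway from a base name, a role tag and a stable index.
final class StableJobName {

    private StableJobName() {
        // static helper only, no instances
    }

    /**
     * @param baseName the configured job name, e.g. "my_job"
     * @param role     a short role tag, e.g. "jm" or "tm" (assumed convention)
     * @param index    a stable, non-negative per-role index (assumed to be available)
     * @return a name such as "my_job_tm1"
     */
    static String of(String baseName, String role, int index) {
        if (baseName == null || baseName.isEmpty()) {
            throw new IllegalArgumentException("baseName must not be empty");
        }
        if (role == null || role.isEmpty()) {
            throw new IllegalArgumentException("role must not be empty");
        }
        if (index < 0) {
            throw new IllegalArgumentException("index must not be negative");
        }
        // Same inputs always give the same name, so a restarted node would
        // reuse its old name instead of producing a new one.
        return baseName + "_" + role + index;
    }
}
```

With this scheme, StableJobName.of("my_job", "jm", 0) gives my_job_jm0 and StableJobName.of("my_job", "tm", 1) gives my_job_tm1. Because a restarted node would get the same name again, its pushes should land under the same job name rather than adding a new one that has to be cleaned up by hand.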