[jira] [Updated] (FLINK-11457) PrometheusPushGatewayReporter does not cleanup its metrics

Oscar Westra van Holthe - Kind (JIRA) Fri, 01 Feb 2019 00:29:04 -0800


     [ 
https://issues.apache.org/jira/browse/FLINK-11457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Oscar Westra van Holthe - Kind updated FLINK-11457:
---------------------------------------------------
    Description: 
When cancelling a job running on a YARN-based cluster and then shutting down 
the cluster, metrics on the push gateway are not deleted.
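
Until the reporter cleans up after itself, leftover metric groups can be 
removed by hand through the Pushgateway HTTP API, which deletes a group on 
{{DELETE /metrics/job/<job name>}}. The following is only a sketch of that 
manual cleanup, not the reporter's behaviour: the class {{PushGatewayCleanup}}, 
its method names, and the gateway address are hypothetical, and it assumes the 
job name (including any random suffix) is known and contains no "/".

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for manual cleanup; not part of Flink. */
final class PushGatewayCleanup {

    /** Builds the URI of the metric group that belongs to one job name. */
    static URI groupUri(String gatewayBase, String jobName) {
        // Assumption: the job name contains no '/', so plain percent-encoding suffices.
        String encoded = URLEncoder.encode(jobName, StandardCharsets.UTF_8).replace("+", "%20");
        return URI.create(gatewayBase + "/metrics/job/" + encoded);
    }

    /** Sends the DELETE request and returns the HTTP status code. */
    static int deleteGroup(HttpClient client, String gatewayBase, String jobName)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(groupUri(gatewayBase, jobName))
                .DELETE()
                .build();
        return client.send(request, HttpResponse.BodyHandlers.discarding()).statusCode();
    }

    public static void main(String[] args) throws Exception {
        // Example address only; adjust to the actual push gateway.
        int status = deleteGroup(HttpClient.newHttpClient(), "http://localhost:9091", args[0]);
        System.out.println("Push gateway answered with HTTP " + status);
    }
}
```

With a random job name suffix, every jobmanager and taskmanager needs its own 
call, which is exactly the manual work this issue would like to avoid.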

Any thoughts on a solution? I'm happy to implement it, but I'm not sure what the 
best solution would be.

  was:
When using the PrometheusPushGatewayReporter, one has two options:
 * Use a fixed job name, which causes the jobmanager and taskmanager to 
overwrite each other's metrics (i.e. last write wins, and you lose a lot of 
metrics)
 * Use a random suffix for the job name, which creates a lot of labels that 
have to be cleaned up manually

The manual cleanup should not be necessary, but is needed nonetheless when using 
a YARN cluster.

A fix could be to add a suffix to the job name, naming the nodes in a non-random 
manner like: {{my_job_jm0}}, {{my_job_tm1}}, {{my_job_tm2}}, {{my_job_tm3}}, 
{{my_job_tm4}}, ..., using a counter (not sure if such is available), or some 
other stable (!) suffix.
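
To make the naming idea concrete, here is a minimal sketch of how such a stable 
suffix could be derived. It assumes a per-node counter is available, which is 
the open question above; the class {{StableJobName}}, the {{Role}} enum and the 
method {{of}} are hypothetical names and not part of Flink.

```java
import java.util.Locale;

/** Hypothetical helper illustrating the proposed naming scheme; not part of Flink. */
final class StableJobName {

    /** Role of the process pushing metrics: jobmanager or taskmanager. */
    enum Role { JM, TM }

    /** Builds names such as "my_job_jm0" or "my_job_tm1" from base name, role and counter. */
    static String of(String baseName, Role role, int index) {
        if (index < 0) {
            throw new IllegalArgumentException("index must be non-negative");
        }
        // Assumption: the index is stable across restarts of the same node.
        return baseName + "_" + role.name().toLowerCase(Locale.ROOT) + index;
    }

    public static void main(String[] args) {
        System.out.println(of("my_job", Role.JM, 0));
        for (int i = 1; i <= 4; i++) {
            System.out.println(of("my_job", Role.TM, i));
        }
    }
}
```

Because the same node always maps to the same name, a restarted node would 
overwrite its own metric group instead of leaving a new one behind.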

Related discussion: FLINK-9187

 

Any thoughts on a solution? I'm happy to implement it, but I'm not sure what the 
best solution would be.


> PrometheusPushGatewayReporter does not cleanup its metrics
> ----------------------------------------------------------
>
>                 Key: FLINK-11457
>                 URL: https://issues.apache.org/jira/browse/FLINK-11457
>             Project: Flink
>          Issue Type: Bug
>            Reporter: Oscar Westra van Holthe - Kind
>            Priority: Major
>
> When cancelling a job running on a YARN-based cluster and then shutting down 
> the cluster, metrics on the push gateway are not deleted.
>
> Any thoughts on a solution? I'm happy to implement it, but I'm not sure what 
> the best solution would be.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
