[jira] [Commented] (FLINK-32170) Continue metric collection on intermittent job restarts

Maximilian Michels (Jira) Wed, 24 May 2023 06:00:09 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-32170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17725789#comment-17725789
 ]


Maximilian Michels commented on FLINK-32170:
--------------------------------------------

Yes, this is the prerequisite. If we kept an in-memory copy of the job topology 
after the job leaves the RUNNING phase, it should be easy to assert this.

> Continue metric collection on intermittent job restarts
> -------------------------------------------------------
>
>                 Key: FLINK-32170
>                 URL: https://issues.apache.org/jira/browse/FLINK-32170
>             Project: Flink
>          Issue Type: Improvement
>          Components: Autoscaler, Kubernetes Operator
>            Reporter: Maximilian Michels
>            Priority: Major
>
> If the underlying infrastructure is unstable, e.g. due to Kubernetes pod 
> evictions, jobs will sometimes restart. Each restart also restarts the 
> autoscaler's metric collection and discards all metrics collected so far. 
> If the interruption is short, e.g. less than one minute, we could instead 
> resume metric collection once the job returns to the RUNNING state.
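
The idea in the description above could be sketched roughly as follows. This
is only an illustration under stated assumptions, not the autoscaler's actual
code: the class ResumableMetricHistory, its method names, and the use of plain
double samples are all hypothetical, and the one-minute limit is simply the
example value from the description.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: keeps collected metric samples across a short job
 * interruption and discards them only if the job stayed out of RUNNING
 * for longer than a configured limit.
 */
final class ResumableMetricHistory {
    private final Duration maxInterruption;
    private final List<Double> samples = new ArrayList<>();
    // Time at which the job left RUNNING; null while the job is running.
    private Instant leftRunningAt;

    ResumableMetricHistory(Duration maxInterruption) {
        this.maxInterruption = maxInterruption;
    }

    /** Records one metric sample while the job is running. */
    void onSample(double value) {
        samples.add(value);
    }

    /** Called when the job leaves RUNNING; only the first call per outage counts. */
    void onJobLeftRunning(Instant now) {
        if (leftRunningAt == null) {
            leftRunningAt = now;
        }
    }

    /**
     * Called when the job is RUNNING again.
     * Returns true if the existing samples were kept (collection resumes),
     * false if the outage was too long and the history was discarded.
     */
    boolean onJobRunning(Instant now) {
        boolean resumed = true;
        if (leftRunningAt != null) {
            Duration downtime = Duration.between(leftRunningAt, now);
            if (downtime.compareTo(maxInterruption) > 0) {
                samples.clear();
                resumed = false;
            }
            leftRunningAt = null;
        }
        return resumed;
    }

    int sampleCount() {
        return samples.size();
    }
}
```

With a limit of Duration.ofMinutes(1), a restart that brings the job back
after 30 seconds keeps the collected samples, while an outage of a few minutes
clears them and collection starts from scratch. A real implementation would
presumably also need to confirm that the job topology is unchanged before
reusing old metrics, which this sketch does not attempt.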



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
