[ 
https://issues.apache.org/jira/browse/FLINK-27675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-27675:
-----------------------------------
    Labels: pull-request-available  (was: )

> Improve manual savepoint tracking
> ---------------------------------
>
>                 Key: FLINK-27675
>                 URL: https://issues.apache.org/jira/browse/FLINK-27675
>             Project: Flink
>          Issue Type: Improvement
>          Components: Kubernetes Operator
>            Reporter: Gyula Fora
>            Assignee: Matyas Orhidi
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: kubernetes-operator-1.0.0
>
>
> There are two problems with the manual savepoint result observing logic that 
> can prevent the reconciler from making progress with the deployment 
> (recoveries, upgrades, etc.).
>  # Whenever the JobManager deployment is not in the READY state or the job itself 
> is not RUNNING, the trigger info must be reset and we should stop querying 
> it. Flink will not retry the savepoint if the job fails or is restarted 
> anyway.
>  # If fetching the savepoint status returns a well-defined error (such as: 
> There is no savepoint operation with triggerId=xxx for job ), we should simply 
> reset the trigger. These errors will never go away on their own and will 
> only cause the deployment to get stuck observing/waiting for a savepoint 
> to complete.
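
The two rules above could be sketched roughly as follows. This is only an illustration of the intended behavior, not the operator's actual code: every type and method name here (SavepointObserverSketch, SavepointInfo, SavepointStatusClient, PermanentSavepointException, observe) is hypothetical, and the choice to keep the trigger on other, possibly transient, errors is an assumption that the issue does not spell out.

```java
import java.util.Optional;

/** Hypothetical sketch; none of these types are the operator's real classes. */
final class SavepointObserverSketch {

    enum JmStatus { READY, DEPLOYING, MISSING }

    enum JobState { RUNNING, RESTARTING, FAILED }

    /** Assumed marker for errors that cannot resolve on their own, e.g. an unknown triggerId. */
    static final class PermanentSavepointException extends Exception {
        PermanentSavepointException(String message) {
            super(message);
        }
    }

    /** Assumed status lookup: empty while in progress, savepoint location once complete. */
    interface SavepointStatusClient {
        Optional<String> fetch(String triggerId) throws Exception;
    }

    /** Minimal stand-in for the savepoint part of the resource status. */
    static final class SavepointInfo {
        private String triggerId;
        private String lastSavepoint;

        void setTrigger(String id) { this.triggerId = id; }
        void resetTrigger() { this.triggerId = null; }
        boolean isPending() { return triggerId != null; }
        String getLastSavepoint() { return lastSavepoint; }
    }

    static void observe(JmStatus jm, JobState job, SavepointInfo info,
                        SavepointStatusClient client) {
        if (!info.isPending()) {
            return; // nothing was triggered, nothing to observe
        }
        // Rule 1: without a READY JobManager and a RUNNING job the savepoint
        // cannot complete, so drop the trigger and skip the status query.
        if (jm != JmStatus.READY || job != JobState.RUNNING) {
            info.resetTrigger();
            return;
        }
        try {
            Optional<String> location = client.fetch(info.triggerId);
            if (location.isPresent()) {
                info.lastSavepoint = location.get();
                info.resetTrigger(); // completed, stop tracking
            }
            // Empty result: still in progress, keep the trigger for the next cycle.
        } catch (PermanentSavepointException e) {
            // Rule 2: the error will never go away, so reset instead of waiting.
            info.resetTrigger();
        } catch (Exception e) {
            // Assumption: other errors may be transient, so keep the trigger and retry later.
        }
    }
}
```

With this shape, a pending trigger is cleared in three cases: the deployment or job is not healthy, the savepoint completed, or the status lookup failed permanently. In every case the reconciler is free to continue with recoveries and upgrades on the next cycle.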



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
