Gyula Fora created FLINK-27820:
----------------------------------

             Summary: Handle Upgrade/Deployment errors gracefully
                 Key: FLINK-27820
                 URL: https://issues.apache.org/jira/browse/FLINK-27820
             Project: Flink
          Issue Type: Improvement
          Components: Kubernetes Operator
    Affects Versions: kubernetes-operator-1.0.0
            Reporter: Gyula Fora
            Assignee: Gyula Fora
             Fix For: kubernetes-operator-1.1.0


The operator currently cannot gracefully handle a failure that occurs during 
job submission, or directly after submission but before the status is 
updated.

This applies to the initial cluster submission after a Flink CR is created 
and, more importantly, to upgrades.

This is slightly related to https://issues.apache.org/jira/browse/FLINK-27804, 
where mid-upgrade observation was disabled to work around some issues. That 
logic should also be improved so that it only skips observing last-state info 
for already finished jobs (that were observed before).

During upgrades, the observer should be able to recognize when the job/cluster 
was actually submitted already even if the status update subsequently failed 
and move the status into a healthy DEPLOYED state.
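
A minimal sketch of the recovery check described above, not the operator's 
actual code. All names here (UpgradeRecoverySketch, State, observe, the 
generation parameters) are hypothetical. It assumes the submitted deployment 
carries some marker of the spec generation it was created from, so the 
observer can compare that marker with the generation being rolled out.

```java
import java.util.Optional;

// Hypothetical sketch; none of these names come from the operator code base.
final class UpgradeRecoverySketch {

    // Simplified stand-in for the reconciliation state kept in the CR status.
    enum State { DEPLOYED, UPGRADING }

    /**
     * Decides which state the observer should record.
     *
     * @param recordedState      state last written to the CR status
     * @param targetGeneration   spec generation the upgrade is rolling out
     * @param deployedGeneration generation marker found on the running
     *                           deployment, empty if no deployment exists
     */
    static State observe(State recordedState,
                         long targetGeneration,
                         Optional<Long> deployedGeneration) {
        if (recordedState != State.UPGRADING) {
            // Nothing to recover: the status is not in the middle of an upgrade.
            return recordedState;
        }
        // The submission went through if the cluster already runs the target
        // generation, even though the status update that should have followed
        // was lost.
        boolean alreadySubmitted = deployedGeneration.isPresent()
                && deployedGeneration.get() == targetGeneration;
        return alreadySubmitted ? State.DEPLOYED : State.UPGRADING;
    }
}
```

With this check, a status stuck in UPGRADING moves to DEPLOYED once the 
running deployment matches the target generation. If the deployment is 
missing or still on an older generation, the state stays UPGRADING so the 
reconciler can retry the submission.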



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
