Gyula Fora created FLINK-27820:
----------------------------------
Summary: Handle Upgrade/Deployment errors gracefully
Key: FLINK-27820
URL: https://issues.apache.org/jira/browse/FLINK-27820
Project: Flink
Issue Type: Improvement
Components: Kubernetes Operator
Affects Versions: kubernetes-operator-1.0.0
Reporter: Gyula Fora
Assignee: Gyula Fora
Fix For: kubernetes-operator-1.1.0
The operator currently cannot gracefully handle a failure that occurs during
job submission, or directly after submission but before the status is
updated.
This applies to initial cluster submissions, when a Flink CR is first created,
but more importantly to upgrades.
This is slightly related to https://issues.apache.org/jira/browse/FLINK-27804,
where mid-upgrade observation was disabled to work around some issues. That
logic should also be improved to skip observing last-state info only for
already finished jobs (that were observed before).
During upgrades, the observer should be able to recognize when the job/cluster
was in fact already submitted, even if the status update subsequently failed,
and move the status into a healthy DEPLOYED state.
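
A minimal sketch of the recovery check described above, not the operator's
actual code. It assumes (hypothetically) that the running deployment is stamped
with the generation of the spec it was created from, so the observer can
compare it with the generation being rolled out. The names
UpgradeRecoverySketch, State, Status and observe are invented for
illustration; only the DEPLOYED state is named in this issue.

```java
public class UpgradeRecoverySketch {

    /** Hypothetical reconciliation states; UPGRADING is an assumed name. */
    public enum State { UPGRADING, DEPLOYED }

    /** Hypothetical view of the CR status as last persisted by the operator. */
    public static final class Status {
        public final State state;
        public final long deployedGeneration;

        public Status(State state, long deployedGeneration) {
            this.state = state;
            this.deployedGeneration = deployedGeneration;
        }
    }

    /**
     * Decide which status to persist after observing the cluster.
     *
     * @param status             status as last written by the operator
     * @param targetGeneration   generation of the spec being rolled out
     * @param observedGeneration generation stamped on the running deployment
     *                           (assumed marker), or null if none was found
     */
    public static Status observe(Status status, long targetGeneration, Long observedGeneration) {
        // Outside of an upgrade there is nothing to recover.
        if (status.state != State.UPGRADING) {
            return status;
        }
        // The submission went through even though the status update was lost:
        // record the target spec as deployed.
        if (observedGeneration != null && observedGeneration.longValue() == targetGeneration) {
            return new Status(State.DEPLOYED, targetGeneration);
        }
        // No deployment yet, or still the old one: leave the upgrade in progress.
        return status;
    }
}
```

With this check, a status stuck in UPGRADING for generation 2 moves to DEPLOYED
once a deployment stamped with generation 2 is observed. It stays in UPGRADING
when no deployment, or only the old one, is found.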