[
https://issues.apache.org/jira/browse/FLINK-26139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17498596#comment-17498596
]
Gyula Fora commented on FLINK-26139:
------------------------------------
CANCELLED: I think the only question here is what to do when a user manually
cancels a job. This will mostly manifest itself as a missing deployment
(as the cluster currently self-destructs, though this might change in the
future). Cancellation also removes the checkpoint metadata, so I think it is
very tricky to restore a cancelled job in the general case while respecting the
upgrade policy. The more I look at it, the more I think this should be a
terminal ERROR state (no upgrade possible).
FINISHED: In theory jobs can actually finish (bounded sources). The problem
is that we may not be able to tell whether a job finished or was cancelled
because, just as with cancellation, the cluster will self-destruct, HA metadata
will be deleted, etc.
So to recap: since terminal job states shut down the cluster and delete HA
data, we don't really have any choice for now other than to move to ERROR,
delete the resource, or introduce an UNKNOWN state. If we later force the
cluster to stay alive (https://issues.apache.org/jira/browse/FLINK-24113), we
will need to revisit this.
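To make the three options concrete, here is a rough Java sketch of how a
missing deployment could be mapped to an outcome. Every name in it
(MissingDeploymentSketch, Policy, Outcome, reconcile) is made up for
illustration and does not come from the operator code base. It also assumes
the only signal available is whether the deployment still exists, which is
why CANCELLED and FINISHED are not distinguished.

```java
/** Hypothetical sketch only; these types do not exist in the operator. */
final class MissingDeploymentSketch {

    /** The three choices from the recap above (names are assumptions). */
    enum Policy { MOVE_TO_ERROR, DELETE_RESOURCE, MOVE_TO_UNKNOWN }

    /** What happens to the custom resource after one observation. */
    enum Outcome { UNCHANGED, ERROR, DELETED, UNKNOWN }

    /**
     * Decide the outcome for one observation of the cluster.
     *
     * A missing deployment may mean the job was cancelled or that it
     * finished. Both look the same from the outside because the cluster
     * and its HA metadata are gone, so they share one branch here.
     */
    static Outcome reconcile(boolean deploymentFound, Policy policy) {
        if (deploymentFound) {
            // Cluster is still there: nothing terminal has been observed.
            return Outcome.UNCHANGED;
        }
        switch (policy) {
            case MOVE_TO_ERROR:
                return Outcome.ERROR;    // terminal, no upgrade possible
            case DELETE_RESOURCE:
                return Outcome.DELETED;  // drop the resource entirely
            case MOVE_TO_UNKNOWN:
                return Outcome.UNKNOWN;  // admit we cannot tell what happened
            default:
                throw new IllegalStateException("Unhandled policy: " + policy);
        }
    }

    public static void main(String[] args) {
        // Show the outcome of each policy when the deployment is missing.
        for (Policy policy : Policy.values()) {
            System.out.println(policy + " -> " + reconcile(false, policy));
        }
    }
}
```

If the cluster is later kept alive after a terminal state (FLINK-24113), the
boolean input would presumably be replaced by the actual job status, and the
cancelled and finished cases could then get separate branches.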
> Improve JobStatus tracking and handle different job states
> ----------------------------------------------------------
>
> Key: FLINK-26139
> URL: https://issues.apache.org/jira/browse/FLINK-26139
> Project: Flink
> Issue Type: Sub-task
> Components: Kubernetes Operator
> Reporter: Gyula Fora
> Priority: Major
> Attachments: image-2022-02-25-21-22-08-636.png
>
>
> Currently we do not handle any job status changes such as cancellations,
> errors or job completions.
> We should introduce some mechanism to react to and deal with these changes and
> expose them in the status, as they can potentially affect upgrades.