[ 
https://issues.apache.org/jira/browse/FLINK-26139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17498139#comment-17498139
 ] 

Biao Geng commented on FLINK-26139:
-----------------------------------

After checking the job state transitions in Flink, I think that for our Flink 
application this operator may only need to handle three states: 
{{RUNNING}}, {{SUSPENDED}} and {{TERMINATED}}.

{{RUNNING}}: The application cluster is running. The actual job status 
could be any non-terminal state (i.e. INITIALIZING, CREATED, RUNNING, 
FAILING, CANCELLING, RESTARTING or RECONCILING). The JM is responsible for 
managing the internal state transitions, and the operator just treats the job 
as running.
{{TERMINATED}}: The application has finished due to the completion, failure 
or cancellation of the job. In this state, the operator should clean up the 
deployment and all other resources. No further upgrades are possible.
{{SUSPENDED}}: From the API doc: "The job has been suspended which means 
that it has been stopped but not been removed from a potential HA job store." 
Upgrades can still happen in this state, such as modifying the parallelism, 
after which the state can transition back to {{RUNNING}}.
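
As a rough illustration of the mapping described above, here is a minimal sketch in Java. The {{FlinkJobStatus}} enum is a local stand-in that copies the value names of Flink's {{JobStatus}} so the snippet compiles on its own, and {{OperatorJobState}} and {{fromFlinkStatus}} are hypothetical names, not existing operator code.

```java
/** Local stand-in mirroring the value names of Flink's JobStatus enum (assumption: kept local so the sketch is self-contained). */
enum FlinkJobStatus {
    INITIALIZING, CREATED, RUNNING, FAILING, FAILED, CANCELLING,
    CANCELED, FINISHED, RESTARTING, SUSPENDED, RECONCILING
}

/** Hypothetical operator-side states: the three states proposed in this comment. */
enum OperatorJobState {
    RUNNING, SUSPENDED, TERMINATED;

    /** Collapses the fine-grained Flink job status into one of the three operator states. */
    static OperatorJobState fromFlinkStatus(FlinkJobStatus status) {
        switch (status) {
            case FINISHED:
            case FAILED:
            case CANCELED:
                // Completion, failure or cancellation: the operator should clean up.
                return TERMINATED;
            case SUSPENDED:
                // Stopped but still in the HA job store: upgrades remain possible.
                return SUSPENDED;
            default:
                // All remaining statuses are transitions managed by the JM itself.
                return RUNNING;
        }
    }
}
```

With this shape, the operator only needs to branch on three values, while the JM keeps ownership of the intermediate transitions.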

The state machine can be as simple as the following picture:

!image-2022-02-25-21-22-08-636.png|width=200,height=119!

I am not sure whether we should split the {{TERMINATED}} state into 
more specific states like FAILED, FINISHED and CANCELLED so that we can track 
the status better on the k8s side.
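
If the terminal state were split, one possible shape is sketched below. {{TerminalKind}} and its {{of}} method are hypothetical names used only for illustration; the input is the name of a Flink job status, where Flink itself spells the cancelled state as CANCELED.

```java
import java.util.Optional;

/** Hypothetical finer-grained terminal states that could be exposed in the k8s status. */
enum TerminalKind {
    FINISHED, FAILED, CANCELLED;

    /**
     * Maps the name of a Flink job status to a terminal kind.
     * Returns empty for any status that is not one of the three terminal ones.
     */
    static Optional<TerminalKind> of(String flinkStatusName) {
        switch (flinkStatusName) {
            case "FINISHED":
                return Optional.of(FINISHED);
            case "FAILED":
                return Optional.of(FAILED);
            case "CANCELED": // Flink's spelling of the cancelled status
                return Optional.of(CANCELLED);
            default:
                return Optional.empty();
        }
    }
}
```

An empty result here would mean the job is still in {{RUNNING}} or {{SUSPENDED}} from the operator's point of view.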

Does the above design make sense? cc [~wangyang0918] [~gyfora]

> Improve JobStatus tracking and handle different job states
> ----------------------------------------------------------
>
>                 Key: FLINK-26139
>                 URL: https://issues.apache.org/jira/browse/FLINK-26139
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Kubernetes Operator
>            Reporter: Gyula Fora
>            Priority: Major
>         Attachments: image-2022-02-25-21-22-08-636.png
>
>
> Currently we do not handle any job status changes such as cancellations, 
> errors or job completions.
> We should introduce some mechanism to react and deal with these changes and 
> expose them in the status as they can potentially affect upgrades.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
