[ https://issues.apache.org/jira/browse/FLINK-26261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17495405#comment-17495405 ]

Yang Wang edited comment on FLINK-26261 at 2/21/22, 9:45 AM:
-------------------------------------------------------------

Maybe we should verify whether the JobManager pod status is running before 
building a Flink rest client to get job status.

If the JobManager pod cannot be launched within a given timeout (e.g. 600s), I 
think it is reasonable to suspend the job and forward the pod events to the 
FlinkDeployment.
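A minimal sketch of the proposed check, assuming the reconciler can see the JobManager pod phase and the time the deployment was created. PodPhase, Action, JmReadinessGate and decide() are hypothetical names for illustration, not Flink Kubernetes Operator APIs; the 600s timeout is the example value from the comment above.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch: PodPhase, Action and JmReadinessGate are illustrative
// names, not part of the Flink Kubernetes Operator.
public class JmReadinessGate {

    /** Simplified subset of Kubernetes pod phases. */
    public enum PodPhase { PENDING, RUNNING, SUCCEEDED, FAILED, UNKNOWN }

    /** What the reconciler should do on this pass. */
    public enum Action { QUERY_JOB_STATUS, WAIT, SUSPEND_AND_REPORT_EVENTS }

    public static Action decide(PodPhase phase, Instant deployedAt, Instant now, Duration timeout) {
        // Only build a REST client to query job status once the JobManager pod is running.
        if (phase == PodPhase.RUNNING) {
            return Action.QUERY_JOB_STATUS;
        }
        // Pod not running yet: keep waiting while still inside the timeout window.
        Duration elapsed = Duration.between(deployedAt, now);
        if (elapsed.compareTo(timeout) < 0) {
            return Action.WAIT;
        }
        // Timeout reached: suspend the job and forward pod events to the FlinkDeployment.
        return Action.SUSPEND_AND_REPORT_EVENTS;
    }

    public static void main(String[] args) {
        Instant start = Instant.parse("2022-02-21T09:00:00Z");
        Duration timeout = Duration.ofSeconds(600);
        System.out.println(decide(PodPhase.PENDING, start, start.plusSeconds(60), timeout));
        System.out.println(decide(PodPhase.PENDING, start, start.plusSeconds(601), timeout));
        System.out.println(decide(PodPhase.RUNNING, start, start.plusSeconds(5), timeout));
    }
}
```

Gating on the pod phase avoids repeatedly building a REST client against a JobManager that never started (for example, because its image could not be pulled), while the timeout keeps a slow but healthy startup from being treated as a failure.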


was (Author: fly_in_gis):
Maybe we should verify whether the JobManager pod status is running before 
building a Flink rest client to get job status.

If the JobManager pod cannot be launched within a given timeout (e.g. 600s), then 
we could suspend the job and forward the pod events to the FlinkDeployment.

> Reconciliation should try to start job when not already started or move to permanent error
> ------------------------------------------------------------------------------------------
>
>                 Key: FLINK-26261
>                 URL: https://issues.apache.org/jira/browse/FLINK-26261
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Kubernetes Operator
>            Reporter: Thomas Weise
>            Priority: Major
>
> When job submission fails, the operator currently keeps trying to find the 
> job status. In the case I'm looking at, the cluster wasn't created because the 
> image could not be resolved. We either need logic to re-attempt job 
> submission or need to flag the submission as failed so that JobStatusObserver 
> does not attempt to check again. We should also capture the submission error 
> as an event on the CR.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
