[ https://issues.apache.org/jira/browse/FLINK-28478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Aitozi updated FLINK-28478:
---------------------------
    Description:
While testing with https://issues.apache.org/jira/browse/FLINK-28187, I found that a session cluster cannot be deployed if a failure occurs between recording the status and deploying. In the next reconcile loop, the spec is not detected as changed by {{checkNewSpecAlreadyDeployed}}, so the operator will not try to start the session cluster again.

Application mode is not affected: the SUSPEND job state in the deployed spec is not equal to the desired state, so the operator will try to reconcile the spec change.

  was:
While testing with https://issues.apache.org/jira/browse/FLINK-28187, I found that a session cluster deployment cannot recover if a failure occurs between recording the status and deploying. In the next reconcile loop, the spec is not detected as changed by {{checkNewSpecAlreadyDeployed}}, so the operator will not try to start the session cluster again.

Application mode is not affected: the SUSPEND job state in the deployed spec is not equal to the desired state, so the operator will try to reconcile the spec change.

> Session Cluster will lost if it failed between status recorded and deploy
> -------------------------------------------------------------------------
>
>                 Key: FLINK-28478
>                 URL: https://issues.apache.org/jira/browse/FLINK-28478
>             Project: Flink
>          Issue Type: Bug
>          Components: Kubernetes Operator
>            Reporter: Aitozi
>            Priority: Major
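The failure mode described above can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions, not the operator's actual code: the class `ReconcileSketch`, its `Status` and `Cluster` types, and both `reconcile*` methods are hypothetical names invented for this example, and the spec comparison is reduced to a string equality check standing in for what {{checkNewSpecAlreadyDeployed}} is described as doing.

```java
import java.util.Objects;

public class ReconcileSketch {

    /** Hypothetical stand-in for the status persisted on the custom resource. */
    static final class Status {
        String recordedSpec;     // last spec written to the status
        String recordedJobState; // only used by the application-mode model
    }

    /** Hypothetical stand-in for the real cluster. */
    static final class Cluster {
        boolean running;
    }

    /** Session mode: deploy only if the desired spec differs from the recorded one. */
    static boolean reconcileSession(String desiredSpec, Status status,
                                    Cluster cluster, boolean failBeforeDeploy) {
        if (Objects.equals(desiredSpec, status.recordedSpec)) {
            return false; // spec looks unchanged, so no deploy is attempted
        }
        status.recordedSpec = desiredSpec; // step 1: record the status
        if (failBeforeDeploy) {
            throw new IllegalStateException("simulated failure before deploy");
        }
        cluster.running = true; // step 2: deploy
        return true;
    }

    /**
     * Application mode: the recorded job state stays SUSPEND until the deploy
     * succeeds, so it differs from the desired state (assumed RUNNING here).
     */
    static boolean reconcileApplication(String desiredSpec, Status status,
                                        Cluster cluster, boolean failBeforeDeploy) {
        boolean sameSpec = Objects.equals(desiredSpec, status.recordedSpec);
        boolean sameState = "RUNNING".equals(status.recordedJobState);
        if (sameSpec && sameState) {
            return false; // nothing to reconcile
        }
        status.recordedSpec = desiredSpec;
        status.recordedJobState = "SUSPEND"; // recorded before the deploy
        if (failBeforeDeploy) {
            throw new IllegalStateException("simulated failure before deploy");
        }
        cluster.running = true;
        status.recordedJobState = "RUNNING"; // updated only after a successful deploy
        return true;
    }

    public static void main(String[] args) {
        Status sessionStatus = new Status();
        Cluster sessionCluster = new Cluster();
        try {
            reconcileSession("spec-v1", sessionStatus, sessionCluster, true);
        } catch (IllegalStateException e) {
            // first loop fails after the status was recorded
        }
        boolean sessionRetried =
                reconcileSession("spec-v1", sessionStatus, sessionCluster, false);
        System.out.println("session: retried=" + sessionRetried
                + " running=" + sessionCluster.running);

        Status appStatus = new Status();
        Cluster appCluster = new Cluster();
        try {
            reconcileApplication("spec-v1", appStatus, appCluster, true);
        } catch (IllegalStateException e) {
            // same failure point as above
        }
        boolean appRetried =
                reconcileApplication("spec-v1", appStatus, appCluster, false);
        System.out.println("application: retried=" + appRetried
                + " running=" + appCluster.running);
    }
}
```

In this model the second session-mode loop sees the already-recorded spec and skips the deploy, so the cluster never starts, while the application-mode loop retries because the recorded SUSPEND state still differs from the desired one.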
-- This message was sent by Atlassian Jira (v8.20.10#820010)