[ 
https://issues.apache.org/jira/browse/FLINK-28478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aitozi updated FLINK-28478:
---------------------------
    Description: 
While testing with https://issues.apache.org/jira/browse/FLINK-28187, 
I found that a session cluster is never deployed if the operator fails between 
recording the status and deploying. In the next reconcile loop, 
{{checkNewSpecAlreadyDeployed}} does not detect a spec change, so the operator 
does not try to start the session cluster again. 

Application mode is not affected: the SUSPEND state of the job in the deployed 
spec is not equal to the desired state, so the operator still tries to 
reconcile the spec change.

  was:
While testing with https://issues.apache.org/jira/browse/FLINK-28187, 
I found that a session cluster deployment cannot recover if it fails between 
recording the status and deploying. In the next reconcile loop, 
{{checkNewSpecAlreadyDeployed}} does not detect a spec change, so the operator 
does not try to start the session cluster again. 

Application mode is not affected: the SUSPEND state of the job in the deployed 
spec is not equal to the desired state, so the operator still tries to 
reconcile the spec change.


> Session cluster will be lost if it fails between status recording and deploy
> ----------------------------------------------------------------------------
>
>                 Key: FLINK-28478
>                 URL: https://issues.apache.org/jira/browse/FLINK-28478
>             Project: Flink
>          Issue Type: Bug
>          Components: Kubernetes Operator
>            Reporter: Aitozi
>            Priority: Major
>
> While testing with https://issues.apache.org/jira/browse/FLINK-28187, 
> I found that a session cluster is never deployed if the operator fails between 
> recording the status and deploying. In the next reconcile loop, 
> {{checkNewSpecAlreadyDeployed}} does not detect a spec change, so the operator 
> does not try to start the session cluster again. 
> Application mode is not affected: the SUSPEND state of the job in the deployed 
> spec is not equal to the desired state, so the operator still tries to 
> reconcile the spec change.
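
The failure sequence described above can be illustrated with a small, 
self-contained Java sketch. This is a toy model only, not the operator's code: 
the class SessionReconcileSketch, the Status fields, and the failAfterRecord 
flag are hypothetical names, and the equality check merely stands in for what 
{{checkNewSpecAlreadyDeployed}} is described as doing in this issue.

```java
import java.util.Objects;

// Toy model of the reconcile loop described in this issue.
// All names are hypothetical; this is not the operator's actual code.
public class SessionReconcileSketch {

    /** Minimal stand-in for the status that is persisted for a resource. */
    static final class Status {
        String lastReconciledSpec; // spec recorded as deployed
        boolean clusterRunning;    // whether a cluster was actually started
    }

    /**
     * Simulates one reconcile pass. failAfterRecord injects a failure
     * between recording the status and deploying.
     */
    static void reconcile(String desiredSpec, Status status, boolean failAfterRecord) {
        // Assumed, simplified check: the spec counts as already deployed
        // when it equals the recorded one.
        boolean alreadyDeployed = Objects.equals(desiredSpec, status.lastReconciledSpec);
        if (alreadyDeployed) {
            return; // nothing to do, even if no cluster is running
        }
        status.lastReconciledSpec = desiredSpec; // step 1: record the status
        if (failAfterRecord) {
            throw new IllegalStateException("simulated failure before deploy");
        }
        status.clusterRunning = true; // step 2: deploy the session cluster
    }

    public static void main(String[] args) {
        Status status = new Status();
        try {
            reconcile("spec-v1", status, true); // fails after step 1
        } catch (IllegalStateException e) {
            // a retry happens in the next reconcile loop
        }
        reconcile("spec-v1", status, false); // retry: the deploy is skipped
        System.out.println("clusterRunning = " + status.clusterRunning);
        // prints "clusterRunning = false"
    }
}
```

In this model the second pass returns early because the recorded spec already 
matches the desired one, so the cluster is never started. Any fix would need 
the retry to notice that the deploy step never completed; how the operator 
should do that is not stated in this message.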



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
