[ https://issues.apache.org/jira/browse/FLINK-30287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steven Zhang updated FLINK-30287:
---------------------------------
    Description: 
I started up a standalone session-mode Flink cluster and ran one job on it. I 
checked the ConfigMaps and could see the HA data.

 

{code:bash}
kubectl get configmaps -n cc-flink-operator
NAME                                                                             DATA   AGE
flink-config-sql-example-deployment-s3-testing                                   2      5m41s
flink-operator-config                                                            3      42h
kube-root-ca.crt                                                                 1      42h
sql-example-deployment-s3-testing-000000003f57cd5f0000000000000002-config-map    0      11m
sql-example-deployment-s3-testing-cluster-config-map                             5      12m
{code}
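
For reference, the HA data itself can be inspected directly; the ConfigMap names below are the ones from the listing above:

{code:bash}
# Effective Flink configuration that the operator mounted for this deployment
# (this is where the high-availability settings show up)
kubectl get configmap flink-config-sql-example-deployment-s3-testing \
  -n cc-flink-operator -o yaml

# Job-level and cluster-level HA ConfigMaps, including their ownerReferences
kubectl get configmap \
  sql-example-deployment-s3-testing-000000003f57cd5f0000000000000002-config-map \
  sql-example-deployment-s3-testing-cluster-config-map \
  -n cc-flink-operator -o yaml
{code}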

 

I then updated the image field of the FlinkDeployment, and the Flink cluster was 
restarted. The HA ConfigMap for the job is now gone.

 

{code:bash}

kubectl get configmaps -n cc-flink-operator

NAME                                                   DATA   AGE

flink-config-sql-example-deployment-s3-testing         2      18m

flink-operator-config                                  3      43h

kube-root-ca.crt                                       1      43h

sql-example-deployment-s3-testing-cluster-config-map   3      31m

{code}
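
For completeness, the image update that triggered the restart was applied along these lines (a sketch; the image tag is a placeholder and the FlinkDeployment is assumed to live in the same namespace):

{code:bash}
# Update spec.image on the FlinkDeployment; the operator then restarts the
# standalone session cluster with the new image (tag below is a placeholder).
kubectl patch flinkdeployment sql-example-deployment-s3-testing \
  -n cc-flink-operator --type merge \
  -p '{"spec": {"image": "flink:1.15.2"}}'
{code}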

 

I think this is due to a race condition: the TaskManagers terminate first, which 
leads the JobManager to interpret the job as entering a failed state, and that in 
turn causes it to clean up the job's HA ConfigMap.
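
One way to confirm the ordering would be to watch ConfigMap lifecycle events while the upgrade rolls out:

{code:bash}
# Stream ConfigMap events (ADDED/MODIFIED/DELETED) during the upgrade to see
# whether the job's HA ConfigMap is removed while the cluster shuts down.
kubectl get configmaps -n cc-flink-operator --watch --output-watch-events
{code}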

  was:
I started up a standalone session-mode Flink cluster and ran one job on it. I 
checked the ConfigMaps and could see the HA data.
{code:bash}
kubectl get configmaps -n cc-flink-operator
NAME                                                                             DATA   AGE
flink-config-sql-example-deployment-s3-testing                                   2      5m41s
flink-operator-config                                                            3      42h
kube-root-ca.crt                                                                 1      42h
sql-example-deployment-s3-testing-000000003f57cd5f0000000000000002-config-map    0      11m
sql-example-deployment-s3-testing-cluster-config-map                             5      12m
{code}

I then updated the image field of the FlinkDeployment, and the Flink cluster was 
restarted. The HA ConfigMap for the job is now gone.

 

{code:bash}

kubectl get configmaps -n cc-flink-operator

NAME                                                   DATA   AGE

flink-config-sql-example-deployment-s3-testing         2      18m

flink-operator-config                                  3      43h

kube-root-ca.crt                                       1      43h

sql-example-deployment-s3-testing-cluster-config-map   3      31m

{code}

 

I think this is due to a race condition: the TaskManagers terminate first, which 
leads the JobManager to interpret the job as entering a failed state, and that in 
turn causes it to clean up the job's HA ConfigMap.


> Configmaps get cleaned up when upgrading standalone Flink cluster
> -----------------------------------------------------------------
>
>                 Key: FLINK-30287
>                 URL: https://issues.apache.org/jira/browse/FLINK-30287
>             Project: Flink
>          Issue Type: Bug
>          Components: Kubernetes Operator
>    Affects Versions: 1.2
>            Reporter: Steven Zhang
>            Priority: Major
>
> I started up a standalone session-mode Flink cluster and ran one job on it. I 
> checked the ConfigMaps and could see the HA data.
>  
> {code:bash}
> kubectl get configmaps -n cc-flink-operator
> NAME                                                                             DATA   AGE
> flink-config-sql-example-deployment-s3-testing                                   2      5m41s
> flink-operator-config                                                            3      42h
> kube-root-ca.crt                                                                 1      42h
> sql-example-deployment-s3-testing-000000003f57cd5f0000000000000002-config-map    0      11m
> sql-example-deployment-s3-testing-cluster-config-map                             5      12m
> {code}
>  
> I then updated the image field of the FlinkDeployment, and the Flink cluster was 
> restarted. The HA ConfigMap for the job is now gone.
>  
> {code:bash}
> kubectl get configmaps -n cc-flink-operator
> NAME                                                   DATA   AGE
> flink-config-sql-example-deployment-s3-testing         2      18m
> flink-operator-config                                  3      43h
> kube-root-ca.crt                                       1      43h
> sql-example-deployment-s3-testing-cluster-config-map   3      31m
> {code}
>  
> I think this is due to a race condition: the TaskManagers terminate first, which 
> leads the JobManager to interpret the job as entering a failed state, and that in 
> turn causes it to clean up the job's HA ConfigMap.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
