[ 
https://issues.apache.org/jira/browse/FLINK-30287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Zhang updated FLINK-30287:
---------------------------------
    Description: 
I started a standalone session Flink cluster and ran one job on it. When I 
checked the ConfigMaps, I could see the HA data.

 
{code:java}
kubectl get configmaps -n flink-operator
NAME                                                                            DATA   AGE
flink-config-sql-example-deployment-s3-testing                                  2      5m41s
flink-operator-config                                                           3      42h
kube-root-ca.crt                                                                1      42h
sql-example-deployment-s3-testing-000000003f57cd5f0000000000000002-config-map   0      11m
sql-example-deployment-s3-testing-cluster-config-map                            5      12m
{code}
 

I then updated the image field of the FlinkDeployment, which restarted the 
Flink cluster. The HA ConfigMap for the job is now gone.
{code:java}
kubectl get configmaps -n flink-operator
NAME                                                   DATA   AGE
flink-config-sql-example-deployment-s3-testing         2      18m
flink-operator-config                                  3      43h
kube-root-ca.crt                                       1      43h
sql-example-deployment-s3-testing-cluster-config-map   3      31m
{code}
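
To make the difference between the two listings explicit, here is a minimal Python sketch that compares them by name. The helper `configmap_names` is hypothetical and written only for this illustration; the listings are abbreviated copies of the output above, with the cluster ConfigMap name spelled as in the second listing.

```python
def configmap_names(listing):
    """Extract the NAME column from `kubectl get configmaps` text output."""
    names = set()
    for line in listing.strip().splitlines():
        fields = line.split()
        # Skip blank lines and the NAME/DATA/AGE header row.
        if not fields or fields[0] == "NAME":
            continue
        names.add(fields[0])
    return names


# Listing taken before the image update (columns condensed).
before = """
NAME  DATA  AGE
flink-config-sql-example-deployment-s3-testing  2  5m41s
flink-operator-config  3  42h
kube-root-ca.crt  1  42h
sql-example-deployment-s3-testing-000000003f57cd5f0000000000000002-config-map  0  11m
sql-example-deployment-s3-testing-cluster-config-map  5  12m
"""

# Listing taken after the image update (columns condensed).
after = """
NAME  DATA  AGE
flink-config-sql-example-deployment-s3-testing  2  18m
flink-operator-config  3  43h
kube-root-ca.crt  1  43h
sql-example-deployment-s3-testing-cluster-config-map  3  31m
"""

# ConfigMaps present before the upgrade but missing afterwards.
removed = sorted(configmap_names(before) - configmap_names(after))
print(removed)
```

Under these assumptions, the only name missing after the restart is the per-job HA ConfigMap; the cluster ConfigMap survives, although its DATA count drops from 5 to 3.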
 

I think this is due to a race condition: the TM terminates first, the JM 
interprets this as the job entering a failed state, and the JM then cleans up 
the ConfigMaps.
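
The suspected ordering problem can be illustrated with a toy model. This is only a sketch of the hypothesis above, not Flink or operator code: the function `remaining_configmaps`, the ConfigMap names, and the rule that a lost TM immediately fails the job (that is, no restart attempts remain) are all assumptions made for the illustration.

```python
def remaining_configmaps(shutdown_order):
    """Toy model: return (ConfigMaps left, job status) after a shutdown sequence."""
    configmaps = {"cluster-config-map", "job-config-map"}
    jm_alive = True
    job_status = "RUNNING"
    for component in shutdown_order:
        if component == "TM" and jm_alive and job_status == "RUNNING":
            # Assumption: with no restarts left, losing the TM fails the job.
            job_status = "FAILED"
            # Assumption: a live JM treats FAILED as terminal and removes
            # the per-job HA data.
            configmaps.discard("job-config-map")
        elif component == "JM":
            # Once the JM is gone, nothing is left to perform the cleanup.
            jm_alive = False
    return configmaps, job_status


# TM goes down first: the JM is still alive and cleans up the job's HA data.
print(remaining_configmaps(["TM", "JM"]))

# JM goes down first: the job's HA data is left in place.
print(remaining_configmaps(["JM", "TM"]))
```

In this model the job ConfigMap is lost only when the TM is stopped while the JM is still running, which matches the behaviour described above.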

  was:
I started up a Standalone session Flink cluster and ran one job on it. I 
checked the configMaps and see the HA data.

 

{code:java}
kubectl get configmaps -n cc-flink-operator
NAME                                                                            DATA   AGE
flink-config-sql-example-deployment-s3-testing                                  2      5m41s
flink-operator-config                                                           3      42h
kube-root-ca.crt                                                                1      42h
sql-example-deployment-s3-testing-000000003f57cd5f0000000000000002-config-map   0      11m
sql-example-deployment-s3-testing-cluster-config-map                            5      12m
{code}

 

I then update the FlinkDep image field and the Flink cluster gets restarted. 
The HA configmap for the job is now gone.

 

```
kubectl get configmaps -n cc-flink-operator
– cc-k8s-auth-helper.sh is owned by @confluentinc/kpt
NAME                                                   DATA   AGE
flink-config-sql-example-deployment-s3-testing         2      18m
flink-operator-config                                  3      43h
kube-root-ca.crt                                       1      43h
sql-example-deployment-s3-testing-cluster-config-map   3      31m
```

 

I think this is due to a race condition where the TM first terminates which 
causes the JM to interpret the Job entering a failed state which causes it to 
clean up the configmaps.


> Configmaps get cleaned up when upgrading standalone Flink cluster
> -----------------------------------------------------------------
>
>                 Key: FLINK-30287
>                 URL: https://issues.apache.org/jira/browse/FLINK-30287
>             Project: Flink
>          Issue Type: Bug
>          Components: Kubernetes Operator
>    Affects Versions: 1.2
>            Reporter: Steven Zhang
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
