[
https://issues.apache.org/jira/browse/FLINK-30287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Steven Zhang updated FLINK-30287:
---------------------------------
Description:
I started up a Standalone session Flink cluster and ran one job on it. I
checked the configMaps and see the HA data.
```
kubectl get configmaps -n cc-flink-operator
NAME
DATA AGE
flink-config-sql-example-deployment-s3-testing
2 5m41s
flink-operator-config
3 42h
kube-root-ca.crt
1 42h
sql-example-deployment-s3-testing-000000003f57cd5f0000000000000002-config-map
0 11m
sql-example-deployment-s3-testing-cluster-config-map
5 12m
```
I then updated the FlinkDeployment's image field, which restarted the Flink
cluster. The HA ConfigMap for the job is now gone.
```
kubectl get configmaps -n cc-flink-operator
NAME DATA AGE
flink-config-sql-example-deployment-s3-testing 2 18m
flink-operator-config 3 43h
kube-root-ca.crt 1 43h
sql-example-deployment-s3-testing-cluster-config-map 3 31m
```
I think this is due to a race condition: the TaskManager terminates first,
which leads the JobManager to interpret the job as entering a failed state,
which in turn causes it to clean up the job's HA ConfigMaps.
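The steps above can be sketched as a short kubectl sequence (the namespace and
deployment name are taken from this report; the replacement image tag is a
placeholder, and the `flinkdeployment` resource assumes the operator's CRD is
installed):

```shell
# Reproduction sketch; namespace/deployment from this report, image tag is a placeholder.
NS=cc-flink-operator
DEP=sql-example-deployment-s3-testing

# 1. While the job is running, note the per-job HA ConfigMap
#    (named <deployment>-<job-id>-config-map).
kubectl get configmaps -n "$NS" | grep "$DEP"

# 2. Trigger an upgrade by patching the FlinkDeployment's image field.
kubectl patch flinkdeployment "$DEP" -n "$NS" --type merge \
  -p '{"spec":{"image":"flink:1.16"}}'

# 3. After the standalone cluster restarts, only the cluster-level
#    ConfigMap remains; the per-job HA ConfigMap has been deleted.
kubectl get configmaps -n "$NS" | grep "$DEP"
```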
was:
I started up a Standalone session Flink cluster and ran one job on it. I
checked the configMaps and see the HA data.
```
kubectl get configmaps -n cc-flink-operator
NAME
DATA AGE
flink-config-sql-example-deployment-s3-testing
2 5m41s
flink-operator-config
3 42h
kube-root-ca.crt
1 42h
sql-example-deployment-s3-testing-000000003f57cd5f0000000000000002-config-map
0 11m
sql-example-deployment-s3-testing-cluster-config-map
5 12m
```
I then updated the FlinkDeployment's image field, which restarted the Flink
cluster. The HA ConfigMap for the job is now gone.
```
kubectl get configmaps -n cc-flink-operator
NAME DATA AGE
flink-config-sql-example-deployment-s3-testing 2 18m
flink-operator-config 3 43h
kube-root-ca.crt 1 43h
sql-example-deployment-s3-testing-cluster-config-map 3 31m
```
I think this is due to a race condition: the TaskManager terminates first,
which leads the JobManager to interpret the job as entering a failed state,
which in turn causes it to clean up the job's HA ConfigMaps.
> Configmaps get cleaned up when upgrading standalone Flink cluster
> -----------------------------------------------------------------
>
> Key: FLINK-30287
> URL: https://issues.apache.org/jira/browse/FLINK-30287
> Project: Flink
> Issue Type: Bug
> Components: Kubernetes Operator
> Affects Versions: 1.2
> Reporter: Steven Zhang
> Priority: Major
>
> I started up a Standalone session Flink cluster and ran one job on it. I
> checked the configMaps and see the HA data.
> ```
> kubectl get configmaps -n cc-flink-operator
> NAME
> DATA AGE
> flink-config-sql-example-deployment-s3-testing
> 2 5m41s
> flink-operator-config
> 3 42h
> kube-root-ca.crt
> 1 42h
> sql-example-deployment-s3-testing-000000003f57cd5f0000000000000002-config-map
> 0 11m
> sql-example-deployment-s3-testing-cluster-config-map
> 5 12m
> ```
> I then updated the FlinkDeployment's image field, which restarted the Flink
> cluster. The HA ConfigMap for the job is now gone.
>
> ```
> kubectl get configmaps -n cc-flink-operator
> NAME DATA AGE
> flink-config-sql-example-deployment-s3-testing 2 18m
> flink-operator-config 3 43h
> kube-root-ca.crt 1 43h
> sql-example-deployment-s3-testing-cluster-config-map 3 31m
> ```
>
> I think this is due to a race condition: the TaskManager terminates first,
> which leads the JobManager to interpret the job as entering a failed state,
> which in turn causes it to clean up the job's HA ConfigMaps.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)