[ https://issues.apache.org/jira/browse/FLINK-30287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Steven Zhang updated FLINK-30287:
---------------------------------
Description:
I started up a standalone session Flink cluster and ran one job on it. I checked the ConfigMaps and saw the HA data.

{code:java}
kubectl get configmaps -n cc-flink-operator
NAME                                                                             DATA   AGE
flink-config-sql-example-deployment-s3-testing                                   2      5m41s
flink-operator-config                                                            3      42h
kube-root-ca.crt                                                                 1      42h
sql-example-deployment-s3-testing-000000003f57cd5f0000000000000002-config-map    0      11m
sql-example-deployment-s3-testing-cluster-config-map                             5      12m
{code}

I then updated the FlinkDeployment image field and the Flink cluster got restarted. The HA ConfigMap for the job is now gone.

{code:java}
kubectl get configmaps -n cc-flink-operator
NAME                                                     DATA   AGE
flink-config-sql-example-deployment-s3-testing           2      18m
flink-operator-config                                    3      43h
kube-root-ca.crt                                         1      43h
sql-example-deployment-s3-testing-cluster-config-map    3      31m
{code}

I think this is due to a race condition: the TM terminates first, which causes the JM to interpret the job as having entered a failed state, which in turn causes it to clean up the ConfigMaps.
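For context, a rough sketch of the kind of image bump and how the disappearance can be observed. The report does not say how the field was edited, so the kubectl patch below is only illustrative (kubectl edit or re-applying the manifest would have the same effect), the image tag is a placeholder, and the deployment name sql-example-deployment-s3-testing is assumed from the ConfigMap names above:

{code}
# Illustrative only: bump the image field of the FlinkDeployment backing the
# standalone session cluster. The deployment name is inferred from the
# ConfigMap names above; the image tag is a placeholder, not the one used.
kubectl patch flinkdeployment sql-example-deployment-s3-testing \
  -n cc-flink-operator \
  --type=merge \
  -p '{"spec":{"image":"flink:placeholder-tag"}}'

# Watch the ConfigMaps while the operator restarts the cluster; the job-scoped
# HA entry (the one carrying job ID 000000003f57cd5f0000000000000002 in its
# name) is the one that disappears after the upgrade.
kubectl get configmaps -n cc-flink-operator --watch
{code}

Watching the pods the same way (kubectl get pods -n cc-flink-operator --watch) during the upgrade should show the TM/JM shutdown ordering that the race-condition theory above depends on.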
> Configmaps get cleaned up when upgrading standalone Flink cluster
> -----------------------------------------------------------------
>
>                 Key: FLINK-30287
>                 URL: https://issues.apache.org/jira/browse/FLINK-30287
>             Project: Flink
>          Issue Type: Bug
>          Components: Kubernetes Operator
>    Affects Versions: 1.2
>            Reporter: Steven Zhang
>            Priority: Major
>
> I started up a standalone session Flink cluster and ran one job on it. I
> checked the ConfigMaps and saw the HA data.
>
> {code:java}
> kubectl get configmaps -n cc-flink-operator
> NAME                                                                             DATA   AGE
> flink-config-sql-example-deployment-s3-testing                                   2      5m41s
> flink-operator-config                                                            3      42h
> kube-root-ca.crt                                                                 1      42h
> sql-example-deployment-s3-testing-000000003f57cd5f0000000000000002-config-map    0      11m
> sql-example-deployment-s3-testing-cluster-config-map                             5      12m
> {code}
>
> I then updated the FlinkDeployment image field and the Flink cluster got
> restarted. The HA ConfigMap for the job is now gone.
>
> {code:java}
> kubectl get configmaps -n cc-flink-operator
> NAME                                                     DATA   AGE
> flink-config-sql-example-deployment-s3-testing           2      18m
> flink-operator-config                                    3      43h
> kube-root-ca.crt                                         1      43h
> sql-example-deployment-s3-testing-cluster-config-map    3      31m
> {code}
>
> I think this is due to a race condition: the TM terminates first, which
> causes the JM to interpret the job as having entered a failed state, which in
> turn causes it to clean up the ConfigMaps.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)