[ https://issues.apache.org/jira/browse/FLINK-22262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17628320#comment-17628320 ]
Yun Tang commented on FLINK-22262:
----------------------------------

[~wangyang0918] I came across a rare case: while the JobManager pod is deleting the deployment on job cancellation, it suddenly restarts for some reason. The job is then submitted again from the previous savepoint and creates new HA-related ConfigMaps that point to the restored savepoint, just as if the job had started again. After a while, since the deployment has been deleted, the JobManager is finally deleted as well and no TaskManagers can be created. However, the HA-related ConfigMaps are left behind because they have no OwnerReference. When the user then submits the job again, the leftover HA-related ConfigMaps cause the job to resume from the previous savepoint, which leads to incorrect job state.

I think offering an option to let the HA-related ConfigMaps have an OwnerReference to the deployment is reasonable in some cases. Or do you have any suggestions to work around this problem?

> Flink on Kubernetes ConfigMaps are created without OwnerReference
> -----------------------------------------------------------------
>
> Key: FLINK-22262
> URL: https://issues.apache.org/jira/browse/FLINK-22262
> Project: Flink
> Issue Type: Bug
> Components: Deployment / Kubernetes
> Affects Versions: 1.13.0
> Reporter: Andrea Peruffo
> Priority: Not a Priority
> Labels: auto-deprioritized-major, auto-deprioritized-minor
> Attachments: jm.log
>
>
> According to the documentation:
> [https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/resource-providers/native_kubernetes/#manual-resource-cleanup]
> The ConfigMaps created along with the Flink deployment are supposed to have an
> OwnerReference pointing to the Deployment itself. Unfortunately, this doesn't
> happen and causes all sorts of issues when the classpath and the jars of the
> job are updated.
> i.e.:
> Without manually removing the ConfigMap of the Job I cannot update the Jars
> of the Job.
> Can you please give guidance if there are additional caveats on manually
> removing the ConfigMap? Any other workaround that can be used?
> Thanks in advance.
> Example ConfigMap:
> {{apiVersion: v1}}
> {{data:}}
> {{  address: akka.tcp://flink@10.0.2.13:6123/user/rpc/jobmanager_2}}
> {{  checkpointID-0000000000000000049:
> rO0ABXNyADtvcmcuYXBhY2hlLmZsaW5rLnJ1bnRpbWUuc3RhdGUuUmV0cmlldmFibGVTdHJlYW1TdGF0ZUhhbmRsZQABHhjxVZcrAgABTAAYd3JhcHBlZFN0cmVhbVN0YXRlSGFuZGxldAAyTG9yZy9hcGFjaGUvZmxpbmsvcnVudGltZS9zdGF0ZS9TdHJlYW1TdGF0ZUhhbmRsZTt4cHNyADlvcmcuYXBhY2hlLmZsaW5rLnJ1bnRpbWUuc3RhdGUuZmlsZXN5c3RlbS5GaWxlU3RhdGVIYW5kbGUE3HXYYr0bswIAAkoACXN0YXRlU2l6ZUwACGZpbGVQYXRodAAfTG9yZy9hcGFjaGUvZmxpbmsvY29yZS9mcy9QYXRoO3hwAAAAAAABOEtzcgAdb3JnLmFwYWNoZS5mbGluay5jb3JlLmZzLlBhdGgAAAAAAAAAAQIAAUwAA3VyaXQADkxqYXZhL25ldC9VUkk7eHBzcgAMamF2YS5uZXQuVVJJrAF4LkOeSasDAAFMAAZzdHJpbmd0ABJMamF2YS9sYW5nL1N0cmluZzt4cHQAUC9tbnQvZmxpbmsvc3RvcmFnZS9rc2hhL3RheGktcmlkZS1mYXJlLXByb2Nlc3Nvci9jb21wbGV0ZWRDaGVja3BvaW50MDQ0YTc2OWRkNDgxeA==}}
> {{  counter: "50"}}
> {{  sessionId: 0c2b69ee-6b41-48d3-b7fd-1bf2eda94f0f}}
> {{kind: ConfigMap}}
> {{metadata:}}
> {{  annotations:}}
> {{    control-plane.alpha.kubernetes.io/leader:
> '{"holderIdentity":"0f25a2cc-e212-46b0-8ba9-faac0732a316","leaseDuration":15.000000000,"acquireTime":"2021-04-13T14:30:51.439000Z","renewTime":"2021-04-13T14:39:32.011000Z","leaderTransitions":105}'}}
> {{  creationTimestamp: "2021-04-13T14:30:51Z"}}
> {{  labels:}}
> {{    app: taxi-ride-fare-processor}}
> {{    configmap-type: high-availability}}
> {{    type: flink-native-kubernetes}}
> {{  name:
> taxi-ride-fare-processor-00000000000000000000000000000000-jobmanager-leader}}
> {{  namespace: taxi-ride-fare}}
> {{  resourceVersion: "64100"}}
> {{  selfLink:
> /api/v1/namespaces/taxi-ride-fare/configmaps/taxi-ride-fare-processor-00000000000000000000000000000000-jobmanager-leader}}
> {{  uid: 9f912495-382a-45de-a789-fd5ad2a2459d}}
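
For anyone trying the workaround discussed in this thread by hand, here is a rough sketch of the two pieces involved. It is not what Flink does internally. The first helper builds a label selector from the `app` and `configmap-type` labels visible in the example ConfigMap, for finding leftover HA ConfigMaps. The second builds a merge-patch body that would attach an OwnerReference pointing at the Deployment. The function names and the UID placeholder are hypothetical. The patch only makes sense while the Deployment still exists and lives in the same namespace as the ConfigMap, and the `uid` must match the live Deployment's `metadata.uid`.

```python
import json
from typing import Any, Dict


def ha_configmap_selector(cluster_id: str) -> str:
    """Label selector for the HA ConfigMaps of one Flink cluster.

    Uses the `app` and `configmap-type` labels shown in the example
    ConfigMap from this issue.
    """
    return f"app={cluster_id},configmap-type=high-availability"


def owner_reference_patch(deployment_name: str, deployment_uid: str) -> Dict[str, Any]:
    """Merge-patch body that makes a Deployment the owner of a ConfigMap.

    Hypothetical helper: only the four required OwnerReference fields
    are set. The uid must be the live Deployment's metadata.uid.
    """
    return {
        "metadata": {
            "ownerReferences": [
                {
                    "apiVersion": "apps/v1",
                    "kind": "Deployment",
                    "name": deployment_name,
                    "uid": deployment_uid,
                }
            ]
        }
    }


if __name__ == "__main__":
    cluster_id = "taxi-ride-fare-processor"
    # Placeholder UID for illustration only; read the real one from the Deployment.
    patch = owner_reference_patch(cluster_id, "<deployment-uid>")
    print(ha_configmap_selector(cluster_id))
    print(json.dumps(patch))
```

The selector can be passed to `kubectl get configmap --selector=...` (or `kubectl delete`), and the JSON body to `kubectl patch configmap <name> --type merge -p '<json>'`. Deleting HA ConfigMaps discards the stored checkpoint pointers, so it is only appropriate when the job is really meant to start fresh.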