[https://issues.apache.org/jira/browse/FLINK-22262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17628320#comment-17628320]
Yun Tang commented on FLINK-22262:
----------------------------------
[~wangyang0918] I came across a rare case:
When the jobmanager pod deletes the deployment on job cancellation, it suddenly
restarts for some reason. The job is then submitted again from the previous
savepoint and creates new HA-related ConfigMaps with the restored savepoint,
just as if the job had started fresh. After a while, since the deployment has
been deleted, the jobmanager is finally removed and no taskmanagers can be
created. However, the HA-related ConfigMaps are left behind because they have
no OwnerReference.
When the user then submits the job again, the leftover HA-related ConfigMaps
cause it to resume from the previous savepoint, which leads to incorrect job
state.
I think offering an option to give the HA-related ConfigMaps an OwnerReference
pointing to the deployment is reasonable in some cases.
Or do you have any suggestions to work around this problem?
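For reference, the proposal amounts to each HA ConfigMap carrying an
ownerReferences entry pointing at the JobManager Deployment, so that Kubernetes
garbage-collects the ConfigMap when the deployment is deleted. A minimal sketch
of what that metadata could look like (the name, namespace, and uid here are
illustrative placeholders, not values from a real cluster):

```yaml
# Sketch only: an HA ConfigMap with an OwnerReference to the JobManager
# Deployment. Kubernetes garbage collection would then remove the ConfigMap
# when the deployment is deleted. All names and the uid are placeholders.
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-flink-cluster-00000000000000000000000000000000-jobmanager-leader
  namespace: default
  ownerReferences:
    - apiVersion: apps/v1
      kind: Deployment
      name: my-flink-cluster                            # the JobManager deployment
      uid: 00000000-0000-0000-0000-000000000000          # must match the live deployment's uid
      controller: false
      blockOwnerDeletion: false
```

Note that an OwnerReference only takes effect if the uid matches the actual
Deployment object, which is why it would have to be set by Flink at creation
time rather than patched in by hand from a static manifest.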
> Flink on Kubernetes ConfigMaps are created without OwnerReference
> -----------------------------------------------------------------
>
> Key: FLINK-22262
> URL: https://issues.apache.org/jira/browse/FLINK-22262
> Project: Flink
> Issue Type: Bug
> Components: Deployment / Kubernetes
> Affects Versions: 1.13.0
> Reporter: Andrea Peruffo
> Priority: Not a Priority
> Labels: auto-deprioritized-major, auto-deprioritized-minor
> Attachments: jm.log
>
>
> According to the documentation:
> [https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/resource-providers/native_kubernetes/#manual-resource-cleanup]
> The ConfigMaps created along with the Flink deployment are supposed to have
> an OwnerReference pointing to the Deployment itself. Unfortunately, this
> doesn't happen, which causes all sorts of issues when the classpath and the
> jars of the job are updated.
> For example: without manually removing the ConfigMap of the Job, I cannot
> update the jars of the Job.
> Can you please give guidance on whether there are additional caveats to
> manually removing the ConfigMap? Is there any other workaround that can be
> used?
> Thanks in advance.
> Example ConfigMap:
> {{apiVersion: v1}}
> {{data:}}
> {{ address: akka.tcp://[email protected]:6123/user/rpc/jobmanager_2}}
> {{ checkpointID-0000000000000000049: rO0ABXNyADtvcmcuYXBhY2hlLmZsaW5rLnJ1bnRpbWUuc3RhdGUuUmV0cmlldmFibGVTdHJlYW1TdGF0ZUhhbmRsZQABHhjxVZcrAgABTAAYd3JhcHBlZFN0cmVhbVN0YXRlSGFuZGxldAAyTG9yZy9hcGFjaGUvZmxpbmsvcnVudGltZS9zdGF0ZS9TdHJlYW1TdGF0ZUhhbmRsZTt4cHNyADlvcmcuYXBhY2hlLmZsaW5rLnJ1bnRpbWUuc3RhdGUuZmlsZXN5c3RlbS5GaWxlU3RhdGVIYW5kbGUE3HXYYr0bswIAAkoACXN0YXRlU2l6ZUwACGZpbGVQYXRodAAfTG9yZy9hcGFjaGUvZmxpbmsvY29yZS9mcy9QYXRoO3hwAAAAAAABOEtzcgAdb3JnLmFwYWNoZS5mbGluay5jb3JlLmZzLlBhdGgAAAAAAAAAAQIAAUwAA3VyaXQADkxqYXZhL25ldC9VUkk7eHBzcgAMamF2YS5uZXQuVVJJrAF4LkOeSasDAAFMAAZzdHJpbmd0ABJMamF2YS9sYW5nL1N0cmluZzt4cHQAUC9tbnQvZmxpbmsvc3RvcmFnZS9rc2hhL3RheGktcmlkZS1mYXJlLXByb2Nlc3Nvci9jb21wbGV0ZWRDaGVja3BvaW50MDQ0YTc2OWRkNDgxeA==}}
> {{ counter: "50"}}
> {{ sessionId: 0c2b69ee-6b41-48d3-b7fd-1bf2eda94f0f}}
> {{kind: ConfigMap}}
> {{metadata:}}
> {{ annotations:}}
> {{ control-plane.alpha.kubernetes.io/leader: '{"holderIdentity":"0f25a2cc-e212-46b0-8ba9-faac0732a316","leaseDuration":15.000000000,"acquireTime":"2021-04-13T14:30:51.439000Z","renewTime":"2021-04-13T14:39:32.011000Z","leaderTransitions":105}'}}
> {{ creationTimestamp: "2021-04-13T14:30:51Z"}}
> {{ labels:}}
> {{ app: taxi-ride-fare-processor}}
> {{ configmap-type: high-availability}}
> {{ type: flink-native-kubernetes}}
> {{ name: taxi-ride-fare-processor-00000000000000000000000000000000-jobmanager-leader}}
> {{ namespace: taxi-ride-fare}}
> {{ resourceVersion: "64100"}}
> {{ selfLink: /api/v1/namespaces/taxi-ride-fare/configmaps/taxi-ride-fare-processor-00000000000000000000000000000000-jobmanager-leader}}
> {{ uid: 9f912495-382a-45de-a789-fd5ad2a2459d}}
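The manual cleanup route from the documentation linked in the quoted
description can be sketched with a label selector matching the HA ConfigMaps
shown above. The namespace and cluster-id here are the illustrative values
from the example ConfigMap; substitute your own before running anything:

```shell
# Sketch of manual HA ConfigMap cleanup, assuming the labels shown in the
# example ConfigMap above (app=<cluster-id>, configmap-type=high-availability).
# Deleting these ConfigMaps discards the stored checkpoint/leader state, so a
# resubmitted job will NOT resume from the previous checkpoints.
kubectl delete configmap \
  --namespace taxi-ride-fare \
  --selector='app=taxi-ride-fare-processor,configmap-type=high-availability'
```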
--
This message was sent by Atlassian Jira
(v8.20.10#820010)