[
https://issues.apache.org/jira/browse/FLINK-20219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17237095#comment-17237095
]
Yang Wang commented on FLINK-20219:
-----------------------------------
When a Flink job reaches a terminal state, its HA data is cleaned up, so no
residual ConfigMap keys or ZooKeeper nodes are left behind for it. For a job that
is still running, I believe it is reasonable to keep its metadata in the
ConfigMap or ZooKeeper after the cluster is stopped.
The problem is that empty ConfigMaps (e.g. k8s-ha-app-1-dispatcher-leader,
k8s-ha-app-1-00000000000000000000000000000000-jobmanager-leader) or empty
ZooKeeper nodes (e.g. /flink/application_1602580065114_0696/jobgraphs,
/flink/application_1602580065114_0696/leader/resource_manager_lock,
/flink/application_1602580065114_0696/leader/e8d0f2a54d463db677180868b0e58aa0/job_manager_lock)
are retained. This can happen when all jobs have finished or been canceled and
we then stop the Flink session cluster.
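
To make the "empty ConfigMap" case concrete, here is a minimal sketch that picks
out ConfigMaps with no data from JSON shaped like the output of
kubectl get cm --selector='app=<ClusterID>,configmap-type=high-availability' -o json
(the selector is the one from the issue description). The function name, the
sample document, and the resourcemanager ConfigMap name and its data are
hypothetical and only for illustration; this is not part of Flink.

```python
import json
from typing import List


def empty_configmaps(kubectl_json: str) -> List[str]:
    """Return the names of ConfigMaps whose `data` field is missing or empty."""
    doc = json.loads(kubectl_json)
    # `kubectl get ... -o json` wraps multiple results in a List object
    # whose entries live under "items".
    items = doc.get("items", [])
    return [cm["metadata"]["name"] for cm in items if not cm.get("data")]


# Hypothetical sample, hand-written to resemble `kubectl get cm -o json` output.
SAMPLE = json.dumps({
    "kind": "List",
    "items": [
        # No "data" field at all: counts as empty.
        {"metadata": {"name": "k8s-ha-app-1-dispatcher-leader"}},
        # "data" present but empty: counts as empty.
        {"metadata": {"name": "k8s-ha-app-1-00000000000000000000000000000000-jobmanager-leader"},
         "data": {}},
        # Made-up name and content, standing in for a ConfigMap that still holds data.
        {"metadata": {"name": "k8s-ha-app-1-resourcemanager-leader"},
         "data": {"example-key": "example-value"}},
    ],
})

if __name__ == "__main__":
    for name in empty_configmaps(SAMPLE):
        print(name)
```

The sketch only reports candidates. Whether an empty ConfigMap is safe to delete
still depends on the cluster really being stopped.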
> Rethink the Kubernetes HA related ConfigMap clean up for session cluster
> ------------------------------------------------------------------------
>
> Key: FLINK-20219
> URL: https://issues.apache.org/jira/browse/FLINK-20219
> Project: Flink
> Issue Type: Improvement
> Components: Deployment / Kubernetes, Runtime / Coordination
> Affects Versions: 1.12.0
> Reporter: Yang Wang
> Priority: Major
>
> When testing the Kubernetes HA service, I realized that ConfigMap clean up
> for session clusters (both standalone and native) is not very easy.
> * For the native K8s session, we suggest that users stop it via {{echo
> 'stop' | ./bin/kubernetes-session.sh -Dkubernetes.cluster-id=<ClusterID>
> -Dexecution.attached=true}}. Currently, this has the same effect as {{kubectl
> delete deploy <ClusterID>}}. It does not clean up the leader ConfigMaps
> (e.g. ResourceManager, Dispatcher, RestServer, JobManager). Even if no jobs
> are running before the stop, some ConfigMaps are still retained. So when and
> how should the retained ConfigMaps be cleaned up? Should the user do it
> manually, or could we provide some utilities in the Flink client?
> * For the standalone session, I think it is reasonable for users to do
> the HA ConfigMap clean up manually.
>
> We could use the following command to do the manual clean up.
> {{kubectl delete cm
> --selector='app=<ClusterID>,configmap-type=high-availability'}}
>
> Note: This is not a problem for a Flink application cluster, since the clean
> up can be done automatically once all the running jobs in the application
> have reached a terminal state (e.g. FAILED, CANCELED, FINISHED), after which
> the Flink cluster is destroyed.