[
https://issues.apache.org/jira/browse/FLINK-20219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17238107#comment-17238107
]
Till Rohrmann commented on FLINK-20219:
---------------------------------------
OK, I see. On Yarn this is not a problem because we use a new
{{ApplicationId}} for each new cluster. However, the problem should then also
occur in standalone mode, right? Hence, maybe it is good enough for the
client to send a {{shutDownCluster}} call to the cluster. On the cluster, this
call suspends all running jobs. If there are no jobs left, it will then clean
up all the remaining zNodes/ConfigMaps, for example.
> Rethink the Kubernetes HA related ConfigMap clean up for session cluster
> ------------------------------------------------------------------------
>
> Key: FLINK-20219
> URL: https://issues.apache.org/jira/browse/FLINK-20219
> Project: Flink
> Issue Type: Improvement
> Components: Deployment / Kubernetes, Runtime / Coordination
> Affects Versions: 1.12.0
> Reporter: Yang Wang
> Priority: Major
>
> While testing the Kubernetes HA service, I realized that ConfigMap clean up
> for session clusters (both standalone and native) is not very easy.
> * For the native K8s session, we suggest that users stop it via {{echo
> 'stop' | ./bin/kubernetes-session.sh -Dkubernetes.cluster-id=<ClusterID>
> -Dexecution.attached=true}}. Currently, this has the same effect as {{kubectl
> delete deploy <ClusterID>}}. It does not clean up the leader
> ConfigMaps (e.g. ResourceManager, Dispatcher, RestServer, JobManager). Even
> if there are no running jobs before the stop, some ConfigMaps are still
> retained. So when and how should the retained ConfigMaps be cleaned up?
> Should the user do it manually? Or could we provide some utilities in the
> Flink client?
> * For the standalone session, I think it is reasonable for the users to do
> the HA ConfigMap clean up manually.
>
> We could use the following command to do the manual clean up:
> {{kubectl delete cm
> --selector='app=<ClusterID>,configmap-type=high-availability'}}
>
> Note: This is not a problem for a Flink application cluster, since we can do
> the clean up automatically when all the running jobs in the application
> have reached a terminal state (e.g. FAILED, CANCELED, FINISHED) and then
> destroy the Flink cluster.
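As a rough illustration of the manual clean up quoted above, the {{kubectl delete cm}} command could be wrapped in a small helper that first lists the matching ConfigMaps and then deletes them. This is only a sketch: the function names {{ha_cm_selector}} and {{cleanup_ha_configmaps}} and the {{KUBECTL}} override are hypothetical and not part of Flink; only the label selector comes from the issue description.

```shell
#!/usr/bin/env bash
# Sketch only: the helper names and the KUBECTL override are illustrative
# assumptions, not part of Flink or kubectl.

# Build the label selector from the issue description for a given cluster id.
ha_cm_selector() {
  local cluster_id="$1"
  printf 'app=%s,configmap-type=high-availability' "$cluster_id"
}

# List the retained HA ConfigMaps of a session cluster, then delete them.
# Set KUBECTL=echo to preview the commands without touching the cluster.
cleanup_ha_configmaps() {
  local cluster_id="${1:?usage: cleanup_ha_configmaps <ClusterID>}"
  local selector
  selector="$(ha_cm_selector "$cluster_id")"
  # Show what would be removed; stop if the lookup itself fails.
  "${KUBECTL:-kubectl}" get cm --selector="$selector" || return 1
  "${KUBECTL:-kubectl}" delete cm --selector="$selector"
}
```

For example, {{KUBECTL=echo cleanup_ha_configmaps <ClusterID>}} only prints the two kubectl invocations, which may be a reasonable check before running the real deletion after the session cluster has been stopped.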
--
This message was sent by Atlassian Jira
(v8.3.4#803005)