[ https://issues.apache.org/jira/browse/FLINK-20219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17303198#comment-17303198 ]

Yang Wang commented on FLINK-20219:
-----------------------------------

??We do not have this problem for Yarn deployments, because we do not support 
recovery across different applications. When a Yarn session cluster is stopped, 
even if there are still some running Flink jobs, we always clean up all the HA 
data.??

 

I have to retract my earlier description of the Yarn session cluster. Even though 
a Flink session cluster gets a new applicationId for each new cluster, we could 
still use a fixed high-availability cluster-id via the following command.
{code:java}
./bin/yarn-session.sh -d -Dhigh-availability.cluster-id=yarn-session-1
{code}
This means we should not always clean up the HA-related data when stopping the 
session cluster; the current behavior is not correct.

From then on, the Yarn session cluster is in the same situation as the 
Kubernetes and standalone session clusters. We could unify them by sending a 
{{shutDownCluster}} call from the Flink client and letting the cluster decide 
whether to clean up the remaining HA data. Only when no jobs are running should 
the HA data be cleaned up; a minimal sketch of that decision is shown below.
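For illustration, here is a minimal sketch of that decision logic, assuming a 
hypothetical {{onShutDownCluster}} hook and placeholder interfaces for the HA 
backend and the Dispatcher's job registry; these names are not existing Flink APIs.
{code:java}
/**
 * Hypothetical sketch only -- not actual Flink code. It illustrates the proposal:
 * when the client sends shutDownCluster, the cluster keeps the HA data
 * (ZooKeeper ZNodes or Kubernetes ConfigMaps) if any job is still running,
 * and deletes it otherwise.
 */
public final class SessionClusterShutdownSketch {

    /** Placeholder for the HA backend in use (ZNodes or leader/job ConfigMaps). */
    interface HighAvailabilityCleanup {
        void deleteAllHaData();
    }

    /** Placeholder view over the jobs known to the Dispatcher. */
    interface JobRegistry {
        /** True if at least one job has not reached FAILED, CANCELED, or FINISHED. */
        boolean hasNonTerminalJobs();
    }

    static void onShutDownCluster(JobRegistry jobs, HighAvailabilityCleanup ha) {
        if (jobs.hasNonTerminalJobs()) {
            // Keep the HA data so a later cluster started with the same
            // high-availability.cluster-id can recover the running jobs.
            return;
        }
        // Nothing left to recover: retained ZNodes/ConfigMaps would only leak.
        ha.deleteAllHaData();
    }
}
{code}
With such a hook, the Yarn, Kubernetes, and standalone session clusters would all 
share the same clean-up behavior on {{shutDownCluster}}.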

> Rethink the HA related ZNodes/ConfigMap clean up for session cluster
> --------------------------------------------------------------------
>
>                 Key: FLINK-20219
>                 URL: https://issues.apache.org/jira/browse/FLINK-20219
>             Project: Flink
>          Issue Type: Improvement
>          Components: Deployment / Kubernetes, Deployment / Scripts, Runtime / 
> Coordination
>    Affects Versions: 1.12.0
>            Reporter: Yang Wang
>            Priority: Major
>
> While testing the Kubernetes HA service, I realized that the ConfigMap clean-up 
> for session clusters (both standalone and native) is not very easy.
>  * For the native K8s session, we suggest that users stop it via {{echo 
> 'stop' | ./bin/kubernetes-session.sh -Dkubernetes.cluster-id=<ClusterID> 
> -Dexecution.attached=true}}. Currently, this has the same effect as {{kubectl 
> delete deploy <ClusterID>}}: it does not clean up the leader 
> ConfigMaps (e.g. ResourceManager, Dispatcher, RestServer, JobManager). Even 
> if no jobs were running before the stop, some ConfigMaps are retained. So when 
> and how should the retained ConfigMaps be cleaned up? Should the user do it 
> manually, or should we provide some utility in the Flink client?
>  * For the standalone session, I think it is reasonable for users to clean up 
> the HA ConfigMaps manually.
>  
> The following command could be used for the manual clean-up.
> {{kubectl delete cm 
> --selector='app=<ClusterID>,configmap-type=high-availability'}}
>  
> Note: This is not a problem for the Flink application cluster, since the 
> clean-up can be done automatically once all the running jobs in the application 
> have reached a terminal state (e.g. FAILED, CANCELED, FINISHED), after which 
> the Flink cluster is destroyed.


