[
https://issues.apache.org/jira/browse/FLINK-24624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434081#comment-17434081
]
Aitozi commented on FLINK-24624:
--------------------------------
[~wangyang0918] Besides that, I also created this issue to discuss whether we
have to guarantee that the k8s resources are cleaned up when deploying a
session or application mode cluster fails.
As far as I know (I am running some Kubernetes deployment tests), residual k8s
resources are left behind in situations like:
1. deployClusterInternal succeeds, but getting the client from the
{{ClusterClientProvider}} fails, which is the case shown in this issue.
2. deploySessionCluster succeeds, but the Deployment fails to spawn a ready
pod because of a resource shortage, a scheduling problem, or a Kubernetes
webhook intercepting the pod creation.
We can simply wrap the deploySessionCluster method block in a try-catch to
solve case 1, which has been done in my PR; a rough sketch follows.
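This is a minimal sketch of the idea, not the exact PR code: the wrapper class
and its wiring are my assumptions, while
KubernetesClusterDescriptor#deploySessionCluster and
FlinkKubeClient#stopAndCleanupCluster are existing Flink APIs.
{code:java}
import org.apache.flink.client.deployment.ClusterDeploymentException;
import org.apache.flink.client.deployment.ClusterSpecification;
import org.apache.flink.client.program.ClusterClientProvider;
import org.apache.flink.kubernetes.KubernetesClusterDescriptor;
import org.apache.flink.kubernetes.kubeclient.FlinkKubeClient;

public final class SafeSessionDeployer {

    // Hypothetical wrapper: deploy the session cluster and clean up all
    // created k8s resources if any step of the start-up fails.
    public static ClusterClientProvider<String> deploySessionCluster(
            KubernetesClusterDescriptor descriptor,
            ClusterSpecification spec,
            FlinkKubeClient kubeClient,
            String clusterId) throws ClusterDeploymentException {
        try {
            ClusterClientProvider<String> provider =
                    descriptor.deploySessionCluster(spec);
            // Fetch the client eagerly so a failure here (case 1) is caught too.
            provider.getClusterClient().close();
            return provider;
        } catch (Exception e) {
            // Best-effort removal of the Deployment, ConfigMaps and Services
            // created so far, so nothing is left behind for the next attempt.
            kubeClient.stopAndCleanupCluster(clusterId);
            throw new ClusterDeploymentException(
                    "Failed to deploy session cluster " + clusterId, e);
        }
    }
}
{code}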
But I still have some concerns about case 2. I think there should be a
deadline for spawning the cluster, and the related resources should be
destroyed after the timeout, along the lines of the sketch below.
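Something like this could work (a sketch only: the readiness probe is a
hypothetical BooleanSupplier that could, for example, check the JobManager
Deployment's available replicas, and the deadline would come from
configuration; only FlinkKubeClient#stopAndCleanupCluster is an existing
Flink API).
{code:java}
import java.time.Duration;
import java.util.function.BooleanSupplier;

import org.apache.flink.client.deployment.ClusterDeploymentException;
import org.apache.flink.kubernetes.kubeclient.FlinkKubeClient;

public final class StartupDeadlineGuard {

    // Poll for a ready JobManager pod until the deadline; if it never
    // appears, destroy the cluster's k8s resources and fail the deployment.
    public static void awaitReadyOrCleanup(
            FlinkKubeClient kubeClient,
            String clusterId,
            BooleanSupplier jobManagerPodReady, // hypothetical readiness check
            Duration deadline)
            throws ClusterDeploymentException, InterruptedException {

        final long deadlineNanos = System.nanoTime() + deadline.toNanos();
        while (System.nanoTime() < deadlineNanos) {
            if (jobManagerPodReady.getAsBoolean()) {
                return; // a ready pod showed up within the deadline
            }
            Thread.sleep(1_000L); // simple fixed-interval polling
        }
        // Deadline exceeded: remove everything the failed start-up created.
        kubeClient.stopAndCleanupCluster(clusterId);
        throw new ClusterDeploymentException(
                "Session cluster " + clusterId
                        + " had no ready pod within " + deadline);
    }
}
{code}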
> Add clean up phase when kubernetes session start failed
> -------------------------------------------------------
>
> Key: FLINK-24624
> URL: https://issues.apache.org/jira/browse/FLINK-24624
> Project: Flink
> Issue Type: Improvement
> Components: Deployment / Kubernetes
> Affects Versions: 1.14.0
> Reporter: Aitozi
> Priority: Major
> Labels: pull-request-available
>
> Several k8s resources are created when deploying a Kubernetes session. But
> the resources are left there when the deployment fails. This will lead to
> the next deployment failing or to a resource leak. So I think we should add
> a clean-up phase for when the start-up fails.