[ https://issues.apache.org/jira/browse/FLINK-24624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17433448#comment-17433448 ]
Aitozi commented on FLINK-24624:
--------------------------------
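After looking into the failure, I found that it is caused by a lack of permission: the user is not allowed to list "nodes" at the cluster scope, and the REST endpoint lookup does exactly that.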
{{2021-10-24 23:10:30,385 ERROR org.apache.flink.kubernetes.cli.KubernetesSessionCli [] - Error while running the Flink session.
io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: GET at: [https://xxxx/api/v1/nodes]. Message: Forbidden! User xxx doesn't have permission. nodes is forbidden: User "xxx" cannot list resource "nodes" in API group "" at the cluster scope: noopinion by orca and marlin and k8s rbac.
	at io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:673) ~[flink-dist_2.12-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:610) ~[flink-dist_2.12-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:560) ~[flink-dist_2.12-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:521) ~[flink-dist_2.12-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:504) ~[flink-dist_2.12-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at io.fabric8.kubernetes.client.dsl.base.BaseOperation.listRequestHelper(BaseOperation.java:143) ~[flink-dist_2.12-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at io.fabric8.kubernetes.client.dsl.base.BaseOperation.list(BaseOperation.java:555) ~[flink-dist_2.12-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at io.fabric8.kubernetes.client.dsl.base.BaseOperation.list(BaseOperation.java:90) ~[flink-dist_2.12-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at org.apache.flink.kubernetes.kubeclient.Fabric8FlinkKubeClient.getLoadBalancerRestEndpoint(Fabric8FlinkKubeClient.java:463) ~[flink-dist_2.12-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at org.apache.flink.kubernetes.kubeclient.Fabric8FlinkKubeClient.getRestEndPointFromService(Fabric8FlinkKubeClient.java:438) ~[flink-dist_2.12-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at org.apache.flink.kubernetes.kubeclient.Fabric8FlinkKubeClient.getRestEndpoint(Fabric8FlinkKubeClient.java:191) ~[flink-dist_2.12-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at org.apache.flink.kubernetes.KubernetesClusterDescriptor.lambda$createClusterClientProvider$1(KubernetesClusterDescriptor.java:98) ~[flink-dist_2.12-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at org.apache.flink.kubernetes.KubernetesClusterDescriptor.deploySessionCluster(KubernetesClusterDescriptor.java:164) ~[flink-dist_2.12-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at org.apache.flink.kubernetes.cli.KubernetesSessionCli.run(KubernetesSessionCli.java:114) ~[flink-dist_2.12-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at org.apache.flink.kubernetes.cli.KubernetesSessionCli.lambda$main$0(KubernetesSessionCli.java:198) ~[flink-dist_2.12-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:28) ~[flink-dist_2.12-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at org.apache.flink.kubernetes.cli.KubernetesSessionCli.main(KubernetesSessionCli.java:198) [flink-dist_2.12-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]}}
> Add clean up phase when kubernetes session start failed
> -------------------------------------------------------
>
> Key: FLINK-24624
> URL: https://issues.apache.org/jira/browse/FLINK-24624
> Project: Flink
> Issue Type: Improvement
> Components: Deployment / Kubernetes
> Affects Versions: 1.14.0
> Reporter: Aitozi
> Priority: Major
>
> Several k8s resources are created when deploying the Kubernetes session, but
> they are left behind when the deployment fails. This can cause the next
> deployment to fail, or leak resources. So I think we should add a clean-up
> phase for when the session fails to start.
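> A minimal sketch of such a clean-up phase, in plain Java and under stated assumptions: remember each resource as it is created and, if any later step fails (for example a REST endpoint lookup that hits a `Forbidden` error like the one in the comment above), delete the created resources in reverse order before rethrowing the original exception. `SessionDeployer`, `ResourceClient` and `EndpointResolver` are hypothetical stand-ins, not Flink's or fabric8's API, and the resource names used with it are illustrative only.
>
```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

/** Hypothetical session deployer that cleans up after itself when a start fails. */
class SessionDeployer {

    /** Hypothetical stand-in for the client that creates and deletes k8s resources. */
    interface ResourceClient {
        void create(String name) throws Exception;

        void delete(String name) throws Exception;
    }

    /** Hypothetical stand-in for the REST endpoint lookup that may fail (e.g. Forbidden). */
    interface EndpointResolver {
        String resolve() throws Exception;
    }

    private final ResourceClient client;

    SessionDeployer(ResourceClient client) {
        this.client = client;
    }

    /**
     * Creates the resources in order, then resolves the REST endpoint. If any step
     * throws, deletes every resource created so far (newest first) and rethrows the
     * original exception, with any clean-up errors attached as suppressed exceptions.
     */
    String deploy(List<String> resources, EndpointResolver resolver) throws Exception {
        Deque<String> created = new ArrayDeque<>();
        try {
            for (String name : resources) {
                client.create(name);
                created.push(name); // only track resources that were actually created
            }
            return resolver.resolve();
        } catch (Exception startFailure) {
            while (!created.isEmpty()) {
                String name = created.pop(); // pop() returns the most recent push first
                try {
                    client.delete(name);
                } catch (Exception cleanupFailure) {
                    // Keep going so one failed delete does not leak the remaining resources.
                    startFailure.addSuppressed(cleanupFailure);
                }
            }
            throw startFailure;
        }
    }
}
```
>
> Rethrowing the original exception keeps the real cause (here, the missing RBAC permission) in front of the user, while clean-up errors stay visible via `getSuppressed()`. If the created resources instead carry owner references to a single parent object, deleting that parent and letting Kubernetes garbage-collect the dependents would be a simpler variant of the same idea.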
--
This message was sent by Atlassian Jira
(v8.3.4#803005)