[ 
https://issues.apache.org/jira/browse/FLINK-24624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17433448#comment-17433448
 ] 

Aitozi commented on FLINK-24624:
--------------------------------

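After looking into the failure, I found that it is caused by a lack of permission: while resolving the REST endpoint, the client tries to list the cluster's nodes, and the user is not allowed to do so.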

{{2021-10-24 23:10:30,385 ERROR org.apache.flink.kubernetes.cli.KubernetesSessionCli         [] - Error while running the Flink session.
io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: GET at: [https://xxxx/api/v1/nodes]. Message: Forbidden! User xxx doesn't have permission. nodes is forbidden: User "xxx" cannot list resource "nodes" in API group "" at the cluster scope: noopinion by orca and marlin and k8s rbac.
	at io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:673) ~[flink-dist_2.12-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:610) ~[flink-dist_2.12-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:560) ~[flink-dist_2.12-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:521) ~[flink-dist_2.12-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:504) ~[flink-dist_2.12-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at io.fabric8.kubernetes.client.dsl.base.BaseOperation.listRequestHelper(BaseOperation.java:143) ~[flink-dist_2.12-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at io.fabric8.kubernetes.client.dsl.base.BaseOperation.list(BaseOperation.java:555) ~[flink-dist_2.12-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at io.fabric8.kubernetes.client.dsl.base.BaseOperation.list(BaseOperation.java:90) ~[flink-dist_2.12-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at org.apache.flink.kubernetes.kubeclient.Fabric8FlinkKubeClient.getLoadBalancerRestEndpoint(Fabric8FlinkKubeClient.java:463) ~[flink-dist_2.12-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at org.apache.flink.kubernetes.kubeclient.Fabric8FlinkKubeClient.getRestEndPointFromService(Fabric8FlinkKubeClient.java:438) ~[flink-dist_2.12-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at org.apache.flink.kubernetes.kubeclient.Fabric8FlinkKubeClient.getRestEndpoint(Fabric8FlinkKubeClient.java:191) ~[flink-dist_2.12-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at org.apache.flink.kubernetes.KubernetesClusterDescriptor.lambda$createClusterClientProvider$1(KubernetesClusterDescriptor.java:98) ~[flink-dist_2.12-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at org.apache.flink.kubernetes.KubernetesClusterDescriptor.deploySessionCluster(KubernetesClusterDescriptor.java:164) ~[flink-dist_2.12-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at org.apache.flink.kubernetes.cli.KubernetesSessionCli.run(KubernetesSessionCli.java:114) ~[flink-dist_2.12-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at org.apache.flink.kubernetes.cli.KubernetesSessionCli.lambda$main$0(KubernetesSessionCli.java:198) ~[flink-dist_2.12-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:28) ~[flink-dist_2.12-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at org.apache.flink.kubernetes.cli.KubernetesSessionCli.main(KubernetesSessionCli.java:198) [flink-dist_2.12-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]}}

> Add clean up phase when kubernetes session start failed
> -------------------------------------------------------
>
>                 Key: FLINK-24624
>                 URL: https://issues.apache.org/jira/browse/FLINK-24624
>             Project: Flink
>          Issue Type: Improvement
>          Components: Deployment / Kubernetes
>    Affects Versions: 1.14.0
>            Reporter: Aitozi
>            Priority: Major
>
> Several k8s resources are created when deploying the Kubernetes session, but 
> these resources are left behind when the deployment fails. This causes the 
> next deployment to fail or leaks resources, so I think we should add a 
> clean-up phase for when the start fails.
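A clean-up phase like the one proposed could follow a simple pattern. The deployer records each resource as it creates it. If a later step fails, as the REST endpoint lookup did in the trace above, it deletes what it created in reverse order and then rethrows the original error. The following is a minimal plain-Java sketch under stated assumptions: `CreatedResource`, `Step`, `fakeResource` and `deployWithCleanup` are hypothetical names, not Flink or fabric8 APIs, and real code would delete the resources through the Kubernetes client.

```java
import java.util.ArrayList;
import java.util.List;

public class SessionDeploySketch {

    /** Hypothetical handle to a Kubernetes resource created while deploying the session. */
    interface CreatedResource {
        String name();

        void delete() throws Exception;
    }

    /** Hypothetical deployment step, e.g. "create the JobManager Deployment". */
    @FunctionalInterface
    interface Step {
        CreatedResource run() throws Exception;
    }

    /** In-memory stand-in resource that records its own deletion in deletedLog. */
    static CreatedResource fakeResource(String name, List<String> deletedLog) {
        return new CreatedResource() {
            @Override
            public String name() {
                return name;
            }

            @Override
            public void delete() {
                deletedLog.add(name);
            }
        };
    }

    /**
     * Runs the steps in order. If any step throws, deletes every resource created so far
     * (newest first) and rethrows the original failure; clean-up errors are attached as
     * suppressed exceptions so they do not hide the root cause.
     */
    static List<CreatedResource> deployWithCleanup(List<Step> steps) throws Exception {
        List<CreatedResource> created = new ArrayList<>();
        try {
            for (Step step : steps) {
                created.add(step.run());
            }
            return created;
        } catch (Exception deployFailure) {
            for (int i = created.size() - 1; i >= 0; i--) {
                try {
                    created.get(i).delete();
                } catch (Exception cleanupFailure) {
                    deployFailure.addSuppressed(cleanupFailure);
                }
            }
            throw deployFailure;
        }
    }

    public static void main(String[] args) {
        List<String> deleted = new ArrayList<>();
        List<Step> steps = List.of(
                () -> fakeResource("jobmanager-deployment", deleted),
                () -> fakeResource("rest-service", deleted),
                // Simulates the failure above: resolving the REST endpoint is rejected by RBAC.
                () -> {
                    throw new IllegalStateException("Forbidden: cannot list resource \"nodes\"");
                });
        try {
            deployWithCleanup(steps);
        } catch (Exception e) {
            System.out.println("Deploy failed: " + e.getMessage());
            System.out.println("Cleaned up: " + deleted);
        }
    }
}
```

Running `main` prints the permission error, followed by `Cleaned up: [rest-service, jobmanager-deployment]`. Deleting in reverse creation order and keeping clean-up errors as suppressed exceptions means the user still sees the original RBAC failure as the primary cause.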



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
