Hey Gyula,

1. The operator watches its own namespace as well. The error still happens (the only way to overcome it is to change kubernetes.rest-service.exposed.type to 'ClusterIP'). It seems to be connected to the RestClient using the k8s client internally, which needs NodeList permissions, but instead of reading from the service account it looks for the kube.config file. [1]
ClusterIP Service:
https://github.com/apache/flink/blob/release-1.17.0/flink-kubernetes/src/main/java/org/apache/flink/kubernetes/kubeclient/services/ClusterIPService.java#L44-L53

NodePort Service:
https://github.com/apache/flink/blob/release-1.17.0/flink-kubernetes/src/main/java/org/apache/flink/kubernetes/kubeclient/services/NodePortService.java#L62

2. Yes, it occurred upon deletion. The leader continued normally, while the idle pods constantly printed those errors (they did not crash, though). I created a bug: https://issues.apache.org/jira/browse/FLINK-32093

3. I believe it is crucial to have all cluster configurations in the cluster's dashboard, particularly in production. If the operator had a UI, it could have filled that void 🙂

[1] https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#kubernetes-config-file

________________________________
From: Gyula Fóra <gyula.f...@gmail.com>
Sent: Sunday, May 14, 2023 4:25 PM
To: dev@flink.apache.org <dev@flink.apache.org>
Cc: Anthony Garrard <garr...@uk.ibm.com>; Hao t Chang <htch...@us.ibm.com>
Subject: Re: [VOTE] Apache Flink Kubernetes Operator Release 1.5.0, release candidate #2

EXTERNAL EMAIL

@Tamir:

For 1: Do you get the same problems with 1.4.0, or is this a regression in 1.5.0? If you set the helm chart so the operator also watches its own namespace, like I mentioned in the jira, do you still get the error?

For 2: Does this error happen when you delete the FlinkDeployment? Can you open a jira and share the logs?

Operator/autoscaler configs are not sent to the Flink application, so they won’t show up on the UI. This is intentional.

Gyula

On Sun, 14 May 2023 at 15:03, Tamir Sagi <tamir.s...@niceactimize.com> wrote:

> Hey Gyula, dev-team
>
> I deployed rc-2 with helm on AWS EKS with HA enabled (3 pods).
>
> The operator watches 3 namespaces.
>
> I successfully deployed an application cluster (Flink 1.17) via pod
> template. I encountered the following errors:
>
> 1.
> org.apache.flink.kubernetes.shaded.io.fabric8.kubernetes.client.KubernetesClientException","message":"Failure
> executing: GET at: https://172.20.0.1/api/v1/nodes. Message:
> Forbidden!Configured service account doesn't have access. Service account
> may have been revoked. nodes is forbidden: User
> "system:serviceaccount:dev-0-flink-clusters:dev-0-xsight-flink-operator-sa"
> cannot list resource "nodes" in API group "" at the cluster scope."
>
> Seems like the role is correct. I commented on the following ticket:
> https://issues.apache.org/jira/browse/FLINK-32041
>
> In addition, I noticed that kubernetes.rest-service.exposed.type was
> set to NodePort; once I changed it to ClusterIP, the above error
> disappeared. [1]
>
> Is there any chance it looks for the kube.config file instead of reading
> the service account?
>
> 2. When the cluster is deleted, the idle pods (not leaders) repeatedly
> throw the following error:
> [2023-05-14T12:00:50,388][Error] {} [i.f.k.c.i.i.c.SharedProcessor]:
> apps/v1/namespaces/dev-0-flink-shadow-clusters/deployments failed invoking
> InformerEventSource{resourceClass: Deployment} event handler: Cannot
> receive event after a delete event received
> java.lang.IllegalStateException: Cannot receive event after a delete
> event received (enclosed stacktrace)
>
> In addition, I'm not sure whether it's an issue or not, but autoscaler
> configurations (per cluster) are shown neither in the Flink web UI nor in
> the response when calling /jobmanager/config.
>
> [1]
> https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/native_kubernetes/#accessing-flinks-web-ui
>
> Thanks,
> Tamir
> ------------------------------
> From: Jim Busche <jbus...@us.ibm.com>
> Sent: Saturday, May 13, 2023 5:59 PM
> To: dev@flink.apache.org <dev@flink.apache.org>; Hao t Chang <htch...@us.ibm.com>; Anthony Garrard <garr...@uk.ibm.com>
> Subject: Re: [VOTE] Apache Flink Kubernetes Operator Release 1.5.0, release candidate #2
>
> Hi Gyula,
>
> I was able to deploy rc-2 with helm on a kind cluster and it was able to
> deploy the sample. But I'm still struggling on OpenShift with rc-2.
> There's some kind of RBAC permission issue that I haven't been able to
> solve when it deploys the flinkdep or flinksessionjobs.
>
> oc get flinkdep
> NAME                                    JOB STATUS   LIFECYCLE STATE
> basic-example                                        UPGRADING
> basic-session-deployment-only-example                UPGRADING
>
> oc get flinksessionjobs
> NAME                             JOB STATUS   LIFECYCLE STATE
> basic-session-job-only-example
>
> oc describe flinkdep basic-example
> …
> Status:
> Cluster Info:
> Error:
> {"type":"org.apache.flink.kubernetes.operator.exception.ReconciliationException","message":"org.apache.flink.client.deployment.ClusterDeploymentException:
> Could not create Kubernetes cluster
> \"basic-example\".","throwableList":[{"type":"org.apache.flink.client.deployment.ClusterDeploymentException","message":"Could
> not create Kubernetes cluster
> \"basic-example\"."},{"type":"org.apache.flink.kubernetes.shaded.io.fabric8.kubernetes.client.KubernetesClientException","message":"Failure
> executing: POST at:
> https://172.30.0.1/apis/apps/v1/namespaces/default/deployments. Message:
> Forbidden!Configured service account doesn't have access. Service account
> may have been revoked.
> deployments.apps \"basic-example\" is forbidden:
> cannot set blockOwnerDeletion if an ownerReference refers to a resource you
> can't set finalizers on: , <nil>."}]}
>
> Job Manager Deployment Status: MISSING
>
> I haven't been able to spot why/what's different between 1.5 and 1.4
> release (which still deploys fine.)
> Hoping someone has an idea of what might be wrong.
>
> Thanks, Jim
>
>
> Confidentiality: This communication and any attachments are intended for
> the above-named persons only and may be confidential and/or legally
> privileged. Any opinions expressed in this communication are not
> necessarily those of NICE Actimize. If this communication has come to you
> in error you must take no action based on it, nor must you copy or show it
> to anyone; please delete/destroy and inform the sender by e-mail
> immediately.
> Monitoring: NICE Actimize may monitor incoming and outgoing e-mails.
> Viruses: Although we have taken steps toward ensuring that this e-mail and
> attachments are free from any virus, we advise that in keeping with good
> computing practice the recipient should ensure they are actually virus free.