@Tamir: For 1: Do you get the same problems with 1.4.0, or is this a regression in 1.5.0? If you configure the Helm chart so that the operator also watches its own namespace, as I mentioned in the Jira, do you still get the error?
For 2: Does this error happen when you delete the FlinkDeployment? Can you open a Jira and share the logs?

Operator/autoscaler configs are not sent to the Flink application, so they won’t show up in the UI. This is intentional.

Gyula

On Sun, 14 May 2023 at 15:03, Tamir Sagi <tamir.s...@niceactimize.com> wrote:

> Hey Gyula, dev-team
>
> I deployed rc-2 with helm on AWS EKS with HA enabled (3 pods).
>
> The operator watches 3 namespaces.
>
> I successfully deployed an application cluster (Flink 1.17) via pod
> template. I encountered the following errors:
>
> 1.
> org.apache.flink.kubernetes.shaded.io.fabric8.kubernetes.client.KubernetesClientException","message":"Failure
> executing: GET at: https://172.20.0.1/api/v1/nodes. Message:
> Forbidden!Configured service account doesn't have access. Service account
> may have been revoked. nodes is forbidden: User
> "system:serviceaccount:dev-0-flink-clusters:
> *dev-0-xsight-flink-operator-sa*" cannot list resource "nodes" in API
> group "" at the cluster scope."
> Seems like the role is correct. I commented on the following ticket:
> https://issues.apache.org/jira/browse/FLINK-32041
> In addition, I noticed that kubernetes.rest-service.exposed.type was
> set to NodePort; once I changed it to ClusterIP, the above error
> disappeared. [1]
>
> Is there any chance it looks for a kube.config file instead of reading
> the service account?
>
> 2.
> When the cluster is deleted, the idle pods (not leaders) repeatedly
> throw the following error:
> [2023-05-14T12:00:50,388][Error] {} [i.f.k.c.i.i.c.SharedProcessor]:
> apps/v1/namespaces/dev-0-flink-shadow-clusters/deployments failed invoking
> InformerEventSource{resourceClass: Deployment} event handler: Cannot
> receive event after a delete event received
> java.lang.IllegalStateException: Cannot receive event after a delete
> event received (enclosed stacktrace)
>
> In addition, I'm not sure whether it's an issue or not, but autoscaler
> configurations (per cluster) are shown neither in the Flink web UI nor in
> the response when calling /jobmanager/config.
>
> [1]
> https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/native_kubernetes/#accessing-flinks-web-ui
>
> Thanks,
> Tamir
> ------------------------------
> *From:* Jim Busche <jbus...@us.ibm.com>
> *Sent:* Saturday, May 13, 2023 5:59 PM
> *To:* dev@flink.apache.org <dev@flink.apache.org>; Hao t Chang <
> htch...@us.ibm.com>; Anthony Garrard <garr...@uk.ibm.com>
> *Subject:* Re: [VOTE] Apache Flink Kubernetes Operator Release 1.5.0,
> release candidate #2
>
> Hi Gyula,
>
> I was able to deploy rc-2 with helm on a kind cluster and it was able to
> deploy the sample. But I'm still struggling on OpenShift with rc-2.
> There's some kind of RBAC permission issue that I haven't been able to
> solve when it deploys the flinkdep or flinksessionjobs.
>
> oc get flinkdep
>
> NAME                                    JOB STATUS   LIFECYCLE STATE
> basic-example                                        UPGRADING
> basic-session-deployment-only-example                UPGRADING
>
> oc get flinksessionjobs
>
> NAME                             JOB STATUS   LIFECYCLE STATE
> basic-session-job-only-example
>
> oc describe flinkdep basic-example
> …
>
> Status:
>
> Cluster Info:
>
> Error:
> {"type":"org.apache.flink.kubernetes.operator.exception.ReconciliationException","message":"org.apache.flink.client.deployment.ClusterDeploymentException:
> Could not create Kubernetes cluster
> \"basic-example\".","throwableList":[{"type":"org.apache.flink.client.deployment.ClusterDeploymentException","message":"Could
> not create Kubernetes cluster
> \"basic-example\"."},{"type":"org.apache.flink.kubernetes.shaded.io.fabric8.kubernetes.client.KubernetesClientException","message":"Failure
> executing: POST at:
> https://172.30.0.1/apis/apps/v1/namespaces/default/deployments. Message:
> Forbidden!Configured service account doesn't have access. Service account
> may have been revoked. deployments.apps \"basic-example\" is forbidden:
> cannot set blockOwnerDeletion if an ownerReference refers to a resource you
> can't set finalizers on: , <nil>."}]}
>
> Job Manager Deployment Status: MISSING
>
> I haven't been able to spot why/what's different between 1.5 and 1.4
> release (which still deploys fine.)
> Hoping someone has an idea of what might be wrong.
>
> Thanks, Jim
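
The `Error` field in the `oc describe flinkdep` output quoted above is a JSON document whose `throwableList` carries the exception chain, and the last entry is the one that names the actual RBAC failure. Below is a minimal Python sketch of pulling that entry out. It assumes the layout seen in that single message (`type`, `message`, and a `throwableList` ordered from outermost to innermost). `root_cause` is a hypothetical helper name, not part of the operator, and the sample messages are abbreviated.

```python
import json
from typing import Tuple


def root_cause(error_json: str) -> Tuple[str, str]:
    """Return (type, message) of the innermost exception in an error status.

    Assumption: the status has a top-level "type" and "message" plus an
    optional "throwableList" ordered outermost to innermost, as in the
    quoted output. Falls back to the top-level entry if the list is absent.
    """
    error = json.loads(error_json)
    chain = error.get("throwableList") or [error]
    innermost = chain[-1]
    return innermost["type"], innermost["message"]


# Shortened stand-in for the quoted status; messages are abbreviated.
sample = json.dumps({
    "type": "org.apache.flink.kubernetes.operator.exception.ReconciliationException",
    "message": "Could not create Kubernetes cluster \"basic-example\".",
    "throwableList": [
        {
            "type": "org.apache.flink.client.deployment.ClusterDeploymentException",
            "message": "Could not create Kubernetes cluster \"basic-example\".",
        },
        {
            "type": "org.apache.flink.kubernetes.shaded.io.fabric8.kubernetes.client.KubernetesClientException",
            "message": "deployments.apps \"basic-example\" is forbidden",
        },
    ],
})

cause_type, cause_message = root_cause(sample)
# Print only the short class name next to the message.
print(cause_type.rsplit(".", 1)[-1], "->", cause_message)
```

Reading the innermost entry first shows the Kubernetes API rejection directly, without the wrapping `ReconciliationException` text.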