@Tamir:

For 1:
Do you get the same problem with 1.4.0, or is this a regression in 1.5.0?
If you set the helm chart so the operator also watches its own namespace,
like I mentioned in the jira, do you still get the error?
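
A minimal sketch of such a values override, assuming the chart's
watchNamespaces list is the setting in question. All namespace names below
are placeholders, not taken from your setup:

```yaml
# values.yaml override for the operator helm chart (sketch, placeholder names)
watchNamespaces:
  - flink-operator   # hypothetical: the namespace the operator itself runs in
  - flink-jobs-a     # hypothetical: a namespace holding FlinkDeployments
  - flink-jobs-b     # hypothetical: another watched namespace
```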

For 2:
Does this error happen when you delete the FlinkDeployment? Can you open a
jira and share the logs?

Operator/autoscaler configs are not sent to the Flink application, so they
won’t show up in the UI. This is intentional.

Gyula

On Sun, 14 May 2023 at 15:03, Tamir Sagi <tamir.s...@niceactimize.com>
wrote:

> Hey Gyula, dev-team
>
> I deployed rc-2 with helm on AWS EKS with HA enabled (3 pods).
>
> The operator watches 3 namespaces.
>
> I successfully deployed an application cluster(Flink 1.17) via pod
> template. I encountered the following errors
>
>    1. org.apache.flink.kubernetes.shaded.io.fabric8.kubernetes.client.KubernetesClientException","message":"Failure
>    executing: GET at: https://172.20.0.1/api/v1/nodes. Message:
>    Forbidden!Configured service account doesn't have access. Service account
>    may have been revoked. nodes is forbidden: User
>    "system:serviceaccount:dev-0-flink-clusters:dev-0-xsight-flink-operator-sa"
>    cannot list resource "nodes" in API group "" at the cluster scope."
>    The role seems to be correct. I commented on the following ticket:
>    https://issues.apache.org/jira/browse/FLINK-32041
>    In addition, I noticed that kubernetes.rest-service.exposed.type was
>    set to NodePort; once I changed it to ClusterIP, the above error
>    disappeared. [1]
>
>    Is there any chance it looks for a kube.config file instead of reading
>    the service account?
>
>    2. When the cluster is deleted, the idle pods (not leaders) repeatedly
>    throw the following error:
>    [2023-05-14T12:00:50,388][Error] {} [i.f.k.c.i.i.c.SharedProcessor]:
>    apps/v1/namespaces/dev-0-flink-shadow-clusters/deployments failed invoking
>    InformerEventSource{resourceClass: Deployment} event handler: Cannot
>    receive event after a delete event received
>    java.lang.IllegalStateException: Cannot receive event after a delete
>    event received (enclosed stacktrace)
>
> In addition, I'm not sure whether it's an issue or not, but autoscaler
> configurations (per cluster) are shown neither in the Flink web UI nor in
> the response when calling /jobmanager/config.
>
> [1]
> https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/native_kubernetes/#accessing-flinks-web-ui
>
> Thanks,
> Tamir
> ------------------------------
> *From:* Jim Busche <jbus...@us.ibm.com>
> *Sent:* Saturday, May 13, 2023 5:59 PM
> *To:* dev@flink.apache.org <dev@flink.apache.org>; Hao t Chang <
> htch...@us.ibm.com>; Anthony Garrard <garr...@uk.ibm.com>
> *Subject:* Re: [VOTE] Apache Flink Kubernetes Operator Release 1.5.0,
> release candidate #2
>
> Hi Gyula,
>
> I was able to deploy rc-2 with helm on a kind cluster and it was able to
> deploy the sample.  But I'm still struggling on OpenShift with rc-2.
> There's some kind of RBAC permission issue that I haven't been able to
> solve when it deploys the flinkdep or flinksessionjobs.
>
>
> oc get flinkdep
> NAME                                    JOB STATUS   LIFECYCLE STATE
> basic-example                                        UPGRADING
> basic-session-deployment-only-example                UPGRADING
>
> oc get flinksessionjobs
> NAME                             JOB STATUS   LIFECYCLE STATE
> basic-session-job-only-example
>
> oc describe flinkdep basic-example
> …
> Status:
>   Cluster Info:
>   Error:
> {"type":"org.apache.flink.kubernetes.operator.exception.ReconciliationException","message":"org.apache.flink.client.deployment.ClusterDeploymentException:
> Could not create Kubernetes cluster
> \"basic-example\".","throwableList":[{"type":"org.apache.flink.client.deployment.ClusterDeploymentException","message":"Could
> not create Kubernetes cluster
> \"basic-example\"."},{"type":"org.apache.flink.kubernetes.shaded.io.fabric8.kubernetes.client.KubernetesClientException","message":"Failure
> executing: POST at:
> https://172.30.0.1/apis/apps/v1/namespaces/default/deployments. Message:
> Forbidden!Configured service account doesn't have access. Service account
> may have been revoked. deployments.apps \"basic-example\" is forbidden:
> cannot set blockOwnerDeletion if an ownerReference refers to a resource you
> can't set finalizers on: , <nil>."}]}
>   Job Manager Deployment Status:  MISSING
>
> I haven't been able to spot what's different between the 1.5 and 1.4
> releases (1.4 still deploys fine).
> Hoping someone has an idea of what might be wrong.
>
> Thanks, Jim
