Hey Gyula,

1. The operator watches its own namespace as well. The error still happens (the 
only way to overcome it is to change kubernetes.rest-service.exposed.type to 
'ClusterIP'). It seems to be connected to the RestClient using the k8s client 
internally, which needs permission to list nodes, but instead of reading from 
the service account it looks for the kube.config file. [1]

ClusterIP Service
https://github.com/apache/flink/blob/release-1.17.0/flink-kubernetes/src/main/java/org/apache/flink/kubernetes/kubeclient/services/ClusterIPService.java#L44-L53

NodePort Service
https://github.com/apache/flink/blob/release-1.17.0/flink-kubernetes/src/main/java/org/apache/flink/kubernetes/kubeclient/services/NodePortService.java#L62

2. Yes, it occurred upon deletion. The leader continued normally, whereas the 
idle pods constantly printed those errors (they did not crash, though).
I created a bug: https://issues.apache.org/jira/browse/FLINK-32093

3. I believe it is crucial to have all cluster configurations in the cluster's 
dashboard, particularly in production. If the operator had a UI, it could have 
filled that void 🙂
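
Regarding point 1, here is a minimal sketch of the workaround, assuming the 
option is set through spec.flinkConfiguration of the FlinkDeployment. The 
resource name is hypothetical, and the fragment leaves out the rest of the 
spec (image, Flink version, job, resources):

```yaml
# Sketch only: "my-flink-app" is a hypothetical name, not taken from this thread.
apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: my-flink-app
spec:
  flinkConfiguration:
    # Expose the REST service inside the cluster only. With NodePort the
    # "cannot list resource nodes" error appeared; with ClusterIP it did not.
    kubernetes.rest-service.exposed.type: ClusterIP
```

The same key could presumably also be set in the operator's default Flink 
configuration so that it applies to every deployment, but I have not verified 
that here.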

[1] 
https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#kubernetes-config-file
________________________________
From: Gyula Fóra <gyula.f...@gmail.com>
Sent: Sunday, May 14, 2023 4:25 PM
To: dev@flink.apache.org <dev@flink.apache.org>
Cc: Anthony Garrard <garr...@uk.ibm.com>; Hao t Chang <htch...@us.ibm.com>
Subject: Re: [VOTE] Apache Flink Kubernetes Operator Release 1.5.0, release 
candidate #2

@Tamir:

For 1:
Do you get the same problems with 1.4.0 or is this a regression in 1.5.0?
If you set the helm chart so the operator also watches its own namespace,
like I mentioned in the Jira, do you still get the error?

For 2:
This error happens when you delete the FlinkDeployment? Can you open a jira
and share the logs?

Operator/autoscaler configs are not sent to the Flink application, so they
won’t show up in the UI. This is intentional.

Gyula

On Sun, 14 May 2023 at 15:03, Tamir Sagi <tamir.s...@niceactimize.com>
wrote:

> Hey Gyula, dev-team
>
> I deployed rc-2 with helm on AWS EKS with HA enabled (3 pods).
>
> The operator watches 3 namespaces.
>
> I successfully deployed an application cluster (Flink 1.17) via a pod
> template. I encountered the following errors:
>
>    1. 
> org.apache.flink.kubernetes.shaded.io.fabric8.kubernetes.client.KubernetesClientException","message":"Failure
>    executing: GET at: https://172.20.0.1/api/v1/nodes. Message:
>    Forbidden!Configured service account doesn't have access. Service account
>    may have been revoked. nodes is forbidden: User
>    "system:serviceaccount:dev-0-flink-clusters:
>    *dev-0-xsight-flink-operator-sa*" cannot list resource "nodes" in API
>    group "" at the cluster scope."
>    The role seems to be correct. I commented on the following ticket:
>    https://issues.apache.org/jira/browse/FLINK-32041
>    In addition, I noticed that kubernetes.rest-service.exposed.type was
>    set to NodePort; once I changed it to ClusterIP, the above error
>    disappeared. [1]
>
>    Is there any chance it looks for kube.config file instead of reading
>    the service account?
>
>    2. When the cluster is deleted, the idle pods (not leaders) repeatedly
>    throw the following error:
>    [2023-05-14T12:00:50,388][Error] {} [i.f.k.c.i.i.c.SharedProcessor]:
>    apps/v1/namespaces/dev-0-flink-shadow-clusters/deployments failed invoking
>    InformerEventSource{resourceClass: Deployment} event handler: Cannot
>    receive event after a delete event received
>    java.lang.IllegalStateException: Cannot receive event after a delete
>    event received (enclosed stacktrace)
>
> In addition, I'm not sure whether it's an issue or not, but autoscaler
> configurations (per cluster) are shown neither in the Flink web UI nor in
> the response when calling /jobmanager/config.
>
> [1]
> https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/native_kubernetes/#accessing-flinks-web-ui
>
> Thanks,
> Tamir
> ------------------------------
> *From:* Jim Busche <jbus...@us.ibm.com>
> *Sent:* Saturday, May 13, 2023 5:59 PM
> *To:* dev@flink.apache.org <dev@flink.apache.org>; Hao t Chang <
> htch...@us.ibm.com>; Anthony Garrard <garr...@uk.ibm.com>
> *Subject:* Re: [VOTE] Apache Flink Kubernetes Operator Release 1.5.0,
> release candidate #2
>
> Hi Gyula,
>
> I was able to deploy rc-2 with helm on a kind cluster and it was able to
> deploy the sample.  But I'm still struggling on OpenShift with rc-2.
> There's some kind of RBAC permission issue that I haven't been able to
> solve when it deploys the flinkdep or flinksessionjobs.
>
>
> oc get flinkdep
> NAME                                    JOB STATUS   LIFECYCLE STATE
> basic-example                                        UPGRADING
> basic-session-deployment-only-example                UPGRADING
>
> oc get flinksessionjobs
> NAME                             JOB STATUS   LIFECYCLE STATE
> basic-session-job-only-example
>
>
> oc describe flinkdep basic-example
> …
> Status:
>   Cluster Info:
>   Error:
> {"type":"org.apache.flink.kubernetes.operator.exception.ReconciliationException","message":"org.apache.flink.client.deployment.ClusterDeploymentException:
> Could not create Kubernetes cluster
> \"basic-example\".","throwableList":[{"type":"org.apache.flink.client.deployment.ClusterDeploymentException","message":"Could
> not create Kubernetes cluster
> \"basic-example\"."},{"type":"org.apache.flink.kubernetes.shaded.io.fabric8.kubernetes.client.KubernetesClientException","message":"Failure
> executing: POST at:
> https://172.30.0.1/apis/apps/v1/namespaces/default/deployments. Message:
> Forbidden!Configured service account doesn't have access. Service account
> may have been revoked. deployments.apps \"basic-example\" is forbidden:
> cannot set blockOwnerDeletion if an ownerReference refers to a resource you
> can't set finalizers on: , <nil>."}]}
>
>   Job Manager Deployment Status:  MISSING
>
> I haven't been able to spot what's different between the 1.5 and 1.4
> releases (the latter still deploys fine).
> Hoping someone has an idea of what might be wrong.
>
> Thanks, Jim
>
>
> Confidentiality: This communication and any attachments are intended for
> the above-named persons only and may be confidential and/or legally
> privileged. Any opinions expressed in this communication are not
> necessarily those of NICE Actimize. If this communication has come to you
> in error you must take no action based on it, nor must you copy or show it
> to anyone; please delete/destroy and inform the sender by e-mail
> immediately.
> Monitoring: NICE Actimize may monitor incoming and outgoing e-mails.
> Viruses: Although we have taken steps toward ensuring that this e-mail and
> attachments are free from any virus, we advise that in keeping with good
> computing practice the recipient should ensure they are actually virus free.
>
