Kumar Mallikarjuna created FLINK-38047:
------------------------------------------
Summary: Bump cert-manager in the Kubernetes Operator
Key: FLINK-38047
URL: https://issues.apache.org/jira/browse/FLINK-38047
Project: Flink
Issue Type: Technical Debt
Components: Kubernetes Operator
Reporter: Kumar Mallikarjuna
Flink Kubernetes Operator currently uses cert-manager:{_}v1.8.2{_} in the
[CI|https://github.com/apache/flink-kubernetes-operator/blob/main/e2e-tests/cert-manager.yaml]
and recommends the same in
[docs|https://github.com/apache/flink-kubernetes-operator/blob/8812c78cd6a2c0ad1b672ca08a8b880bd890ae8b/docs/content/docs/try-flink-kubernetes-operator/quick-start.md?plain=1#L69-L72].
The latest stable release _v1.18.2_ is ten minor versions ahead. We should
bump the recommendations and tests to the latest release.
Validation for _cert-manager:v1.18.2_ with
{_}flink-kubernetes-operator:v1.12.0{_}:
1. Start a kind cluster
{code:java}
➜ flink-kubernetes-operator git:(main) ✗ kind create cluster
Creating cluster "kind" ...
✓ Ensuring node image (kindest/node:v1.32.2) 🖼
✓ Preparing nodes 📦
✓ Writing configuration 📜
✓ Starting control-plane 🕹️
✓ Installing CNI 🔌
✓ Installing StorageClass 💾
Set kubectl context to "kind-kind"
You can now use your cluster with:

kubectl cluster-info --context kind-kind

Have a nice day! 👋
{code}
2. Install cert-manager v1.18.2
{code:java}
➜ flink-kubernetes-operator git:(main) ✗ kubectl create -f https://github.com/cert-manager/cert-manager/releases/download/v1.18.2/cert-manager.yaml
namespace/cert-manager created
customresourcedefinition.apiextensions.k8s.io/certificaterequests.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/certificates.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/challenges.acme.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/clusterissuers.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/issuers.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/orders.acme.cert-manager.io created
serviceaccount/cert-manager-cainjector created
serviceaccount/cert-manager created
serviceaccount/cert-manager-webhook created
clusterrole.rbac.authorization.k8s.io/cert-manager-cainjector created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-issuers created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-clusterissuers created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-certificates created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-orders created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-challenges created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-ingress-shim created
clusterrole.rbac.authorization.k8s.io/cert-manager-cluster-view created
clusterrole.rbac.authorization.k8s.io/cert-manager-view created
clusterrole.rbac.authorization.k8s.io/cert-manager-edit created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-approve:cert-manager-io created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-certificatesigningrequests created
clusterrole.rbac.authorization.k8s.io/cert-manager-webhook:subjectaccessreviews created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-cainjector created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-issuers created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-clusterissuers created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-certificates created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-orders created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-challenges created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-ingress-shim created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-approve:cert-manager-io created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-certificatesigningrequests created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-webhook:subjectaccessreviews created
role.rbac.authorization.k8s.io/cert-manager-cainjector:leaderelection created
role.rbac.authorization.k8s.io/cert-manager:leaderelection created
role.rbac.authorization.k8s.io/cert-manager-tokenrequest created
role.rbac.authorization.k8s.io/cert-manager-webhook:dynamic-serving created
rolebinding.rbac.authorization.k8s.io/cert-manager-cainjector:leaderelection created
rolebinding.rbac.authorization.k8s.io/cert-manager:leaderelection created
rolebinding.rbac.authorization.k8s.io/cert-manager-cert-manager-tokenrequest created
rolebinding.rbac.authorization.k8s.io/cert-manager-webhook:dynamic-serving created
service/cert-manager-cainjector created
service/cert-manager created
service/cert-manager-webhook created
deployment.apps/cert-manager-cainjector created
deployment.apps/cert-manager created
deployment.apps/cert-manager-webhook created
mutatingwebhookconfiguration.admissionregistration.k8s.io/cert-manager-webhook created
validatingwebhookconfiguration.admissionregistration.k8s.io/cert-manager-webhook created
{code}
3. Wait for cert-manager to be ready
{code:java}
➜ flink-kubernetes-operator git:(main) ✗ k -n cert-manager get po
NAME READY STATUS RESTARTS AGE
cert-manager-69f748766f-28s8d 1/1 Running 0 44s
cert-manager-cainjector-7cf6557c49-gdfd7 1/1 Running 0 44s
cert-manager-webhook-58f4cff74d-kz4pc 1/1 Running 0 44s
{code}
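Instead of polling the pod list by hand, the readiness check in step 3 could be scripted. The following is only a sketch: the function name {{wait_for_cert_manager}} and the 120s default timeout are assumptions made for illustration and are not part of the operator's CI scripts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: block until every Deployment in the cert-manager
# namespace reports the Available condition, or fail after the timeout.
# The 120s default is an arbitrary value chosen for this sketch.
wait_for_cert_manager() {
  local timeout="${1:-120s}"
  kubectl -n cert-manager wait deployment --all \
    --for=condition=Available --timeout="${timeout}"
}
```

Calling {{wait_for_cert_manager 60s}} right after step 2 should return once the controller, cainjector and webhook Deployments are all available.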
4. Install flink-kubernetes-operator
{code:java}
➜ flink-kubernetes-operator git:(main) ✗ helm install flink-kubernetes-operator flink-operator-repo/flink-kubernetes-operator
W0704 14:33:26.593488 51760 warnings.go:70] spec.privateKey.rotationPolicy: In cert-manager >= v1.18.0, the default value changed from `Never` to `Always`.
NAME: flink-kubernetes-operator
LAST DEPLOYED: Fri Jul 4 14:33:25 2025
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None{code}
*Note:* The warning about `spec.privateKey.rotationPolicy` is expected and can
be ignored since it does not affect the functionality of the operator/webhook.
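If the warning is unwanted, one possible approach is to set the field explicitly on the Certificate, on the assumption that cert-manager only emits the warning when {{spec.privateKey.rotationPolicy}} is unset. This is a sketch only: the function name {{set_rotation_policy}} is hypothetical, the Certificate name is taken from the output in step 8, and a later {{helm upgrade}} would likely revert a manual patch, so a lasting fix would belong in the chart template.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pin spec.privateKey.rotationPolicy on the operator's
# serving certificate (default namespace, as in this walkthrough).
# Accepts "Always" or "Never"; defaults to "Always", the new upstream default.
set_rotation_policy() {
  local policy="${1:-Always}"
  kubectl patch certificate flink-operator-serving-cert --type merge \
    -p "{\"spec\":{\"privateKey\":{\"rotationPolicy\":\"${policy}\"}}}"
}
```

For example, {{set_rotation_policy Never}} would restore the pre-1.18 behaviour of reusing the private key on renewal.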
5. Verify the operator/webhook are running
{code:java}
➜ flink-kubernetes-operator git:(main) ✗ k get po
NAME READY STATUS RESTARTS AGE
flink-kubernetes-operator-7dc7858566-42g5z 2/2 Running 0 112s
{code}
6. Test with a sample FlinkDeployment
{code:java}
➜ flink-kubernetes-operator git:(main) ✗ k create -f examples/basic.yaml
flinkdeployment.flink.apache.org/basic-example created
➜ flink-kubernetes-operator git:(main) ✗ k get flinkdeployments.flink.apache.org
NAME JOB STATUS LIFECYCLE STATE
basic-example RUNNING STABLE
➜ flink-kubernetes-operator git:(main) ✗ k get po
NAME READY STATUS RESTARTS AGE
basic-example-6c7bff5c68-w669x 1/1 Running 0 70s
basic-example-taskmanager-1-1 1/1 Running 0 23s
flink-kubernetes-operator-7dc7858566-42g5z 2/2 Running 0 3m27s
{code}
7. Clean up the FlinkDeployment
{code:java}
➜ flink-kubernetes-operator git:(main) ✗ k delete flinkdeployments.flink.apache.org basic-example
flinkdeployment.flink.apache.org "basic-example" deleted {code}
8. Force rotate the certificate
{code:java}
➜ flink-kubernetes-operator git:(main) ✗ k get certificate
NAME READY SECRET AGE
flink-operator-serving-cert True webhook-server-cert 4m48s
➜ flink-kubernetes-operator git:(main) ✗ k get certificate flink-operator-serving-cert -oyaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  annotations:
    meta.helm.sh/release-name: flink-kubernetes-operator
    meta.helm.sh/release-namespace: default
  creationTimestamp: "2025-07-04T09:03:26Z"
  generation: 1
  labels:
    app.kubernetes.io/managed-by: Helm
  name: flink-operator-serving-cert
  namespace: default
  resourceVersion: "997"
  uid: b0e1935c-eab8-4b61-ad9f-7bb0bf166c07
spec:
  commonName: FlinkDeployment Validator
  dnsNames:
  - flink-operator-webhook-service.default.svc
  - flink-operator-webhook-service.default.svc.cluster.local
  issuerRef:
    kind: Issuer
    name: flink-operator-selfsigned-issuer
  keystores:
    pkcs12:
      create: true
      passwordSecretRef:
        key: password
        name: flink-operator-webhook-secret
  secretName: webhook-server-cert
status:
  conditions:
  - lastTransitionTime: "2025-07-04T09:03:26Z"
    message: Certificate is up to date and has not expired
    observedGeneration: 1
    reason: Ready
    status: "True"
    type: Ready
  notAfter: "2025-10-02T09:03:26Z"
  notBefore: "2025-07-04T09:03:26Z"
  renewalTime: "2025-09-02T09:03:26Z"
  revision: 1
➜ flink-kubernetes-operator git:(main) ✗ cmctl renew flink-operator-serving-cert
Manually triggered issuance of Certificate default/flink-operator-serving-cert
➜ flink-kubernetes-operator git:(main) ✗ k get certificate flink-operator-serving-cert -oyaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  annotations:
    meta.helm.sh/release-name: flink-kubernetes-operator
    meta.helm.sh/release-namespace: default
  creationTimestamp: "2025-07-04T09:03:26Z"
  generation: 1
  labels:
    app.kubernetes.io/managed-by: Helm
  name: flink-operator-serving-cert
  namespace: default
  resourceVersion: "1591"
  uid: b0e1935c-eab8-4b61-ad9f-7bb0bf166c07
spec:
  commonName: FlinkDeployment Validator
  dnsNames:
  - flink-operator-webhook-service.default.svc
  - flink-operator-webhook-service.default.svc.cluster.local
  issuerRef:
    kind: Issuer
    name: flink-operator-selfsigned-issuer
  keystores:
    pkcs12:
      create: true
      passwordSecretRef:
        key: password
        name: flink-operator-webhook-secret
  secretName: webhook-server-cert
status:
  conditions:
  - lastTransitionTime: "2025-07-04T09:03:26Z"
    message: Certificate is up to date and has not expired
    observedGeneration: 1
    reason: Ready
    status: "True"
    type: Ready
  notAfter: "2025-10-02T09:08:37Z"
  notBefore: "2025-07-04T09:08:37Z"
  renewalTime: "2025-09-02T09:08:37Z"
  revision: 2
{code}
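The rotation in step 8 is confirmed above by comparing {{status.revision}} (1 before, 2 after) across two YAML dumps. A scripted version of the same comparison might look like the sketch below. The function names, the 30 attempts and the 2-second polling interval are assumptions chosen for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the current status.revision of a Certificate.
cert_revision() {
  kubectl get certificate "$1" -o jsonpath='{.status.revision}'
}

# Hypothetical helper: trigger a renewal with cmctl, then poll until the
# revision increases. Gives up after 30 attempts (arbitrary limit).
renew_and_check() {
  local name="$1" before after i
  before="$(cert_revision "$name")"
  cmctl renew "$name" >&2   # keep cmctl's message out of our stdout
  for i in $(seq 1 30); do
    after="$(cert_revision "$name")"
    if [ "${after:-0}" -gt "${before:-0}" ]; then
      echo "revision ${before} -> ${after}"
      return 0
    fi
    sleep 2
  done
  echo "revision did not change (still ${before})" >&2
  return 1
}
```

Running {{renew_and_check flink-operator-serving-cert}} returns non-zero if the revision never increases, which makes it usable as a gate in an e2e script.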
9. Verify the operator/webhook are still running
{code:java}
➜ flink-kubernetes-operator git:(main) ✗ k get po
NAME READY STATUS RESTARTS AGE
flink-kubernetes-operator-7dc7858566-42g5z 2/2 Running 0 5m50s
{code}
10. Check the webhook logs and verify that the certificate was reloaded
{code:java}
➜ flink-kubernetes-operator git:(main) ✗ k logs flink-kubernetes-operator-7dc7858566-42g5z -c flink-webhook | tail -20
2025-07-04 09:03:57,113 o.a.f.k.o.f.FileSystemWatchService [INFO ] Starting watching path: /certs
2025-07-04 09:03:57,117 o.a.f.k.o.f.FileSystemWatchService [INFO ] Path is resolved to real path: /certs
2025-07-04 09:03:57,186 o.a.f.k.o.a.FlinkOperatorWebhook [INFO ] Webhook listening at 0:0:0:0:0:0:0:0:9443
2025-07-04 09:08:47,807 o.a.f.k.o.a.FlinkOperatorWebhook [INFO ] Reloading SSL context because of certificate change
2025-07-04 09:08:47,809 o.a.f.k.o.s.ReloadableSslContext [INFO ] Creating keystore with type: pkcs12
2025-07-04 09:08:47,810 o.a.f.k.o.s.ReloadableSslContext [INFO ] Loading keystore from file: /certs/keystore.p12
2025-07-04 09:08:47,816 o.a.f.k.o.s.ReloadableSslContext [INFO ] Initializing key manager with keystore and password
2025-07-04 09:08:47,821 o.a.f.k.o.a.FlinkOperatorWebhook [INFO ] SSL context reloaded successfully
2025-07-04 09:08:56,977 o.a.f.c.GlobalConfiguration [INFO ] Using legacy
YAML parser to load flink configuration file from
/opt/flink/conf/flink-conf.yaml.
2025-07-04 09:08:56,982 o.a.f.c.GlobalConfiguration [INFO ] Loading
configuration property: parallelism.default, 1
2025-07-04 09:08:56,982 o.a.f.c.GlobalConfiguration [INFO ] Loading
configuration property: taskmanager.numberOfTaskSlots, 1
2025-07-04 09:08:56,982 o.a.f.c.GlobalConfiguration [INFO ] Loading
configuration property:
kubernetes.operator.default-configuration.flink-version.v1_18.env.java.opts.all,
--add-exports=java.base/sun.net.util=ALL-UNNAMED
--add-exports=java.rmi/sun.rmi.registry=ALL-UNNAMED
--add-exports=java.security.jgss/sun.security.krb5=ALL-UNNAMED
--add-opens=java.base/java.lang=ALL-UNNAMED
--add-opens=java.base/java.net=ALL-UNNAMED
--add-opens=java.base/java.io=ALL-UNNAMED
--add-opens=java.base/java.nio=ALL-UNNAMED
--add-opens=java.base/sun.nio.ch=ALL-UNNAMED
--add-opens=java.base/java.lang.reflect=ALL-UNNAMED
--add-opens=java.base/java.text=ALL-UNNAMED
--add-opens=java.base/java.time=ALL-UNNAMED
--add-opens=java.base/java.util=ALL-UNNAMED
--add-opens=java.base/java.util.concurrent=ALL-UNNAMED
--add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED
--add-opens=java.base/java.util.concurrent.locks=ALL-UNNAMED
2025-07-04 09:08:56,982 o.a.f.c.GlobalConfiguration [INFO ] Loading
configuration property: kubernetes.operator.reconcile.interval, 15 s
2025-07-04 09:08:56,982 o.a.f.c.GlobalConfiguration [INFO ] Loading
configuration property:
kubernetes.operator.default-configuration.flink-version.v1_19+.env.java.default-opts.all,
--add-exports=java.base/sun.net.util=ALL-UNNAMED
--add-exports=java.rmi/sun.rmi.registry=ALL-UNNAMED
--add-exports=java.security.jgss/sun.security.krb5=ALL-UNNAMED
--add-opens=java.base/java.lang=ALL-UNNAMED
--add-opens=java.base/java.net=ALL-UNNAMED
--add-opens=java.base/java.io=ALL-UNNAMED
--add-opens=java.base/java.nio=ALL-UNNAMED
--add-opens=java.base/sun.nio.ch=ALL-UNNAMED
--add-opens=java.base/java.lang.reflect=ALL-UNNAMED
--add-opens=java.base/java.text=ALL-UNNAMED
--add-opens=java.base/java.time=ALL-UNNAMED
--add-opens=java.base/java.util=ALL-UNNAMED
--add-opens=java.base/java.util.concurrent=ALL-UNNAMED
--add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED
--add-opens=java.base/java.util.concurrent.locks=ALL-UNNAMED
2025-07-04 09:08:56,982 o.a.f.c.GlobalConfiguration [INFO ] Loading
configuration property: kubernetes.operator.metrics.reporter.slf4j.interval, 5
MINUTE
2025-07-04 09:08:56,983 o.a.f.c.GlobalConfiguration [INFO ] Loading
configuration property: kubernetes.operator.observer.progress-check.interval, 5
s
2025-07-04 09:08:56,983 o.a.f.c.GlobalConfiguration [INFO ] Loading
configuration property: kubernetes.operator.health.probe.enabled, true
2025-07-04 09:08:56,983 o.a.f.c.GlobalConfiguration [INFO ] Loading
configuration property: kubernetes.operator.health.probe.port, 8085
2025-07-04 09:08:56,983 o.a.f.c.GlobalConfiguration [INFO ] Loading
configuration property:
kubernetes.operator.metrics.reporter.slf4j.factory.class,
org.apache.flink.metrics.slf4j.Slf4jReporterFactory
2025-07-04 09:08:56,984 o.a.f.k.o.c.FlinkConfigManager [INFO ] Default
configuration did not change, nothing to do... {code}
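The relevant evidence in the log above is the "SSL context reloaded successfully" line, logged about ten seconds after the renewal. A scripted check could search for that message directly, as in this sketch. The function name {{webhook_reloaded}} is hypothetical, and matching on the exact log text assumes the message wording stays stable across operator versions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the flink-webhook container of the given
# operator pod has logged a successful SSL context reload.
webhook_reloaded() {
  local pod="$1"
  kubectl logs "$pod" -c flink-webhook \
    | grep -q "SSL context reloaded successfully"
}
```

For example: {{webhook_reloaded flink-kubernetes-operator-7dc7858566-42g5z && echo "certificate reloaded"}}.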
11. Create a resource to test the webhook
{code:java}
➜ flink-kubernetes-operator git:(main) ✗ k create -f examples/basic.yaml
flinkdeployment.flink.apache.org/basic-example created {code}
12. Check the resource status
{code:java}
➜ flink-kubernetes-operator git:(main) ✗ k get flinkdeployments.flink.apache.org
NAME JOB STATUS LIFECYCLE STATE
basic-example RUNNING STABLE
➜ flink-kubernetes-operator git:(main) ✗ k get po
NAME READY STATUS RESTARTS AGE
basic-example-6c7bff5c68-gmlh2 1/1 Running 0 25s
basic-example-taskmanager-1-1 1/1 Running 0 14s
flink-kubernetes-operator-7dc7858566-42g5z 2/2 Running 0 7m28s
{code}
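The final status check could also be automated rather than read from the table. The sketch below assumes that the LIFECYCLE STATE column shown above is backed by the {{.status.lifecycleState}} field of the FlinkDeployment resource; that JSONPath and the function name {{flinkdeployment_stable}} are assumptions and should be checked against the CRD before use.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the named FlinkDeployment reports
# lifecycle state STABLE. The JSONPath is an assumption about the CRD.
flinkdeployment_stable() {
  local name="$1" state
  state="$(kubectl get flinkdeployments.flink.apache.org "$name" \
    -o jsonpath='{.status.lifecycleState}')"
  [ "$state" = "STABLE" ]
}
```

For example: {{flinkdeployment_stable basic-example && echo "basic-example is stable"}}.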