[
https://issues.apache.org/jira/browse/FLINK-28171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17557178#comment-17557178
]
Yang Wang commented on FLINK-28171:
-----------------------------------
I second [~martijnvisser]'s suggestion that we should consider
{{appProtocol}} first and make sure it does not break on older K8s versions.
> Adjust Job and Task manager port definitions to work with Istio+mTLS
> --------------------------------------------------------------------
>
> Key: FLINK-28171
> URL: https://issues.apache.org/jira/browse/FLINK-28171
> Project: Flink
> Issue Type: Improvement
> Components: Deployment / Kubernetes
> Affects Versions: 1.14.4
> Environment: flink-kubernetes-operator 1.0.0
> Flink 1.14-java11
> Kubernetes v1.19.5
> Istio 1.7.6
> Reporter: Moshe Elisha
> Priority: Major
>
> Hello,
>
> We are launching Flink deployments using the [Flink Kubernetes
> Operator|https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-stable/]
> on a Kubernetes cluster with Istio and mTLS enabled.
>
> We found that the TaskManager is unable to communicate with the JobManager on
> the jobmanager-rpc port:
>
> {{2022-06-15 15:25:40,508 WARN akka.remote.ReliableDeliverySupervisor
> [] - Association with remote system
> [akka.tcp://[[email protected]|mailto:[email protected]]:6123]
> has failed, address is now gated for [50] ms. Reason: [Association failed
> with
> [akka.tcp://[[email protected]|mailto:[email protected]]:6123]]
> Caused by: [The remote system explicitly disassociated (reason unknown).]}}
>
> The reason for the issue is that the JobManager service port definitions are
> not following the Istio guidelines
> [https://istio.io/latest/docs/ops/configuration/traffic-management/protocol-selection/]
> (see example below).
>
> There was also an email discussion around this topic in the users mailing
> group under the subject "Flink Kubernetes Operator with K8S + Istio + mTLS -
> port definitions".
> With the help of the community, we were able to work around the issue, but it
> was very hard and forced us to bypass the Istio proxy, which is not ideal.
>
> We would like you to consider changing the default port definitions in one of
> the following ways:
> # Rename the ports – I understand this is an Istio-specific guideline, but maybe
> it is better to be aligned with at least one (popular) vendor guideline
> instead of none at all.
> # Add the “appProtocol” property[1], which is not specific to any vendor but
> requires Kubernetes >= 1.19, where it was introduced as beta before moving to
> stable in 1.20. The option to set the appProtocol property was only added in
> [https://github.com/fabric8io/kubernetes-client/releases/tag/v5.10.0] with
> [#3570|https://github.com/fabric8io/kubernetes-client/issues/3570].
> # Or allow a way to override the defaults.
>
> [1] [https://kubernetes.io/docs/concepts/services-networking/_print/#application-protocol]
>
>
> {{# k get service inference-results-to-analytics-engine -o yaml}}
> {{apiVersion: v1}}
> {{kind: Service}}
> {{...}}
> {{spec:}}
> {{ clusterIP: None}}
> {{ ports:}}
> {{ - name: jobmanager-rpc *# should start with "tcp-" or add "appProtocol"
> property*}}
> {{ port: 6123}}
> {{ protocol: TCP}}
> {{ targetPort: 6123}}
> {{ - name: blobserver *# should start with "tcp-" or add "appProtocol"
> property*}}
> {{ port: 6124}}
> {{ protocol: TCP}}
> {{ targetPort: 6124}}
> {{...}}
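
For illustration only, the {{appProtocol}} option applied to the Service quoted above might look roughly like the sketch below. It is not what Flink or the operator currently generates. The port names, numbers, and Service name are copied from the report, the {{appProtocol: tcp}} lines are the assumed change, and the fields elided in the report are left out here as well. The renamed ports in the comments ({{tcp-jobmanager-rpc}}, {{tcp-blobserver}}) are hypothetical examples of the Istio naming convention, not names proposed in the ticket.

```yaml
# Sketch under assumptions: only the appProtocol lines are new.
# Fields the report elided with "..." (selector, labels, etc.) are omitted.
apiVersion: v1
kind: Service
metadata:
  name: inference-results-to-analytics-engine
spec:
  clusterIP: None
  ports:
    - name: jobmanager-rpc      # alternative: rename to e.g. tcp-jobmanager-rpc (hypothetical)
      appProtocol: tcp          # vendor-neutral hint; needs a K8s version that supports the field
      port: 6123
      protocol: TCP
      targetPort: 6123
    - name: blobserver          # alternative: rename to e.g. tcp-blobserver (hypothetical)
      appProtocol: tcp
      port: 6124
      protocol: TCP
      targetPort: 6124
```

Either variant alone should let Istio treat the ports as plain TCP. Which one is safer for clusters older than Kubernetes 1.19 is the open question raised in the comment above.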
--
This message was sent by Atlassian Jira
(v8.20.7#820007)