[ https://issues.apache.org/jira/browse/FLINK-31775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sergio Sainz updated FLINK-31775:
---------------------------------
Description:
When using native Kubernetes deployment mode with HA enabled, a newly started
TaskManager pod attempts to register itself with the resource manager
(JobManager). The TaskManager looks up the resource manager by IP address
(akka.tcp://[email protected]:6123/user/rpc/resourcemanager_1).
However, when Istio is enabled, resolution by IP address is blocked, so the
job cannot start because the TaskManager cannot register with the resource
manager:
2023-04-10 23:24:19,752 INFO
org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Could not
resolve ResourceManager address
akka.tcp://[email protected]:6123/user/rpc/resourcemanager_1, retrying in
10000 ms: Could not connect to rpc endpoint under address
akka.tcp://[email protected]:6123/user/rpc/resourcemanager_1.
Note that when HA is disabled, the resource manager is resolved by service
name, so it can be found:
2023-04-11 00:49:34,162 INFO
org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Successful
registration at resource manager
akka.tcp://[email protected]:6123/user/rpc/resourcemanager_*
under registration id 83ad942597f86aa880ee96f1c2b8b923.
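
To tell the two cases apart when reading TaskExecutor logs, a small helper can check whether the host part of the resource manager address is an IP literal or a name. This is only a diagnostic sketch, not part of Flink: the function name `classify_rpc_host` is hypothetical, and the example addresses are made up for illustration because the host portion is not visible in the log lines above.

```python
import ipaddress
import re

# Matches Akka RPC URLs shaped like the ones in the TaskExecutor log lines:
#   akka.tcp://<system>@<host>:<port>/<path>
# The host may be a name, an IPv4 literal, or a bracketed IPv6 literal.
_AKKA_URL = re.compile(
    r"^akka\.tcp://(?P<system>[^@/]+)@"
    r"(?P<host>\[[^\]]+\]|[^:/]+):(?P<port>\d+)(?P<path>/.*)?$"
)


def classify_rpc_host(url: str) -> str:
    """Return "ip" if the URL's host is an IP literal, else "hostname"."""
    match = _AKKA_URL.match(url)
    if match is None:
        raise ValueError(f"not an akka.tcp RPC address: {url!r}")
    host = match.group("host").strip("[]")  # drop IPv6 brackets if present
    try:
        ipaddress.ip_address(host)
    except ValueError:
        return "hostname"
    return "ip"


if __name__ == "__main__":
    # Hypothetical addresses, for illustration only.
    examples = [
        "akka.tcp://flink@10.0.0.12:6123/user/rpc/resourcemanager_1",
        "akka.tcp://flink@my-flink-cluster.default:6123/user/rpc/resourcemanager_0",
    ]
    for address in examples:
        print(classify_rpc_host(address), address)
```

An "ip" result for the address in the "Could not resolve ResourceManager address" message would match the failing HA case described here, while "hostname" would match the working non-HA case.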
Note that in my case it is not possible to disable Istio, as explained here:
[https://doc.akka.io/docs/akka-management/current/bootstrap/istio.html]
Although this is similar to https://issues.apache.org/jira/browse/FLINK-28171,
I am logging it as a separate defect because I believe the fix for FLINK-28171
won't fix this case. FLINK-28171 is about the Flink Kubernetes Operator.
was:
When using native kubernetes deployment mode, and when new TaskManager is
started to process a job, the TaskManager will attempt to register itself to
the resource manager (job manager). the TaskManager looks up the resource
manager per ip-address
(akka.tcp://[email protected]:6123/user/rpc/resourcemanager_1)
Nevertheless when istio is enabled, the resolution by ip address is blocked,
and hence we see that the job cannot start because task manager cannot register
with the resource manager:
2023-04-10 23:24:19,752 INFO
org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Could not
resolve ResourceManager address
akka.tcp://[email protected]:6123/user/rpc/resourcemanager_1, retrying in
10000 ms: Could not connect to rpc endpoint under address
akka.tcp://[email protected]:6123/user/rpc/resourcemanager_1.
Notice that when HA is disabled, the resolution of the resource manager is made
by service name and so the resource manager can be found
2023-04-11 00:49:34,162 INFO
org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Successful
registration at resource manager
akka.tcp://[email protected]:6123/user/rpc/resourcemanager_*
under registration id 83ad942597f86aa880ee96f1c2b8b923.
Notice it is not possible to disable istio (as explained here :
https://doc.akka.io/docs/akka-management/current/bootstrap/istio.html)
Although similar to https://issues.apache.org/jira/browse/FLINK-28171 , logging
as separate defect as I believe the fix of FLINK-28171 won't fix this case.
FLINK-28171 is about Flink Kubernetes Operator.
> High-Availability not supported in kubernetes when istio enabled
> ----------------------------------------------------------------
>
> Key: FLINK-31775
> URL: https://issues.apache.org/jira/browse/FLINK-31775
> Project: Flink
> Issue Type: Bug
> Components: Deployment / Kubernetes
> Affects Versions: 1.16.1
> Reporter: Sergio Sainz
> Priority: Major
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)