[
https://issues.apache.org/jira/browse/FLINK-29117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17608261#comment-17608261
]
Gyula Fora commented on FLINK-29117:
------------------------------------
I have never hit or seen this issue with the operator. Are you still
experiencing it?
> Tried to associate with unreachable remote resourcemanager address
> ------------------------------------------------------------------
>
> Key: FLINK-29117
> URL: https://issues.apache.org/jira/browse/FLINK-29117
> Project: Flink
> Issue Type: Bug
> Components: Deployment / Kubernetes, flink-contrib, flink-docker,
> Kubernetes Operator
> Affects Versions: 1.15.1, kubernetes-operator-1.1.0
> Reporter: geonyeong kim
> Priority: Critical
> Attachments: taskmanager_log.png
>
>
> Hello.
> I am planning to deploy and use FlinkDeployment through the Flink
> Kubernetes operator.
> The CRD, operator, webhook, etc. are all set up, and we actually deployed a
> FlinkDeployment to confirm normal operation.
> *However, strangely, connecting to the resource manager fails if you run more
> than one task manager pod replica.*
> I thought it might be a problem with akka, timeouts, etc., so I increased the
> values as below, but the connection continues to fail.
> - akka.retry-gate-closed-for: 10000
> - akka.server-socket-worker-pool.pool-size-min: 6
> - akka.server-socket-worker-pool.pool-size-max: 10
> - akka.client-socket-worker-pool.pool-size-max: 10
> - akka.client-socket-worker-pool.pool-size-min: 6
> - blob.client.connect.timeout: 30000
> The log of the taskmanager is as follows.
>
> {code:java}
> Association with remote system [akka.tcp://[email protected]:6123] has
> failed, address is now gated for [10000] ms. Reason: [Disassociated]
> Could not resolve ResourceManager address
> akka.tcp://[email protected]:6123/user/rpc/resourcemanager_1, retrying in
> 10000 ms: Could not connect to rpc endpoint under address
> akka.tcp://[email protected]:6123/user/rpc/resourcemanager_1.
> Tried to associate with unreachable remote address
> [akka.tcp://[email protected]:6123]. Address is now gated for 10000 ms, all
> messages to this address will be delivered to dead letters. Reason: [The
> remote system has quarantined this system. No further associations to the
> remote system are possible until this system is restarted.] {code}
> *If I go into the task manager pod and run a TCP check, the connection is open.*
> *Below are the flink versions I used.*
> - Flink image: 1.15.1
> - Flink Kubernetes operator: 1.1.0
>
> *I would appreciate it if you could check the problem quickly.*
> *If it's a bug, please tell me how to work around it in the current situation.*
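
For reference, the TCP check described in the report can be reproduced from
inside a task manager pod with a short script. This is only a minimal sketch,
not the reporter's actual command: the function name tcp_port_open and the
host name in the usage comment are hypothetical, and 6123 is the port taken
from the log above. Note that an open TCP port only shows network-level
reachability; it does not show that the Akka association itself succeeds.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Example usage (hypothetical JobManager host name, RPC port from the log):
#   tcp_port_open("my-flink-jobmanager", 6123)
```

If this returns True while the task manager still logs the quarantine message,
the problem is more likely at the Akka association level than at the network
level.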