[ 
https://issues.apache.org/jira/browse/FLINK-8311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eron Wright updated FLINK-8311:
--------------------------------
    Description: 
There is a need for better documentation on what connects to what over which 
ports in a Flink cluster to allow users to configure network access control 
rules.

E.g. I was under the impression that in a ZK HA configuration the Job Managers 
were essentially independent and only coordinated via ZK.  But starting 
multiple JMs in HA with the JM RPC port blocked between JMs shows that the 
second JM's Akka subsystem is trying to connect to the leading JM:

{code}
INFO  akka.remote.transport.ProtocolStateActor                      - No response from remote for outbound association. Associate timed out after [20000 ms].
WARN  akka.remote.ReliableDeliverySupervisor                        - Association with remote system [akka.tcp://flink@10.210.210.127:6123] has failed, address is now gated for [5000] ms. Reason: [Association failed with [akka.tcp://flink@10.210.210.127:6123]] Caused by: [No response from remote for outbound association. Associate timed out after [20000 ms].]
WARN  akka.remote.transport.netty.NettyTransport                    - Remote connection to [null] failed with org.apache.flink.shaded.akka.org.jboss.netty.channel.ConnectTimeoutException: connection timed out: /10.210.210.127:6123
{code}

  was:
There is a need for better documentation on what connects to what over which 
ports in a Flink cluster to allow users to configure network access control 
rules.

E.g. I was under the impression that in a ZK HA configuration the Job Managers 
were essentially independent and only coordinated via ZK.  But starting 
multiple JMs in HA with the JM RPC port blocked between JMs shows that the 
second JM's Akka subsystem is trying to connect to the leading JM:

INFO  akka.remote.transport.ProtocolStateActor                      - No response from remote for outbound association. Associate timed out after [20000 ms].
WARN  akka.remote.ReliableDeliverySupervisor                        - Association with remote system [akka.tcp://flink@10.210.210.127:6123] has failed, address is now gated for [5000] ms. Reason: [Association failed with [akka.tcp://flink@10.210.210.127:6123]] Caused by: [No response from remote for outbound association. Associate timed out after [20000 ms].]
WARN  akka.remote.transport.netty.NettyTransport                    - Remote connection to [null] failed with org.apache.flink.shaded.akka.org.jboss.netty.channel.ConnectTimeoutException: connection timed out: /10.210.210.127:6123


> Flink needs documentation for network access control
> ----------------------------------------------------
>
>                 Key: FLINK-8311
>                 URL: https://issues.apache.org/jira/browse/FLINK-8311
>             Project: Flink
>          Issue Type: Improvement
>          Components: Documentation
>    Affects Versions: 1.4.0
>            Reporter: Elias Levy
>
> There is a need for better documentation on what connects to what over which 
> ports in a Flink cluster to allow users to configure network access control 
> rules.
> E.g. I was under the impression that in a ZK HA configuration the Job 
> Managers were essentially independent and only coordinated via ZK.  But 
> starting multiple JMs in HA with the JM RPC port blocked between JMs shows 
> that the second JM's Akka subsystem is trying to connect to the leading JM:
> {code}
> INFO  akka.remote.transport.ProtocolStateActor                      - No response from remote for outbound association. Associate timed out after [20000 ms].
> WARN  akka.remote.ReliableDeliverySupervisor                        - Association with remote system [akka.tcp://flink@10.210.210.127:6123] has failed, address is now gated for [5000] ms. Reason: [Association failed with [akka.tcp://flink@10.210.210.127:6123]] Caused by: [No response from remote for outbound association. Associate timed out after [20000 ms].]
> WARN  akka.remote.transport.netty.NettyTransport                    - Remote connection to [null] failed with org.apache.flink.shaded.akka.org.jboss.netty.channel.ConnectTimeoutException: connection timed out: /10.210.210.127:6123
> {code}
> {code}
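
Until such documentation exists, one way to confirm whether an access control rule is what blocks the JM-to-JM association is to probe the RPC port from the standby JM's host. The following is only a minimal sketch, not part of Flink: the helper name can_connect is hypothetical, and the address and port (10.210.210.127:6123) are simply the ones that appear in the log above. It checks plain TCP reachability and says nothing about the other ports a Flink cluster may need.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration; it only tests TCP reachability,
    not whether an Akka endpoint is actually answering on the port.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable hosts, and timeouts
        # (socket.timeout is a subclass of OSError).
        return False


if __name__ == "__main__":
    # Address and port taken from the log in the issue; adjust for your cluster.
    leader_host, rpc_port = "10.210.210.127", 6123
    status = "reachable" if can_connect(leader_host, rpc_port) else "blocked or down"
    print(f"{leader_host}:{rpc_port} is {status}")
```

Run from the second JM's host, a "blocked or down" result while the leading JM is up would be consistent with the connect timeout shown in the log.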



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)