[
https://issues.apache.org/jira/browse/FLINK-5685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16017368#comment-16017368
]
Stephan Ewen commented on FLINK-5685:
-------------------------------------
The restarting of actor systems was necessary to prevent having blocked off
TaskManagers.
Does an additional TCP connection come when an actor system is restarted, or
also in other cases? To get to 210936 connections, you would need a lot of
restarts...
I am wondering why akka would not close these connections.
> Connection leak in Taskmanager
> ------------------------------
>
> Key: FLINK-5685
> URL: https://issues.apache.org/jira/browse/FLINK-5685
> Project: Flink
> Issue Type: Bug
> Components: Distributed Coordination
> Reporter: Andrey
> Priority: Critical
>
> Steps to reproduce:
> * setup cluster with the following configuration: 1 job manager, 2 task
> managers
> * job manager starts rejecting connection attempts from task manager.
> {code}
> 2017-01-30 03:24:42,908 INFO
> org.apache.flink.runtime.taskmanager.TaskManager - Trying to
> register at JobManager akka.tcp://flink@ip:6123/user/jobmanager (attempt
> 4326, timeout: 30 seconds)
> 2017-01-30 03:24:42,913 WARN Remoting
> - Tried to associate with unreachable remote address
> [akka.tcp://flink@ip:6123]. Address is now gated for 5000 ms, all messages to
> this
> address will be delivered to dead letters. Reason: The remote system has
> quarantined this system. No further associations to the remote system are
> possible until this system is restarted.
> {code}
> * task manager tries multiple times. (looks like it doens't close connection
> after failure)
> * job manager unable to process any messages. In logs:
> {code}
> 2017-01-30 03:25:12,932 WARN
> org.jboss.netty.channel.socket.nio.AbstractNioSelector - Failed to
> accept a connection.
> java.io.IOException: Too many open files
> at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
> at
> sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
> at
> sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
> at
> org.jboss.netty.channel.socket.nio.NioServerBoss.process(NioServerBoss.java:100)
> at
> org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
> at
> org.jboss.netty.channel.socket.nio.NioServerBoss.run(NioServerBoss.java:42)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)