[
https://issues.apache.org/jira/browse/FLINK-5685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16017385#comment-16017385
]
Gustavo Anatoly commented on FLINK-5685:
----------------------------------------
bq. I am wondering why akka would not close these connections.
Maybe doesn't close these connections, because too many open files cause an
overload unable to close any file including TCP connections. For this reason I
suggested to configure {{fs.file-max}}
> Connection leak in Taskmanager
> ------------------------------
>
> Key: FLINK-5685
> URL: https://issues.apache.org/jira/browse/FLINK-5685
> Project: Flink
> Issue Type: Bug
> Components: Distributed Coordination
> Reporter: Andrey
> Priority: Critical
>
> Steps to reproduce:
> * setup cluster with the following configuration: 1 job manager, 2 task
> managers
> * job manager starts rejecting connection attempts from task manager.
> {code}
> 2017-01-30 03:24:42,908 INFO
> org.apache.flink.runtime.taskmanager.TaskManager - Trying to
> register at JobManager akka.tcp://flink@ip:6123/user/jobmanager (attempt
> 4326, timeout: 30 seconds)
> 2017-01-30 03:24:42,913 WARN Remoting
> - Tried to associate with unreachable remote address
> [akka.tcp://flink@ip:6123]. Address is now gated for 5000 ms, all messages to
> this
> address will be delivered to dead letters. Reason: The remote system has
> quarantined this system. No further associations to the remote system are
> possible until this system is restarted.
> {code}
> * task manager tries multiple times. (looks like it doens't close connection
> after failure)
> * job manager unable to process any messages. In logs:
> {code}
> 2017-01-30 03:25:12,932 WARN
> org.jboss.netty.channel.socket.nio.AbstractNioSelector - Failed to
> accept a connection.
> java.io.IOException: Too many open files
> at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
> at
> sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
> at
> sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
> at
> org.jboss.netty.channel.socket.nio.NioServerBoss.process(NioServerBoss.java:100)
> at
> org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
> at
> org.jboss.netty.channel.socket.nio.NioServerBoss.run(NioServerBoss.java:42)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)