Hello,

I'm using Akka 2.3.4 on Java 1.7.0_72 under Ubuntu 14.04.
I can reproducibly get into a situation where a remote system is quarantined and, for each unsuccessful attempt to connect to that remote system, a new socket is allocated but never disposed. Eventually this reaches the operating system limits on the number of open sockets (and open files: "Too many open files in the system").

Our setup is the following:

- There is one "hub" actor system (commLayer below) which listens on the default Akka port 2552.
- There are three client actor systems which connect to 2552. They are configured to automatically find an empty port (akka.remote.netty.tcp.port = 0).

To get into the described situation I do the following:

1. Start the system normally. All actor systems are started and operating the way they should.
2. Suspend all threads of the system (I do it by setting a breakpoint which is configured to suspend all threads).
3. Wait for some period of time (e.g. 30 sec).
4. Remove the breakpoint and resume all threads.

Immediately after this I get similar logs for all three client systems:

2014-11-17 13:45:30,765 [commClient-akka.actor.default-dispatcher-15] WARN Remoting : Tried to associate with unreachable remote address [akka.tcp://commLayer@localhost:2552]. Address is now gated for 15000 ms, all messages to this address will be delivered to dead letters. Reason: The remote system has a UID that has been quarantined. Association aborted.
...
2014-11-17 13:45:50,769 [commClient-akka.actor.default-dispatcher-20] WARN Remoting : Tried to associate with unreachable remote address [akka.tcp://commLayer@localhost:2552]. Address is now gated for 15000 ms, all messages to this address will be delivered to dead letters. Reason: The remote system has quarantined this system. No further associations to the remote system are possible until this system is restarted.

So the hub system has quarantined the clients and the clients have quarantined the hub.
In the current implementation of our system, when the hub becomes unreachable the clients try to reconnect to it at a fixed rate (every 10 seconds). This usually succeeds, but of course it is pointless when the systems are quarantined, and we have yet to handle that scenario.

What is strange, however, is that each time the reconnection fails with the messages mentioned above, a new socket is opened, and netstat reports the ESTABLISHED state for each of them. Periodic checks with netstat show that every 10-15 seconds 3 new sockets are opened to 2552 and they are never closed, so the OS limit is eventually reached.

It looks like a resource leak to me. Is this kind of behavior expected for quarantined systems?

Additional information: we initially detected the issue on a dev Mac machine. If our system was left running and the Mac went to sleep, the described behavior appeared after it woke up.
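For anyone who wants to track the socket growth described above without eyeballing netstat, here is a minimal sketch in plain Java (no Akka dependency). It counts ESTABLISHED TCP connections whose foreign address ends in a given port. The class name EstablishedSocketCounter and the method countEstablished are hypothetical names chosen for illustration, and the parsing assumes the six-column layout printed by `netstat -tn` on Linux (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State); other platforms, including the Mac, format addresses differently.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class EstablishedSocketCounter {

    /**
     * Counts TCP lines whose foreign address ends in ":remotePort"
     * and whose state column is ESTABLISHED.
     * Assumes the Linux `netstat -tn` column layout.
     */
    static int countEstablished(List<String> netstatLines, int remotePort) {
        String suffix = ":" + remotePort;
        int count = 0;
        for (String line : netstatLines) {
            String[] cols = line.trim().split("\\s+");
            // Expected columns: Proto Recv-Q Send-Q Local Foreign State
            if (cols.length >= 6
                    && cols[0].startsWith("tcp")
                    && cols[4].endsWith(suffix)
                    && "ESTABLISHED".equals(cols[5])) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Port defaults to 2552, the hub port mentioned in the post.
        int port = args.length > 0 ? Integer.parseInt(args[0]) : 2552;
        List<String> lines = new ArrayList<String>();
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        String line;
        while ((line = in.readLine()) != null) {
            lines.add(line);
        }
        System.out.println(countEstablished(lines, port));
    }
}
```

It can be fed from a shell loop, for example `netstat -tn | java EstablishedSocketCounter 2552` every few seconds. Only the client side of each connection is counted (foreign port 2552), so on a single host the hub's end of the same connection is not counted twice.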
