Hello,

I'm using Akka 2.3.4 running on Java 1.7.0_72 on Ubuntu 14.04.

I can reproducibly get into a situation where a remote system is 
quarantined and every unsuccessful attempt to connect to that remote 
system allocates a new socket that is never released. Eventually this 
exhausts the operating system's limit on the number of open sockets (and 
files: "Too many open files in the system").

Our setup is the following:
There is one "hub" actor system (commLayer below) which listens on the 
default Akka port, 2552.
There are three client actor systems which connect to port 2552. They are 
configured to pick a free port automatically (akka.remote.netty.tcp.port 
= 0).

To reproduce the situation I do the following:
1. Start the system normally. All actor systems are started and operating 
the way they should.
2. Suspend all threads of the system (I do this with a breakpoint 
configured to suspend all threads).
3. Wait for some period of time (e.g. 30 sec).
4. Remove the breakpoint and resume all threads.

Immediately after this I get similar logs for all three client systems.

2014-11-17 13:45:30,765 [commClient-akka.actor.default-dispatcher-15] WARN  
Remoting : Tried to associate with unreachable remote address 
[akka.tcp://commLayer@localhost:2552]. Address is now gated for 15000 ms, 
all messages to this address will be delivered to dead letters. Reason: The 
remote system has a UID that has been quarantined. Association aborted.
...
2014-11-17 13:45:50,769 [commClient-akka.actor.default-dispatcher-20] WARN  
Remoting : Tried to associate with unreachable remote address 
[akka.tcp://commLayer@localhost:2552]. Address is now gated for 15000 ms, 
all messages to this address will be delivered to dead letters. Reason: The 
remote system has quarantined this system. No further associations to the 
remote system are possible until this system is restarted.


So the hub system has quarantined the clients and the clients have 
quarantined the hub.
In the current implementation of our system, when the hub becomes 
unreachable the clients try to reconnect to it at a fixed rate (every 10 
seconds). This usually succeeds, but of course it is pointless when the 
systems are quarantined, and we have yet to handle this scenario. What is 
strange, however, is that each time the reconnection fails with the 
messages above, a new socket is opened, and netstat reports the ESTABLISHED 
state for each of them.
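
Since retrying is pointless once either side has been quarantined, the reconnect loop could stop as soon as it sees such a failure. Below is a minimal sketch of that decision only, assuming the failure reason is available as text like the "Reason:" part of the log lines above. The names ReconnectPolicy, Action and decide are hypothetical and not part of Akka, and matching on log text is fragile; Akka's remoting lifecycle events may well be a more robust signal.

```java
// Hypothetical helper, not an Akka API: decides whether a failed
// reconnect attempt is worth repeating.
class ReconnectPolicy {
    enum Action { RETRY, RESTART_REQUIRED }

    // Assumption: failureReason carries the "Reason: ..." text from the
    // remoting warning. Both quarantine messages quoted in the logs above
    // contain the word "quarantined".
    static Action decide(String failureReason) {
        if (failureReason != null && failureReason.contains("quarantined")) {
            // Per the log message, no further association is possible
            // until the quarantined system is restarted.
            return Action.RESTART_REQUIRED;
        }
        return Action.RETRY;
    }

    public static void main(String[] args) {
        System.out.println(decide(
            "The remote system has a UID that has been quarantined. Association aborted."));
    }
}
```

With such a check the 10-second retry would stop after the first quarantine failure instead of opening a new connection on every attempt. That would only work around the symptom; it does not explain why the sockets stay open.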

Periodic checks with netstat show that every 10-15 seconds 3 new sockets are 
opened to port 2552. They are never closed, so the OS limit is eventually 
reached. It looks like a resource leak to me.
Is this kind of behavior expected for quarantined systems?
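
The periodic netstat check can be automated to make the growth easy to track. The following is a rough sketch that counts ESTABLISHED connections whose remote port is 2552 in already-captured netstat output. It assumes the Linux `netstat -tn` column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State); the class and method names are made up for illustration, and the sample lines in main are invented, not taken from our system.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for tracking the suspected socket leak.
class EstablishedCounter {
    // Counts TCP lines in ESTABLISHED state whose foreign (remote)
    // address ends with ":<remotePort>".
    static long countEstablishedTo(List<String> netstatLines, int remotePort) {
        String suffix = ":" + remotePort;
        long count = 0;
        for (String line : netstatLines) {
            String[] cols = line.trim().split("\\s+");
            // Assumed layout: Proto Recv-Q Send-Q Local Foreign State
            if (cols.length >= 6
                    && cols[0].startsWith("tcp")
                    && cols[4].endsWith(suffix)
                    && "ESTABLISHED".equals(cols[5])) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Invented sample lines, for illustration only.
        List<String> sample = Arrays.asList(
            "tcp   0   0 127.0.0.1:40112   127.0.0.1:2552    ESTABLISHED",
            "tcp   0   0 127.0.0.1:2552    127.0.0.1:40112   ESTABLISHED",
            "tcp   0   0 127.0.0.1:40113   127.0.0.1:2552    TIME_WAIT");
        System.out.println(countEstablishedTo(sample, 2552));
    }
}
```

Only the client side of each connection is counted, because the hub-side line has 2552 as its local port rather than its foreign port. Running the count every few seconds and logging the result should show the same steady increase of 3 per retry cycle if the leak is present.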

Additional information: we initially detected the issue on a dev Mac 
machine. If our system was left running and the Mac went to sleep, the 
described behavior was observed after it woke up.
