Hi - I'm being bitten by a problem in a client/server application where multiple clients subscribe for updates from a server: after a long GC pause on the server, the clients end up quarantined. Here are some client logs from an attempt to rediscover the server actor using "server ? Identify", which times out. I can see that this is because the server has quarantined the client.

24-Oct-2014 11:30:28:310: [(akka)Remoting - WARNING] [31]: Tried to associate with unreachable remote address [akka.tcp://[email protected]:35411]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: The remote system has quarantined this system. No further associations to the remote system are possible until this system is restarted.

From the server's GC log I can guess when the original disconnect happened (it's not really a guess, the times line up perfectly):

1.611: [Full GC (Metadata GC Threshold) 87M->14M(52M), 0.1731620 secs]
7.622: [Full GC (Metadata GC Threshold) 175M->85M(220M), 0.4527592 secs]
8658.035: [Full GC (Metadata GC Threshold) 3638M->419M(4192M), 1.8916884 secs]
244257.679: [Full GC (Allocation Failure) 6516M->2622M(14G), 9.2884735 secs]
*391857.856: [Full GC (Allocation Failure) 7390M->3758M(12G), 13.7533193 secs]*

The server's akka logs state that the following happened at this time (the server quarantines my client):

[WARN] [10/24/2014 10:57:22.416] [gekkoRemoting-akka.remote.default-remote-dispatcher-7] [akka.tcp://[email protected]:35411/system/remote-watcher] Detected unreachable: [akka.tcp://[email protected]:60091]
[WARN] [10/24/2014 10:57:22.416] [gekkoRemoting-akka.remote.default-remote-dispatcher-11015] [Remoting] Association to [akka.tcp://[email protected]:60091] having UID [1196983173] is irrecoverably failed. UID is now quarantined and all messages to this UID will be delivered to dead letters. Remote actorsystem must be restarted to recover from this situation.

My question is "how on earth do I code against this?" Currently I have a "canary" which creates the client connection. The client connection picks up the Terminated message from the server and stops itself; the canary picks up this termination and, after a pause of a few minutes, spools up a new client to attempt to connect to the server. Except the new client cannot connect to the server, because it has been quarantined.
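
For readers following along, the canary arrangement described above might look roughly like the sketch below. It is not the poster's code: the names `Canary`, `Client`, `serverPath`, `retryDelay` and the 10-second lookup timeout are all hypothetical, and it assumes the classic Akka actor API. It also shows the limitation being asked about: a quarantined client and an absent server both end in the same lookup timeout.

```scala
import akka.actor._
import scala.concurrent.duration._

// All names here (Canary, Client, serverPath, retryDelay) are hypothetical.
object Canary {
  case object StartClient
  def props(serverPath: String, retryDelay: FiniteDuration): Props =
    Props(new Canary(serverPath, retryDelay))
}

class Canary(serverPath: String, retryDelay: FiniteDuration)
    extends Actor with ActorLogging {
  import Canary._
  import context.dispatcher // ExecutionContext needed by the scheduler

  override def preStart(): Unit = self ! StartClient

  def receive = {
    case StartClient =>
      // Watch the child so that its Terminated message arrives here.
      context.watch(context.actorOf(Props(new Client(serverPath))))
    case Terminated(_) =>
      log.warning("Client stopped; starting a new one in {}", retryDelay)
      context.system.scheduler.scheduleOnce(retryDelay, self, StartClient)
  }
}

class Client(serverPath: String) extends Actor {
  override def preStart(): Unit = {
    // Look the server up by path; the reply arrives as an ActorIdentity.
    context.actorSelection(serverPath) ! Identify(serverPath)
    // Assumed timeout: with no reply (for example because this system
    // has been quarantined) the actor receives ReceiveTimeout.
    context.setReceiveTimeout(10.seconds)
  }

  def receive = {
    case ActorIdentity(_, Some(server)) =>
      context.setReceiveTimeout(Duration.Undefined) // lookup succeeded
      context.watch(server)
      // Subscribing for updates would go here.
    case ActorIdentity(_, None) | ReceiveTimeout =>
      // Indistinguishable cases: server absent, or this system quarantined.
      context.stop(self)
    case Terminated(_) =>
      // The server went away (or was declared unreachable); let the canary retry.
      context.stop(self)
  }
}
```

With this shape, every retry from a quarantined actor system would simply time out again, which matches the behaviour in the logs above.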

How is my client supposed to know that it has been quarantined? There's no "you've been quarantined" callback, just a timeout when looking up the server. Do I need to just assume that a failure to look up a server might indicate a quarantine?

Here's the server's remote configuration:

remote {
  log-remote-lifecycle-events = on
  retry-gate-closed-for = 5 s
  enabled-transports = ["akka.remote.netty.tcp"]
  netty.tcp {
    maximum-frame-size = 100 MiB
  }
  watch-failure-detector {
    acceptable-heartbeat-pause = 20 s
    heartbeat-interval = 5 s
  }
  transport-failure-detector {
    acceptable-heartbeat-pause = 10 s
    heartbeat-interval = 3 s
  }
}

Here's the client's remote configuration:

remote {
  log-remote-lifecycle-events = on
  gate-invalid-addresses-for = 5 s
  enabled-transports = ["akka.remote.netty.tcp"]
  netty.tcp {
    port = 0
    maximum-frame-size = 100 MiB
  }
  watch-failure-detector {
    acceptable-heartbeat-pause = 20 s
    heartbeat-interval = 5 s
  }
  transport-failure-detector {
    acceptable-heartbeat-pause = 12 s
    heartbeat-interval = 3 s
  }
}

Chris
