Hi Alistair,
On Thu, Jan 16, 2014 at 9:30 AM, Alistair George <[email protected]> wrote:

> If I set up a watch on a remote actor (one on a remote actor system) and
> the network between me and the remote system fails, I get a Terminated
> message almost immediately. In fact, the remote actor hasn't terminated,

That does not matter. If you use remote DeathWatch and one of the systems becomes unreachable for long enough, it will eventually fire Terminated for all the watched actors on the remote system and then quarantine that system so it never comes back. The settings of the DeathWatch failure detector (akka.remote.watch-failure-detector) control how sensitive this decision is. If you think that, for example, one hour of unreachability should not be considered terminal, then you should configure those settings accordingly.

> and I can still use the ActorRef to send messages to it once comms are
> restored. (However, if comms fail a second time I don't get a second
> Terminated message.)

This is because we made two mistakes in 2.2.x:
- we made quarantine times configurable
- we set the quarantine time to a low value, 60 seconds

After the quarantine period elapses, the systems can communicate again regardless of the Terminated message. This is probably what you observed, and it is exactly why quarantine in 2.3 is permanent.

> "Terminated" and "lost contact" are rather different states, and may need
> different handling. Does anyone know of a reliable way I can distinguish
> these?

DeathWatch sends Terminated when the remote system has been in the "lost contact" state for a long time; how long is configurable through the DeathWatch failure detector. "Lost contact" events are published as remote lifecycle events, but I don't recommend using those directly. Message sends are supposed to be lossy (delivery is at-most-once), so if you need to track reachability, do it in your user layer with some heartbeating mechanism.
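To make "configure those settings accordingly" concrete, here is a minimal sketch of an application.conf override that makes DeathWatch tolerate roughly an hour of unreachability before firing Terminated. It assumes the acceptable-heartbeat-pause key under akka.remote.watch-failure-detector; check the reference.conf of the Akka version you actually run for the exact keys and defaults.

```hocon
# Sketch only: verify key names and defaults against your Akka version's reference.conf.
akka.remote.watch-failure-detector {
  # Extra silence the phi accrual detector tolerates before it suspects the
  # remote system. Here: roughly one hour of unreachability before Terminated.
  acceptable-heartbeat-pause = 1 h
}
```

The trade-off is that a long pause also delays Terminated when an unreachable remote system really has died.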
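The user-layer heartbeating suggested above could look roughly like the following minimal plain-Java sketch, which is not tied to any Akka API. The class ReachabilityTracker, its heartbeat/status methods and the Status values are hypothetical names. It uses a simple fixed-deadline check rather than the phi accrual detector behind akka.remote.watch-failure-detector, and it takes the current time as a parameter (for example System.nanoTime() / 1_000_000) so it can be exercised deterministically.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical user-level reachability tracker: records when the last heartbeat
 * arrived from each peer and reports the peer as unreachable once no heartbeat
 * has arrived within a fixed acceptable pause. Unlike DeathWatch's one-shot
 * Terminated, UNREACHABLE heals as soon as a new heartbeat arrives.
 */
final class ReachabilityTracker {
    enum Status { REACHABLE, UNREACHABLE, UNKNOWN }

    private final long acceptablePauseMillis;
    private final Map<String, Long> lastHeartbeatMillis = new HashMap<>();

    ReachabilityTracker(long acceptablePauseMillis) {
        if (acceptablePauseMillis <= 0) {
            throw new IllegalArgumentException("acceptablePauseMillis must be positive");
        }
        this.acceptablePauseMillis = acceptablePauseMillis;
    }

    /** Call whenever a heartbeat (or any other message) arrives from the peer. */
    void heartbeat(String peer, long nowMillis) {
        lastHeartbeatMillis.put(peer, nowMillis);
    }

    /** Classifies the peer by how long ago its last heartbeat arrived. */
    Status status(String peer, long nowMillis) {
        Long last = lastHeartbeatMillis.get(peer);
        if (last == null) {
            return Status.UNKNOWN; // never heard from this peer
        }
        return (nowMillis - last) <= acceptablePauseMillis ? Status.REACHABLE : Status.UNREACHABLE;
    }

    public static void main(String[] args) {
        ReachabilityTracker tracker = new ReachabilityTracker(3_000);
        tracker.heartbeat("remote-system", 0);
        System.out.println(tracker.status("remote-system", 2_000));  // REACHABLE
        System.out.println(tracker.status("remote-system", 10_000)); // UNREACHABLE (network down)
        tracker.heartbeat("remote-system", 11_000);                  // link restored
        System.out.println(tracker.status("remote-system", 12_000)); // REACHABLE again
    }
}
```

In an actor you would call heartbeat() whenever a (hypothetical) application-level heartbeat message arrives from the remote side and check status() on a scheduled tick. Because the state flips back to REACHABLE on the next heartbeat, it models "lost contact" separately from Terminated.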
Btw, there is another failure detector (akka.remote.transport-failure-detector) that monitors the health of network connections, but it does not generate Terminated events; it only triggers reconnect attempts. In 2.3, clustering will distinguish UNREACHABLE events (which can heal) from removals. You probably want to use those features instead of plain remoting.

> Thanks
>
> Alistair
>
> --
> >>>>>>>>>> Read the docs: http://akka.io/docs/
> >>>>>>>>>> Check the FAQ: http://akka.io/faq/
> >>>>>>>>>> Search the archives: https://groups.google.com/group/akka-user
> ---
> You received this message because you are subscribed to the Google Groups
> "Akka User List" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/akka-user.
> For more options, visit https://groups.google.com/groups/opt_out.

--
Akka Team
Typesafe - The software stack for applications that scale
Blog: letitcrash.com
Twitter: @akkateam
