Thanks Endre for your response. It's not the response I was hoping for (at least initially), and I had to think about it for quite a while, but I've come to understand and accept it. It makes complete sense, and I appreciate your explanation.
So what we need in our application is a way to watch reachability in the same way as Akka's built-in death watch, using heartbeating as you suggest. The events would be Unreachable, Reachable, and Terminated. I've thought about it for a while and I believe I can implement something that suits our needs without too much trouble (in "user space", on top of Akka), although it won't be quite as convenient as Akka's death watch. That makes me wonder: since this seems to be a generally useful feature that many Akka users could benefit from, has the Akka team ever considered building support for it into Akka?

My plan is to implement this in a lightweight way, so that only one heartbeat exists for each pair of remote actor systems, similar to how I understand Akka's death watch works. For this, I'll need to check whether the ActorRef is an instance of RemoteActorRef and cast it, so that I can discover which actor system we're dealing with and access the "singleton" heartbeat for that system. My assumption is that anything other than a RemoteActorRef should be treated as local. Is this a good assumption?

I understand that RemoteActorRef is an internal API, and I understand the implications of this (unsupported, unstable API). As far as accessibility goes, it's "private[akka]", but javap shows me that it's still "public" from Java.

Regards,
Jim

On Tuesday, February 25, 2014 2:02:09 AM UTC-10, Akka Team wrote:
>
> Hi Jim,
>
> On Tue, Feb 25, 2014 at 12:02 AM, Jim Newsham <[email protected]> wrote:
>
>> Hi everyone,
>>
>> I just became aware of how Akka can establish a quarantine between remote
>> actor systems, requiring a restart of one of the actor systems. I know
>> this has already been discussed in several threads in this forum, as I've
>> been searching and reading anything relevant to get an understanding of the
>> issues.
>>
>> I apologize for beating a dead horse, but something feels very wrong to
>> me about this approach.
>> It seems very heavy-handed to have a commonly
>> occurring condition where an entire actor system must be restarted.
>>
>
> To answer in short, the fallacy in the above sentence is to state that
> quarantining happens in commonly occurring conditions. It happens only on
> two kinds of occasions:
> - Irrecoverable conditions due to system messages being undeliverable for
> a very long time (and I mean really long), or system message sequence
> numbers are in an inconsistent state on the two systems.
> - An irreversible decision has been made by declaring the other system
> dead (hence the name DeathWatch and not TemporalUnreachabilityWatch and
> cluster node Down instead of cluster node Unreachable)
>
>> And it seems contrary to Akka's otherwise fine-grained error handling
>> and recovery support which allows for individual actors to fail and be
>> restarted.
>>
>
> Akka recovers lost connections on the remoting level, and recovers cluster
> node unreachability on the cluster level (main feature of 2.3).
> Quarantining triggers on stronger conditions than those above.
>
> Also, actors that have sent a Terminated are definitely stopped. Restart
> does not send Terminated.
>
>>
>> Is quarantining the only reasonable approach? I understand that after a
>> remote actor system is declared unreachable
>>
>
> Not unreachable, this is the point where the argument is invalid. The
> remote system is considered dead. That is an irreversible decision. If you
> want to track temporary unreachability you have to either implement
> heartbeating (if you use pure remoting) or just use the cluster
> unreachability tracking facility.
>
> I hope this clarifies the decision.
> Also, you might want to take a look at
> this:
> http://doc.akka.io/docs/akka/2.3.0-RC4/scala/remoting.html#Lifecycle_and_Failure_Recovery_Model
>
> -Endre
>
>> and Terminated messages are dispatched for remotely watched actors, that
>> we don't want to receive any more messages from the actors which have been
>> declared Terminated.
>>
>> As an alternative idea, what if you included an identifier for the
>> current remote association as part of a remote actor's identifier? In this
>> case, when a new remote association is established, all actors on the
>> remote side are considered distinct from any actors the local side may have
>> communicated with in the past. This guarantees we don't violate the "no
>> messages after Terminated" rule, and allows the distributed system to
>> continue operating without a restart. Individual actors can look up and
>> re-establish communication with their remote peers, achieving fine-grained
>> fault tolerance.
>>
>> Best regards,
>> Jim
>>
>> --
>> >>>>>>>>>> Read the docs: http://akka.io/docs/
>> >>>>>>>>>> Check the FAQ: http://akka.io/faq/
>> >>>>>>>>>> Search the archives: https://groups.google.com/group/akka-user
>> ---
>> You received this message because you are subscribed to the Google Groups
>> "Akka User List" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at http://groups.google.com/group/akka-user.
>> For more options, visit https://groups.google.com/groups/opt_out.
>>
>
> --
> Akka Team
> Typesafe - The software stack for applications that scale
> Blog: letitcrash.com
> Twitter: @akkateam
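
For what it's worth, here is a rough sketch of the bookkeeping I have in mind for the heartbeat-based reachability tracking described at the top of this message. It is plain Java with no Akka dependency. The class name ReachabilityTracker, the use of a string address as the key for each remote actor system, and the explicit timestamps are all placeholders I made up for illustration. Sending the heartbeats themselves, mapping an ActorRef to its system, and the Terminated event are left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: tracks reachability per remote actor system, so that
// all watched actors on the same system share a single heartbeat state.
final class ReachabilityTracker {
    enum Event { REACHABLE, UNREACHABLE }

    // Mutable per-system state.
    private static final class State {
        long lastHeartbeatMillis;
        boolean reachable = true;
        State(long nowMillis) { this.lastHeartbeatMillis = nowMillis; }
    }

    private final long timeoutMillis;
    private final Map<String, State> systems = new HashMap<>();

    ReachabilityTracker(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Start tracking a remote system. Repeated calls for the same address
    // reuse the existing state (the "singleton" heartbeat for that system).
    void watch(String systemAddress, long nowMillis) {
        systems.computeIfAbsent(systemAddress, a -> new State(nowMillis));
    }

    // Record a heartbeat from a tracked system. Returns REACHABLE if the
    // system had been marked unreachable, otherwise null (no transition).
    Event onHeartbeat(String systemAddress, long nowMillis) {
        State s = systems.get(systemAddress);
        if (s == null) return null; // not watched, ignore
        s.lastHeartbeatMillis = nowMillis;
        if (!s.reachable) {
            s.reachable = true;
            return Event.REACHABLE;
        }
        return null;
    }

    // Meant to be called periodically. Returns the addresses that have just
    // exceeded the timeout, so UNREACHABLE is reported once per outage.
    List<String> checkTimeouts(long nowMillis) {
        List<String> newlyUnreachable = new ArrayList<>();
        for (Map.Entry<String, State> e : systems.entrySet()) {
            State s = e.getValue();
            if (s.reachable && nowMillis - s.lastHeartbeatMillis > timeoutMillis) {
                s.reachable = false;
                newlyUnreachable.add(e.getKey());
            }
        }
        return newlyUnreachable;
    }

    boolean isReachable(String systemAddress) {
        State s = systems.get(systemAddress);
        return s != null && s.reachable;
    }
}
```

The idea is that a single actor per local system would own one of these, call checkTimeouts on a timer, and fan out Unreachable and Reachable notifications to whichever local actors registered interest in that remote system. Passing the clock in as a parameter is only there to make the logic easy to test.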
