Re: [akka-user] is quarantining really necessary

Roland Kuhn Fri, 28 Feb 2014 02:28:22 -0800

On 28 Feb 2014, at 02:40, Jim Newsham <[email protected]> wrote:

> 
> Hi Endre, thanks for your reply; I appreciate it.  After thinking about it 
> carefully, it makes complete sense.
> 
> I had a longer response and follow-up question, but it was summarily deleted 
> (by a moderator?).  I'm assuming that was in error...


It might not always seem like it, but akka-user is manually moderated (to 
counter the spam-bots; AFAIK we have never rejected a mail from a real human 
being). Sometimes it takes a few minutes for a post to be approved, and if you 
are really unlucky we might all be asleep at the same time.

Regards,

Roland

> 
> Thanks again,
> Jim
> 
> 
> On Tuesday, February 25, 2014 2:02:09 AM UTC-10, Akka Team wrote:
> Hi Jim,
> 
> 
> On Tue, Feb 25, 2014 at 12:02 AM, Jim Newsham <[email protected]> wrote:
> 
> Hi everyone,
> 
> I just became aware of how Akka can establish a quarantine between remote 
> actor systems, requiring a restart of one of the actor systems.  I know this 
> has already been discussed in several threads in this forum, as I've been 
> searching and reading anything relevant to get an understanding of the 
> issues.  
> 
> I apologize for beating a dead horse, but something feels very wrong to me 
> about this approach.  It seems very heavy-handed to have a commonly occurring 
> condition where an entire actor system must be restarted.
> 
> To answer in short, the fallacy in the above sentence is the claim that 
> quarantining happens under commonly occurring conditions. It happens on only 
> two kinds of occasions:
>  - Irrecoverable conditions: system messages have been undeliverable for a 
> very long time (and I mean really long), or system message sequence numbers 
> are in an inconsistent state on the two systems.
>  - An irreversible decision has been made by declaring the other system dead 
> (hence the name DeathWatch and not TemporalUnreachabilityWatch, and cluster 
> node Down instead of cluster node Unreachable).
>  
>  And it seems contrary to Akka's otherwise fine-grained error handling and 
> recovery support which allows for individual actors to fail and be restarted. 
>  
> 
> Akka recovers lost connections on the remoting level, and recovers cluster 
> node unreachability on the cluster level (main feature of 2.3). Quarantining 
> triggers on stronger conditions than those above.
> 
> Also, actors that have sent a Terminated are definitely stopped. Restart does 
> not send Terminated.
>  
> 
> Is quarantining the only reasonable approach?  I understand that after a 
> remote actor system is declared unreachable
> 
> Not unreachable; this is the point where the argument is invalid. The remote 
> system is considered dead. That is an irreversible decision. If you want to 
> track temporary unreachability you have to either implement heartbeating (if 
> you use pure remoting) or just use the cluster unreachability tracking 
> facility.
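
A minimal sketch of the heartbeating option mentioned above for pure remoting, assuming the application sends its own periodic heartbeat messages between the two systems. `HeartbeatTracker`, its methods, and the fixed timeout are hypothetical names chosen for illustration; none of this is Akka API. The point it shows is that this kind of verdict is reversible, unlike Terminated:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Akka: tracks *temporary* unreachability
// of remote peers from application-level heartbeats.
final class HeartbeatTracker {
    private final long timeoutMillis;
    // Timestamp (in milliseconds) of the last heartbeat seen from each peer.
    private final Map<String, Long> lastSeen = new HashMap<>();

    HeartbeatTracker(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Call whenever a heartbeat message from `peer` arrives.
    void heartbeat(String peer, long nowMillis) {
        lastSeen.put(peer, nowMillis);
    }

    // A peer counts as reachable if it has been heard from within the timeout.
    // The verdict is reversible: a later heartbeat makes the peer reachable again.
    boolean isReachable(String peer, long nowMillis) {
        Long last = lastSeen.get(peer);
        return last != null && nowMillis - last <= timeoutMillis;
    }
}
```

In an actor-based setup, `heartbeat` would be called from the actor that receives the heartbeat messages, and `isReachable` from a periodic check; passing the clock in as a parameter keeps the sketch easy to test.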
> 
> I hope this clarifies the decision. Also, you might want to take a look at 
> this: 
> http://doc.akka.io/docs/akka/2.3.0-RC4/scala/remoting.html#Lifecycle_and_Failure_Recovery_Model
> 
> -Endre
>  
> and Terminated messages are dispatched for remotely watched actors, that we 
> don't want to receive any more messages from the actors which have been 
> declared Terminated.  
> 
> As an alternative idea, what if you included an identifier for the current 
> remote association as part of a remote actor's identifier?  In this case, 
> when a new remote association is established, all actors on the remote side 
> are considered distinct from any actors the local side may have communicated 
> with in the past.  This guarantees we don't violate the "no messages after 
> Terminated" rule, and allows the distributed system to continue operating 
> without a restart.  Individual actors can look up and re-establish 
> communication with their remote peers, achieving fine-grained fault tolerance.
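
The proposal above can be sketched roughly as follows. This only illustrates the idea as described in the message and is not how Akka identifies remote actors; `RemoteRef`, `AssociationGate`, and the counter-based association id are hypothetical, and the address string in the usage note is just an example value:

```java
import java.util.Objects;

// Hypothetical remote actor identifier that embeds the association it was created under.
final class RemoteRef {
    final String systemAddress;
    final long associationId; // differs for every newly established association
    final String path;

    RemoteRef(String systemAddress, long associationId, String path) {
        this.systemAddress = systemAddress;
        this.associationId = associationId;
        this.path = path;
    }

    // Two refs are equal only if they also belong to the same association.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof RemoteRef)) return false;
        RemoteRef other = (RemoteRef) o;
        return associationId == other.associationId
                && systemAddress.equals(other.systemAddress)
                && path.equals(other.path);
    }

    @Override
    public int hashCode() {
        return Objects.hash(systemAddress, associationId, path);
    }
}

// Hypothetical local-side bookkeeping for one remote system.
final class AssociationGate {
    private long currentAssociationId;

    AssociationGate(long initialAssociationId) {
        this.currentAssociationId = initialAssociationId;
    }

    // Called when a new association is established; all earlier refs become stale.
    void reassociate() {
        currentAssociationId++;
    }

    // Creates a ref bound to the current association.
    RemoteRef refFor(String systemAddress, String path) {
        return new RemoteRef(systemAddress, currentAssociationId, path);
    }

    // Rejects senders from an older association, so nothing is accepted from
    // an actor that has already been reported as Terminated.
    boolean accepts(RemoteRef sender) {
        return sender.associationId == currentAssociationId;
    }
}
```

For example, a ref obtained via `gate.refFor("akka.tcp://sys@host:2552", "/user/worker")` is accepted until `gate.reassociate()` is called; a ref for the same address and path created afterwards is a distinct identity and has to be looked up again by the local actor.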
> 
> Best regards,
> Jim
> 
> 
> -- 
> >>>>>>>>>> Read the docs: http://akka.io/docs/
> >>>>>>>>>> Check the FAQ: http://akka.io/faq/
> >>>>>>>>>> Search the archives: https://groups.google.com/group/akka-user
> --- 
> You received this message because you are subscribed to the Google Groups 
> "Akka User List" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/akka-user.
> For more options, visit https://groups.google.com/groups/opt_out.
> 
> 
> 
> -- 
> Akka Team
> Typesafe - The software stack for applications that scale
> Blog: letitcrash.com
> Twitter: @akkateam
> 



Dr. Roland Kuhn
Akka Tech Lead
Typesafe – Reactive apps on the JVM.
twitter: @rolandkuhn


