On Tuesday, January 21, 2014 8:44:17 AM UTC, Akka Team wrote:
>
> Hi Alistair,
>
>
> On Tue, Jan 21, 2014 at 8:38 AM, Alistair George <[email protected]> wrote:
>
>> Hi Akka,
>>
>> Thanks for the reply. One question: if (in 2.3) a remote actor system 
>> becomes permanently quarantined, what do I have to do to re-establish 
>> communication once the comms problem is fixed? 
>>
>
> First of all, quarantining is a state in which the failure is no longer 
> treated as a mere communications problem: the remote system is declared 
> dead (declared, not proven, since all we know is that it does not reply). 
> Short communication failures do not trigger quarantine (what counts as 
> short is configurable).
>  
>
>> Do I have to restart the remote actor system? Or the local one? Or both?
>>
>
> From the remoting viewpoint it does not matter which one you restart. 
> Obviously if one of the systems genuinely crashed then that is the one to 
> be restarted, otherwise it is application specific.
>
I'm not sure this is desirable behaviour. I shouldn't have to restart a 
process just to recover from a comms failure. After all, nothing in the 
process has failed, and it may be providing services to other clients that 
have not suffered any comms failure. They shouldn't have to take the impact 
of a restart.

One of the strengths of Akka is that it doesn't pretend to do things that 
can't be done in a distributed context - this is essential for transparent 
distribution. One of the things you can't do in a distributed system is give 
reliable, timely notification of a remote event, such as actor termination, 
and I don't think Akka should try.

What I'd prefer is this:

   - Reconnect attempts should continue indefinitely. 
   - The DeathWatch protocol should be extended to include (possibly 
   multiple) Reachable/Unreachable events. 
   - Terminated should only be delivered when the remote actor system is 
   reachable and asserts that the watched actor does not exist. This might 
   never happen: an actor might stay in an unreachable state forever.
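
To make the proposal above concrete, here is a minimal sketch in plain Java with no Akka dependency. The names `WatchSketch`, `Watch`, `Event`, `connectionLost`, `connectionRestored` and `remoteReports` are hypothetical and are not part of any Akka API; the sketch only illustrates the intended ordering of events, under the assumption that some lower layer reports connection loss, connection recovery, and the remote system's answer about the watched actor.

```java
import java.util.ArrayList;
import java.util.List;

public class WatchSketch {
    // Hypothetical event types a watcher would receive under the proposal.
    public enum Event { REACHABLE, UNREACHABLE, TERMINATED }

    /** Tracks one watched remote actor and records the events delivered to the watcher. */
    public static final class Watch {
        private boolean reachable = true;
        private boolean terminated = false;
        private final List<Event> delivered = new ArrayList<>();

        /** The link to the remote system was lost; may happen any number of times. */
        public void connectionLost() {
            if (!terminated && reachable) {
                reachable = false;
                delivered.add(Event.UNREACHABLE);
            }
        }

        /** The link came back; reconnect attempts never give up in this model. */
        public void connectionRestored() {
            if (!terminated && !reachable) {
                reachable = true;
                delivered.add(Event.REACHABLE);
            }
        }

        /** The remote system replied and stated whether the watched actor still exists. */
        public void remoteReports(boolean actorExists) {
            connectionRestored(); // a reply implies the remote system is reachable
            if (!terminated && !actorExists) {
                terminated = true;
                delivered.add(Event.TERMINATED); // only ever sent on a positive assertion
            }
        }

        public List<Event> delivered() {
            return delivered;
        }
    }

    public static void main(String[] args) {
        Watch w = new Watch();
        w.connectionLost();
        w.connectionRestored();
        w.connectionLost();
        w.remoteReports(false);
        System.out.println(w.delivered());
    }
}
```

Note that in this sketch a watch that goes unreachable and never hears from the remote system again simply stays unreachable; Terminated is never inferred from silence.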

I realise I can emulate this by setting the timeout before quarantine to be 
effectively infinite, and adding my own facility to detect reachability and 
termination, but this isn't trivial. I'd prefer this behaviour to be 
available out of the box, for both practical and conceptual reasons.
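
For the reachability half of that emulation, a user-layer heartbeat tracker could look roughly like the following. This is only a sketch in plain Java: `HeartbeatTracker`, `heartbeatReceived`, `isReachable` and the `acceptablePauseMillis` parameter are hypothetical names, and it uses a simple fixed timeout rather than the failure detectors Akka itself ships. Sending and receiving the heartbeat messages is assumed to happen elsewhere.

```java
import java.util.HashMap;
import java.util.Map;

public class HeartbeatTracker {
    private final long acceptablePauseMillis;
    // Last time (in milliseconds) a heartbeat was seen from each peer, keyed by a peer name.
    private final Map<String, Long> lastSeen = new HashMap<>();

    public HeartbeatTracker(long acceptablePauseMillis) {
        this.acceptablePauseMillis = acceptablePauseMillis;
    }

    /** Record that a heartbeat from the given peer arrived at the given time. */
    public void heartbeatReceived(String peer, long nowMillis) {
        lastSeen.put(peer, nowMillis);
    }

    /** A peer is reachable if it has been heard from within the acceptable pause. */
    public boolean isReachable(String peer, long nowMillis) {
        Long seen = lastSeen.get(peer);
        return seen != null && nowMillis - seen <= acceptablePauseMillis;
    }

    public static void main(String[] args) {
        HeartbeatTracker tracker = new HeartbeatTracker(10_000);
        tracker.heartbeatReceived("remoteSystem", 0);
        System.out.println(tracker.isReachable("remoteSystem", 5_000));
        System.out.println(tracker.isReachable("remoteSystem", 20_000));
        tracker.heartbeatReceived("remoteSystem", 21_000); // comms restored
        System.out.println(tracker.isReachable("remoteSystem", 22_000));
    }
}
```

Because the tracker never discards a peer, a peer that has been silent for a long time becomes reachable again as soon as one heartbeat gets through, which is the "can heal" behaviour I'm after. Passing the clock in as a parameter keeps the logic easy to test.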

Just my $.02

Cheers

Alistair 

> -Endre
>  
>
>>
>> Cheers
>>
>>  Alistair
>>
>>
>> On Monday, January 20, 2014 12:41:25 PM UTC, Akka Team wrote:
>>
>>> Hi Alistair,
>>>
>>>
>>> On Thu, Jan 16, 2014 at 9:30 AM, Alistair George 
>>> <[email protected]> wrote:
>>>
>>>> If I set up a watch on a remote actor (one on a remote actor system) 
>>>> and the network between me and the remote system fails, I get a Terminated 
>>>> message almost immediately. In fact, the remote actor hasn't terminated, 
>>>>
>>>
>>> That does not matter. If you use remote DeathWatch and one of the 
>>> systems is unreachable for long enough, it will eventually fire Terminated 
>>> for all the watched actors on the remote system and then quarantine that 
>>> system so it never comes back again. The DeathWatch failure detector 
>>> settings (akka.remote.watch-failure-detector) control how sensitive this 
>>> decision is. If you think that one hour of unreachability should not be 
>>> considered terminal, then you should configure those settings 
>>> accordingly.
>>>  
>>>
>>>> and I can still use the ActorRef to send messages to it once comms are 
>>>> restored. (However, if comms fail a second time I don't get a second 
>>>> Terminated message.)
>>>>
>>>
>>> This is because we made two mistakes in 2.2.x:
>>>  - we made the quarantine time configurable
>>>  - we set it to a low value, 60 seconds
>>>
>>> After the quarantine elapses the systems can communicate again, 
>>> regardless of the Terminated message. This is probably what you observed, 
>>> and it is exactly why quarantine in 2.3 is permanent.
>>>  
>>>
>>>>
>>>> "Terminated" and "lost contact" are rather different states, and may 
>>>> need different handling. Does anyone know of a reliable way I can 
>>>> distinguish these? 
>>>>
>>>
>>> DeathWatch sends Terminated when the remote system has been in the "lost 
>>> contact" state for a long time. How long is configurable through 
>>> the DeathWatch failure detector. "Lost contact" events are generated as 
>>> remote lifecycle events, but I don't recommend using those directly. 
>>> Message sends are supposed to be lossy; you can track reachability in your 
>>> user layer with some heartbeating mechanism if you want it.
>>>
>>> Btw, there is another failure detector 
>>> (akka.remote.transport-failure-detector) 
>>> that monitors the health of network connections, but it does not generate 
>>> Terminated events, only reconnect attempts.
>>>
>>> In 2.3 clustering will differentiate between UNREACHABLE events (which 
>>> can heal) and removals. You probably want to use those features instead of 
>>> plain remoting.
>>>  
>>>
>>>>
>>>> Thanks
>>>>
>>>> Alistair
>>>>
>>>> -- 
>>>> >>>>>>>>>> Read the docs: http://akka.io/docs/
>>>> >>>>>>>>>> Check the FAQ: http://akka.io/faq/
>>>> >>>>>>>>>> Search the archives: https://groups.google.com/group/akka-user
>>>> --- 
>>>> You received this message because you are subscribed to the Google 
>>>> Groups "Akka User List" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>> an email to [email protected].
>>>> To post to this group, send email to [email protected].
>>>>
>>>> Visit this group at http://groups.google.com/group/akka-user.
>>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>>
>>>
>>>
>>>
>>> -- 
>>> Akka Team
>>> Typesafe - The software stack for applications that scale
>>> Blog: letitcrash.com
>>> Twitter: @akkateam
>>>  
