Re: [akka-user] Remote DeathWatch and comms failure

Akka Team Mon, 20 Jan 2014 04:42:02 -0800

Hi Alistair,

On Thu, Jan 16, 2014 at 9:30 AM, Alistair George
<[email protected]> wrote:

> If I set up a watch on a remote actor (one on a remote actor system) and
> the network between me and the remote system fails, I get a Terminated
> message almost immediately. In fact, the remote actor hasn't terminated,
>

That does not matter. If you use remote DeathWatch and one of the systems
is unreachable for long enough, DeathWatch will eventually fire Terminated
for all the watched actors on the remote system and then quarantine that
system so that it never comes back. The DeathWatch failure detector
settings (akka.remote.watch-failure-detector) control how sensitive this
decision is. If you think that even one hour of unreachability should not
be considered terminal, then you should configure those settings
accordingly.
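
As a rough illustration of what such tuning could look like, here is a
sketch of an application.conf fragment. The key names are the ones I
would expect under akka.remote.watch-failure-detector, but the values are
only illustrative assumptions, so please check them against the
reference.conf shipped with your Akka version:

```hocon
akka.remote.watch-failure-detector {
  # How often heartbeats are exchanged with the watched remote system.
  heartbeat-interval = 1 s

  # Silence tolerated before the remote system is suspected.
  # A larger value delays Terminated (illustrative value, not a default).
  acceptable-heartbeat-pause = 60 s

  # Phi accrual threshold: higher means fewer false positives but
  # slower detection (illustrative value, not a default).
  threshold = 12.0
}
```

The trade-off is that a more tolerant detector also takes longer to report
a system that really has died.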

> and I can still use the ActorRef to send messages to it once comms are
> restored. (However, if comms fail a second time I don't get a second
> Terminated message.)
>

This is because we made two mistakes in 2.2.x:
 - we made the quarantine time configurable
 - we set it to a low value, 60 seconds

After the quarantine period elapses, the systems can communicate again
regardless of the Terminated message. This is probably what you observed,
and it is exactly why quarantine is permanent in 2.3.

>
> "Terminated" and "lost contact" are rather different states, and may need
> different handling. Does anyone know of a reliable way I can distinguish
> these?
>

DeathWatch sends Terminated when the remote system has been in the "lost
contact" state for a long time. How long that time is can be configured
through the DeathWatch failure detector. "Lost contact" events are
published as remote lifecycle events, but I don't recommend using those
directly. Message sends are supposed to be lossy; if you want to track
reachability, you can do so in your user layer with some heartbeating
mechanism.
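
To make the user-layer heartbeating idea a bit more concrete, here is a
minimal sketch in plain Java with no Akka dependency. The class name
HeartbeatTracker, its methods and the Status values are all hypothetical
names made up for this example; the assumption is that your own actors
exchange periodic heartbeat messages and call recordHeartbeat whenever
one arrives.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper: tracks when each peer was last heard from.
final class HeartbeatTracker {
    enum Status { UNKNOWN, REACHABLE, LOST_CONTACT }

    private final long timeoutMillis;
    private final Map<String, Long> lastSeen = new ConcurrentHashMap<>();

    HeartbeatTracker(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Call this whenever a heartbeat message from the peer arrives.
    void recordHeartbeat(String peer, long nowMillis) {
        lastSeen.put(peer, nowMillis);
    }

    // LOST_CONTACT is not final: a later heartbeat makes the peer
    // REACHABLE again, unlike Terminated from DeathWatch.
    Status status(String peer, long nowMillis) {
        Long seen = lastSeen.get(peer);
        if (seen == null) {
            return Status.UNKNOWN;
        }
        return (nowMillis - seen > timeoutMillis)
                ? Status.LOST_CONTACT
                : Status.REACHABLE;
    }
}
```

With something like this, "lost contact" becomes a state your application
computes itself and that can heal, while Terminated stays reserved for
the case where DeathWatch has given up on the remote system.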

Btw, there is another failure detector
(akka.remote.transport-failure-detector) that monitors the health of
network connections, but it does not generate Terminated events, only
reconnect attempts.

In 2.3, clustering will differentiate UNREACHABLE events (which can heal)
from removals. You probably want to use those features instead of plain
remoting.

>
> Thanks
>
> Alistair
>
> --
> >>>>>>>>>> Read the docs: http://akka.io/docs/
> >>>>>>>>>> Check the FAQ: http://akka.io/faq/
> >>>>>>>>>> Search the archives: https://groups.google.com/group/akka-user
> ---
> You received this message because you are subscribed to the Google Groups
> "Akka User List" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/akka-user.
> For more options, visit https://groups.google.com/groups/opt_out.
>

-- 
Akka Team
Typesafe - The software stack for applications that scale
Blog: letitcrash.com
Twitter: @akkateam
