Re: [akka-user] How to avoid the separating of cluster when network is down.

Akka Team Fri, 16 Jan 2015 04:36:37 -0800

Hi there Bourne,


> At the moment , i have an app written in java, that consists of many
> actors of the same role, using Cluster Singleton.
> i use the automatic downing to handle the high availability aspect of the
> application [...]
>



> But the problem is, i cant differentiate between the failing of a node (
> jmv containing the actor crashs and died for some reason...) and the
> failing of the network.
>
One might say: "Welcome to distributed computing". It is not really
possible to know if a "node is not replying" or "the node is down" or "the
node is replying, but all messages from it get lost because of a failing
router somewhere in kansas".
You might enjoy these talks about "" or "distributed consensus a.k.a. what
do we eat for lunch?" which touch on the inherent problems of network
partitions and such.

> So if the network is somehow down for a long period of time, when it comes
> up, there will be separated clusters.
> And i cant use manually downing because i dont know when to assume that it
> is down not because of a network failure.
>
I wouldn't say you "can't" - it's just very impractical and slow - you'd
have to investigate which nodes can't communicate and why etc. It's often
way too slow to involve a human being in decision making about cluster
states.
"Manual downing" can also have another meaning - signals from external
monitoring services (Zabbix, Nagios) could trigger the downing through the
programmatic API provided by Akka.
From Akka's viewpoint this is "manual downing" (well, "programmatic
downing"), but from your perspective it would be automatic.



> What would you suggest me do in this situation.
>
a) do not use auto-downing with cluster singletons, it's really asking for
trouble - splitting up the cluster is a known problem inherent to any
timeout-based downing, and each side of the split would then start its own
"singleton".
b) implement a downing strategy based on a consensus protocol. Simple quorum
voting <http://en.wikipedia.org/wiki/Quorum_%28distributed_computing%29>
would be a good start here, though an algorithm based on Paxos
<http://en.wikipedia.org/wiki/Paxos_(computer_science)> or Raft
<http://en.wikipedia.org/wiki/Raft_(computer_science)> would be the safest
choice.
I have a draft <https://github.com/ktoso/akka-raft> of a Raft
implementation using Akka, though it is not yet battle-tested so I can't
recommend it for production use *yet*.

Implementing a downing strategy is actually rather simple though it may not
sound so at first - you simply need to coordinate the decision between the
nodes and then, once you're "sure", call `cluster.down(node)`.
You can refer to the timeout based AutoDown implementation in Akka, here:
https://github.com/akka/akka/blob/master/akka-cluster/src/main/scala/akka/cluster/AutoDown.scala

As you'll notice it's really very simple and actually "just a normal actor"
which watches the cluster state changes - you can implement your own
downing strategies using a similar approach :-)
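
To make the quorum idea from (b) a bit more concrete, here is a minimal
sketch in plain Java of only the decision step. It is not Akka code and not
how AutoDown works: the class and method names are made up for illustration,
nodes are plain strings, and it assumes the full member list is known up
front. In a real strategy an actor watching cluster state changes would
feed in the members and the unreachable set, and would call
`cluster.down(node)` for each node returned.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper (not part of Akka): decides whether this side of a
// partition holds a majority and may therefore down the unreachable nodes.
final class QuorumDowning {

    // Strict majority of the full cluster size, e.g. 3 out of 5, 3 out of 4.
    static int quorumSize(int totalMembers) {
        return totalMembers / 2 + 1;
    }

    // Returns the nodes this side may down, or an empty set when the
    // reachable side is a minority (or exactly half) and must not decide.
    static Set<String> nodesToDown(Set<String> allMembers, Set<String> unreachable) {
        Set<String> reachable = new TreeSet<>(allMembers);
        reachable.removeAll(unreachable);
        if (reachable.size() < quorumSize(allMembers.size())) {
            return new TreeSet<>();
        }
        // Only report nodes that are actually known members.
        Set<String> toDown = new TreeSet<>(unreachable);
        toDown.retainAll(allMembers);
        return toDown;
    }

    public static void main(String[] args) {
        Set<String> members = new TreeSet<>(Arrays.asList("a", "b", "c", "d", "e"));

        // Seen from the majority side {a, b, c}: d and e may be downed.
        System.out.println(nodesToDown(members, new TreeSet<>(Arrays.asList("d", "e"))));

        // Seen from the minority side {d, e}: nothing may be downed.
        System.out.println(nodesToDown(members, new TreeSet<>(Arrays.asList("a", "b", "c"))));
    }
}
```

With this rule at most one side of a partition ever downs the other, and in
an even split neither side does, which is the safe (if unavailable) outcome
for a cluster singleton. It is still only a static majority check, not a
consensus protocol like Paxos or Raft.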



> Sorry for my bad english and thank you all for the help.
>
No need to apologise! Most of us here are not native English speakers
either :-)


If you do get into implementing new downing strategies it would be awesome
to keep in touch and perhaps contribute them back to Akka as a pull request -
we currently do not have the capacity to work on this feature in the near term.
Hope this helps :-)

-- 
Konrad

Akka Team
Typesafe - The software stack for applications that scale
Blog: letitcrash.com
Twitter: @akkateam
