Hi guys,

The tools linked by Morten seem interesting, I'll give them a read later :)

> I have partly solved the issue by removing auto-down from the configuration

Good, it's not a good idea to rely on auto-downing in production (it's pretty 
naive: purely timer based, downing nodes 1-by-1).

We recommend using explicit Cluster.down() commands fed by external monitoring 
solutions, or by ops people who have an overview of the cluster "from the 
outside" and can make the right decision to down and kill specific nodes. In 
general, deciding this automatically is always risky in some form (due to the 
nature of any distributed application: you never know if a node is "slow" or 
"really down").

> and only allowing the cluster singleton to down members, and members that 
> notice quarantine when they reconnect will restart their actor systems. 
> However, this causes a problem when the cluster singleton or acting master is 
> the one who goes down or is separated from the cluster: now no one can down 
> this node and no new singleton will start, so the whole cluster is put in 
> stasis.

Correct, however at least it is then consistent: no split-brain can happen in 
an Akka cluster without automatic downing.

You could call Cluster.down(someNodesAddress) to mark nodes as down, which 
manually triggers the migration of the singletons.
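
To make that concrete, here is a minimal sketch of disabling auto-downing and 
downing a node by hand. It assumes Akka 2.4-style classic remoting (akka.tcp), 
that the rest of the cluster settings live in your application.conf, and that 
the system name and the node address below are placeholders I made up for 
illustration:

```scala
import akka.actor.{ActorSystem, Address, AddressFromURIString}
import akka.cluster.Cluster
import com.typesafe.config.ConfigFactory

object ManualDowning extends App {
  // Assumption: provider, hostname, port and seed-nodes come from
  // application.conf; only the auto-down setting is overridden here.
  val config = ConfigFactory
    .parseString("akka.cluster.auto-down-unreachable-after = off")
    .withFallback(ConfigFactory.load())

  // "ClusterSystem" is a placeholder name, use your own.
  val system = ActorSystem("ClusterSystem", config)

  // Hypothetical address of the node that monitoring/ops decided is really gone.
  val deadNode: Address =
    AddressFromURIString("akka.tcp://ClusterSystem@10.0.0.5:2552")

  // Explicitly mark that member as down; this is the manual decision
  // that auto-downing would otherwise take based on a timer.
  Cluster(system).down(deadNode)
}
```

In practice the down() call would be triggered by your monitoring or by an 
operator (for example via a small admin endpoint), not at application startup 
as in this sketch.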



> Anyone got any clever solution to this problem?

We do actually - the Split Brain Resolver.

It's part of the Reactive Platform and implements a number of strategies for 
performing downing more safely than with plain timeouts (auto-downing). The 
strategies are for example "static quorum" or "keep majority" etc. Each of them 
has specific trade-offs: scenarios where it works well, and failure scenarios 
where the strategy makes a decision that is consistent with how it is designed 
to work, but maybe not what you need. 
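
To give a feel for the kind of decision these strategies take, here is a rough 
sketch of the idea behind "static quorum" and "keep majority". This is NOT the 
Resolver's implementation or API, all names are made up for illustration, and 
the tie handling in keepMajority is simplified (the real strategies are 
described in the docs linked below):

```scala
object DowningSketch {
  sealed trait Decision
  case object DownUnreachable extends Decision // this side survives and downs the other side
  case object DownSelf extends Decision        // this side shuts itself down

  // Static quorum idea: a side survives only if it still sees at least
  // `quorumSize` members (a fixed number chosen up front).
  def staticQuorum(reachable: Int, quorumSize: Int): Decision =
    if (reachable >= quorumSize) DownUnreachable else DownSelf

  // Keep majority idea: a side survives only if it holds more than half of
  // the current members. Simplification: on an exact tie both sides go down.
  def keepMajority(reachable: Int, unreachable: Int): Decision =
    if (reachable > unreachable) DownUnreachable else DownSelf
}
```

For example, in a 5 node cluster split 3/2 with a quorum size of 3, 
staticQuorum(3, 3) keeps the side with 3 nodes, while staticQuorum(2, 3) on the 
other side makes those 2 nodes shut down, so only one side continues.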

The docs are available here: 
http://doc.akka.io/docs/akka/rp-15v09p01/scala/split-brain-resolver.html and go 
pretty in-depth about how it all works.

In order to use this in production you'll need to obtain a Reactive Platform 
subscription, more details here: 
http://www.typesafe.com/products/typesafe-reactive-platform (it also explains 
at the bottom how you can try it out).

I also did a webinar 2 weeks ago about new features in Akka 2.4 and Reactive 
Platform, where I covered the Split Brain Resolver a bit: 
https://youtu.be/D3mPl8OUrjs?t=9m11s The entire webinar should be pretty 
interesting I hope, though the link starts around the 9 minute mark, where 
it's mostly about the Resolver.



You can contact us here to get specific details on the subscription: 
https://www.typesafe.com/company/contact

Hope this helps!



-- 
Cheers,
Konrad `ktoso` Malawski
Akka @ Typesafe

