Hi guys,

The tools linked by Morten seem interesting; I'll give them a read later :)
> I have partly solved the issue by removing auto-down from the configuration

Good, it's not a good idea to rely on auto-downing in production, as it's pretty naive: each node is downed one by one, based purely on a timer. We recommend using explicit Cluster.down() commands fed by external monitoring solutions, or by ops who have an overview of the cluster "from the outside" and can make the right decision to down and kill specific nodes. In general, deciding this automatically is always risky in some form, due to the nature of any distributed application: you never know whether a node is "slow" or "really down".

> and only allowing the cluster singleton to down members and members that notice quarantine when they reconnect will restart their actorsystems. However this causes a problem when the cluster singleton or acting master is the one who goes down or is separated from the cluster, now no one can down this node and no new singleton will start so the whole cluster is put in stasis.

Correct, however at least it is then consistent: no split-brain can happen in an Akka cluster without automatic downing. You could call Cluster.down(someNodesAddress) to mark nodes as down manually, which causes the singletons to kick in their migration.

> Anyone got any clever solution to this problem?

We do, actually: the Split Brain Resolver. It's part of the Reactive Platform and implements a number of strategies for performing downing more safely than with plain timeouts (auto-downing). The strategies are, for example, "static quorum" or "keep majority". Each of them has specific trade-offs, i.e. scenarios where it works well, and failure scenarios where the strategy makes a decision that is consistent with how it works, but maybe not what you need. The docs are available here: http://doc.akka.io/docs/akka/rp-15v09p01/scala/split-brain-resolver.html and go pretty in-depth about how it all works.
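To make the trade-offs a bit more concrete, here is a minimal sketch of the decision each side of a partition could take under the two strategies named above. It is a simplified model in plain Scala, not the Split Brain Resolver's actual implementation: the names DowningSketch, Decision, keepMajority and staticQuorum are made up for illustration, and ties are handled more crudely than a real resolver would.

```scala
// Illustrative model only; all names here are hypothetical, not Akka API.
object DowningSketch {
  sealed trait Decision
  case object DownUnreachable extends Decision // this side stays up and downs the other side
  case object DownSelf extends Decision        // this side shuts itself down

  // "keep majority": the side that sees more members than it has lost survives.
  // Simplification (assumption): on an exact tie both sides go down.
  def keepMajority(reachable: Int, unreachable: Int): Decision =
    if (reachable > unreachable) DownUnreachable else DownSelf

  // "static quorum": survive only if at least quorumSize members are still reachable.
  // quorumSize must be more than half of the maximum cluster size,
  // otherwise both sides of a partition could decide to stay up.
  def staticQuorum(reachable: Int, quorumSize: Int): Decision =
    if (reachable >= quorumSize) DownUnreachable else DownSelf
}
```

For a 5-node cluster split 3/2, `DowningSketch.keepMajority(3, 2)` yields `DownUnreachable` on the larger side and `DowningSketch.keepMajority(2, 3)` yields `DownSelf` on the smaller one, so only one side keeps running. In a real cluster, `DownUnreachable` would correspond to calling Cluster.down(...) for each unreachable member's address.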
In order to use the Split Brain Resolver in production you'll need to obtain a Reactive Platform subscription; more details here: http://www.typesafe.com/products/typesafe-reactive-platform (the bottom of that page also explains how you can try it out).

I also did a webinar 2 weeks ago about new features in Akka 2.4 and Reactive Platform, where I covered the Split Brain Resolver a bit: https://youtu.be/D3mPl8OUrjs?t=9m11s
The entire webinar should be pretty interesting, I hope, though the link starts at the 9 minute mark, where it's mostly about the Resolver.

You can contact us here to get specific details on the subscription: https://www.typesafe.com/company/contact

Hope this helps!

--
Cheers,
Konrad `ktoso` Malawski
Akka @ Typesafe
