I have been hacking on a discovery plugin for Elasticsearch <https://github.com/shikhar/eskka> built on Akka Cluster, and I wanted to add some automated downing. The auto-down-unreachable-after setting is not really an option, since it can lead to split brain.
So I went with the approach of using a quorum of members to decide whether an unreachable node should be downed. I'm curious to hear what you think of this. See https://github.com/shikhar/eskka/blob/master/src/main/scala/eskka/QuorumBasedPartitionMonitor.scala

1. The VotingMembers <https://github.com/shikhar/eskka/blob/release-0.1/src/main/scala/eskka/VotingMembers.scala> passed to the constructor are the seed nodes. Using seed nodes was just an easy choice since they are specified beforehand. So ideally there should be 3 or more seed nodes.

2. I am using an app-level ping layer <https://github.com/shikhar/eskka/blob/master/src/main/scala/eskka/Pinger.scala> on top of the UNREACHABLE events. When a ping request to an unreachable node, made via the seed nodes, "affirmatively times out" (i.e. the seed nodes must explicitly return a timeout response rather than the ping request itself timing out, so that we don't count an unreachable seed node as a voter!), we DOWN that unreachable node. Instead of these app-level pings, maybe it makes sense to use the Akka private[cluster] metadata like Reachability.isReachable(observer, node), but I'm not entirely sure of the semantics.

3. Currently this QuorumBasedPartitionMonitor actor gets started on every seed node. So when a member becomes unreachable, they all end up trying to arrange a distributed ping to the unreachable node via one another, and possibly downing it. This looks a bit like a thundering herd, so it's not ideal. On the other hand, I don't want to use a cluster singleton, because this partition resolver is meant to be the layer that allows singleton failover to happen smoothly. I'd love to hear ideas on how to handle this better.

4. Maybe a generic solution for quorum-based partition resolution should be a part of Akka proper/contrib? It seems AutoDown is rarely a good answer.
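
To make the voting rule in point 2 concrete, here is a minimal sketch of the decision step only, with no Akka dependency. It is not the code from eskka: the names PingResult, Pong, AffirmativeTimeout, NoAnswer and QuorumDowning are hypothetical, and it assumes that "quorum" means a strict majority of the configured voting members and that only explicit timeout responses count as votes.

```scala
// Hypothetical names for illustration; not taken from eskka.
sealed trait PingResult
// The voter reached the target node.
case object Pong extends PingResult
// The voter is alive and explicitly reports that its ping to the target timed out.
case object AffirmativeTimeout extends PingResult
// The voter itself did not answer, so it must not be counted as a vote.
case object NoAnswer extends PingResult

object QuorumDowning {
  // Assumption: quorum is a strict majority of the voting members, e.g. 2 of 3.
  def quorumSize(votingMembers: Int): Int = votingMembers / 2 + 1

  // Down the target only if a quorum of voters affirmatively timed out.
  // Silent voters (NoAnswer) and voters that got a Pong do not count towards it.
  def shouldDown(votingMembers: Int, responses: Seq[PingResult]): Boolean = {
    val affirmative = responses.count(_ == AffirmativeTimeout)
    affirmative >= quorumSize(votingMembers)
  }
}
```

With 3 seed nodes, two affirmative timeouts are enough to down the node, while one affirmative timeout plus two silent voters is not. That second case is the one the "affirmative" requirement is meant to guard against.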