I agree with David. By using a quorum of (n/2)+1 master-eligible nodes you are safe: the cluster can always elect a single leader.
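For example: with n = 4 master-eligible nodes the quorum is floor(4/2) + 1 = 3. A minimal sketch of setting this (the host is a placeholder):

    # elasticsearch.yml, on every node:
    #   discovery.zen.minimum_master_nodes: 3

    # or apply it to a running cluster via the settings API:
    curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
      "persistent": { "discovery.zen.minimum_master_nodes": 3 }
    }'

With that in place, a 2-2 split leaves neither half with a quorum, so no second master can be elected - at the price of total unavailability during the outage, which is exactly the trade-off discussed below.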
For multiple DCs, running a single cluster without such a quorum is high risk unless you have a reliable network. Why not run several clusters, one per DC, and sync them over a slow connection, depending on the availability of the DCs (see the snapshot/restore sketch below)? ES already runs a regular job for (re-)discovering nodes. Once a split brain has happened, it is too late to resolve it without weird effects after a rejoin: ES does not mark data operations with a distributed timestamp protocol, so conflict resolution would have to depend on voting. Such voting is not stable - with two equal halves of a cluster you may never have a winner, and data operations could be applied in the wrong order.
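For the syncing, a rough sketch via the snapshot/restore API (repository name, hosts, and paths are made-up placeholders; assumes ES >= 1.0 and that both clusters can reach the repository):

    # on the DC1 cluster: register a filesystem repository and snapshot into it
    curl -XPUT 'http://dc1-node:9200/_snapshot/dc_sync' -d '{
      "type": "fs",
      "settings": { "location": "/mnt/backup/es" }
    }'
    curl -XPUT 'http://dc1-node:9200/_snapshot/dc_sync/snap_1?wait_for_completion=true'

    # on the DC2 cluster: register the same (or a copied) repository, then restore
    curl -XPOST 'http://dc2-node:9200/_snapshot/dc_sync/snap_1/_restore'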
Jörg

On Tue, Feb 18, 2014 at 10:59 AM, Robert Stupp <[email protected]> wrote:

> Hi,
>
> Recently we discovered that Elasticsearch is not able to resolve a previous split-brain situation in an existing cluster. The problem (split brain and its subsequent resolution) can be split into two main parts:
>
> 1. Reorganization of the whole cluster and logging
> 2. Resolution of data conflicts
>
> The first part should be fairly "easy" to solve. Discovery should take place regularly and update the cluster organization if necessary.
>
> The second part is more complex and depends on what users are doing. In our application it is not that important that conflicts caused by a split brain are resolved by Elasticsearch - we can easily handle this ourselves (re-import the data modified during the split-brain situation).
>
> IMHO it is much better to let ES resolve the split brain than to let it run "forever" in the split-brain situation.
>
> From the original issue https://github.com/elasticsearch/elasticsearch/issues/5144 :
>
> -------------------------
>
> We have a 4-node ES cluster running ("plain" Zen discovery - no cloud stuff). Two nodes are in one DC - two nodes in another DC.
>
> When the network connection between both DCs fails, ES forms two two-node ES clusters - a split brain. When the network is operative again, the split-brain situation remains.
>
> I've set up a small local test with a 4-node ES cluster:
>
> +--------+                  +--------+
> | Node A | ----\      /---- | Node C |
> +--------+      \..../      +--------+
> +--------+      /    \      +--------+
> | Node B | ----/      \---- | Node D |
> +--------+                  +--------+
>           Single ES cluster
>
> When the network connection fails, two two-node clusters exist (split brain). I've simulated that with "iptables -A INPUT/OUTPUT -s/d -j DROP" statements.
>
> +--------+                  +--------+
> | Node A | ----\      /---- | Node C |
> +--------+      \    /      +--------+
> +--------+      /    \      +--------+
> | Node B | ----/      \---- | Node D |
> +--------+                  +--------+
>   ES cluster          ES cluster
>
> When the network between nodes A/B and C/D is operative again, the single-cluster state is not restored (the split brain persists).
>
> It did not make a difference whether unicast or multicast Zen discovery was used.
>
> Another issue is that operating-system keepalive settings affect the time after which ES detects a node failure. Keepalive timeout settings (e.g. net.ipv4.tcp_keepalive_time/probes/intvl) directly influence node-failure detection.
>
> There should be some task that regularly polls the "alive" status of all other known nodes.
>
> Tested with ES 1.0.0 (and an older 0.90.3).
>
> -----------------------
>
> David Pilato: "Did you try to set minimum_master_nodes to 3? See http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-discovery-zen.html#master-election"
>
> -----------------------
>
> Me: "Setting minimum_master_nodes to 3 is not an option. If I understand correctly, it would force all 4 nodes to stop working entirely - that is, no service at all. It also wouldn't cover the case that two nodes are taken down for maintenance work. And what if there are three DCs (each with 2 nodes)? A setting of minimum_master_nodes=5 would only allow one node to fail before ES stops working. IMHO there should be a regular job inside ES that checks for the existence of other nodes (either via unicast or via multicast) and triggers (re-)discovery if necessary - the split-brain situation must be resolved."
>
> -----------------------
>
> David Pilato: "Exactly. The cluster will stop working until the network connection is up again.
> What do you expect? Which part of the cluster should hold the master in case of a network outage?
>
> Cross-data-center replication is not supported yet, so you should consider:
>
> - using the great snapshot and restore feature to snapshot from one DC and restore in the other one
> - indexing in both DCs (i.e. two distinct clusters) at the client level
> - using the Tribe node feature to search or index on multiple clusters
>
> I think we should move this conversation to the mailing list."
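On the keepalive remark above: besides the OS-level TCP keepalive, Zen discovery has its own fault-detection settings that determine how quickly a dead node is noticed. A sketch (the values shown are, as far as I know, the defaults):

    # elasticsearch.yml - Zen fault detection
    #   discovery.zen.fd.ping_interval: 1s    # how often nodes are pinged
    #   discovery.zen.fd.ping_timeout: 30s    # how long to wait for a ping response
    #   discovery.zen.fd.ping_retries: 3      # failed pings before a node is considered gone

Lowering these makes failure detection faster, but also makes a flaky WAN link look like node failures more often.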
