I agree with David.

With a quorum of (n/2)+1 you are safe: the cluster can only ever elect a
single leader.
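
For a 4-node cluster like the one discussed below, (n/2)+1 = 3. A minimal
sketch of the corresponding setting (plain Zen discovery, nothing else
assumed):

    # elasticsearch.yml, same value on all 4 nodes
    # quorum = (4 / 2) + 1 = 3
    discovery.zen.minimum_master_nodes: 3

With that, a 2/2 partition leaves neither side with a quorum, so no second
master can be elected - at the price of the whole cluster refusing to elect a
master until the partition heals.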

For multiple DCs, running a single cluster without such a quorum is high risk
unless you have a reliable network. Why not run several clusters, one
per DC, and sync them over a slow connection, depending on the
availability of the DCs?
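
One way to do such a sync is the snapshot/restore feature that came with 1.0.
A rough sketch (repository name and path are made up for illustration):

    # on the DC1 cluster: register a filesystem repository on shared/replicated storage
    curl -XPUT 'localhost:9200/_snapshot/dc_sync' -d '{
      "type": "fs",
      "settings": { "location": "/mnt/backups/dc_sync" }
    }'
    # take a snapshot, e.g. from a cron job
    curl -XPUT 'localhost:9200/_snapshot/dc_sync/snap_1?wait_for_completion=true'

    # on the DC2 cluster: register the same repository, then restore
    curl -XPOST 'localhost:9200/_snapshot/dc_sync/snap_1/_restore'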

ES already runs a regular job for (re-)discovering nodes.

Once a split brain has happened it is too late to resolve it without weird
effects after a rejoin. ES does not mark data operations with a distributed
timestamp protocol, so conflict resolution would have to depend on voting.
Such a vote is not stable: with two equal halves of a cluster, there may never
be a winner, and data operations could be applied in the wrong order.
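
A made-up example of why there is no natural winner: during the split, both
halves keep incrementing the same per-document version counter independently,
so after the rejoin the versions cannot order the writes (node names, index
and document are hypothetical):

    # doc 1 exists with _version 1 on both halves before the split
    curl -XPUT 'nodeA:9200/idx/doc/1' -d '{"value":"written in DC1"}'   # returns "_version": 2
    curl -XPUT 'nodeC:9200/idx/doc/1' -d '{"value":"written in DC2"}'   # returns "_version": 2
    # after the rejoin both copies claim version 2 with different contents -
    # neither a vote nor the version number tells you which write is "newer"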

Jörg



On Tue, Feb 18, 2014 at 10:59 AM, Robert Stupp
<[email protected]> wrote:

>  Hi,
>
> Recently we discovered that Elasticsearch is not able to resolve a previous
> split brain situation in an existing cluster. The problem (split brain and
> its subsequent resolution) can be split into two main parts:
>
>    1. Reorganization of the whole cluster and logging
>    2. Resolution of data conflicts
>
> The first part should be fairly "easy" to solve. Discovery should take
> place regularly and update the cluster organization if necessary.
>
> The second part is more complex and depends on what users are doing. In our
> application it is not that important that conflicts caused by a split brain
> are resolved by Elasticsearch - we can easily handle this ourselves
> (re-import the data that was modified during the split brain situation).
>
> IMHO it is much better to let ES resolve the split brain than to let it run
> "forever" in the split brain situation.
>
>
>
> From the original issue
> https://github.com/elasticsearch/elasticsearch/issues/5144 :
>
> -------------------------
>
> we have a 4 node ES cluster running ("plain" Zen discovery - no cloud
> stuff). Two nodes are in one DC - two nodes in another DC.
>
> When the network connection between both DCs fails, ES forms two two-node
> ES clusters - a split brain. When the network is operative again, the split
> brain situation remains persistent.
>
> I've set up a small local test with a 4 node ES cluster:
>
> +--------+                         +--------+
> | Node A | ----\             /---- | Node C |
> +--------+      \.........../      +--------+
> +--------+      /           \      +--------+
> | Node B | ----/             \---- | Node D |
> +--------+                         +--------+
>                Single ES cluster
>
> When the network connection fails, two two-node clusters exist (split
> brain). I've simulated this with "iptables -A INPUT/OUTPUT -s/d -j DROP"
> rules.
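>
> Spelled out, the rules on node A look roughly like this (the addresses are
> just examples):
>
>   iptables -A INPUT  -s 10.0.1.3 -j DROP   # drop traffic from node C
>   iptables -A OUTPUT -d 10.0.1.3 -j DROP   # drop traffic to node C
>   iptables -A INPUT  -s 10.0.1.4 -j DROP   # same for node D
>   iptables -A OUTPUT -d 10.0.1.4 -j DROP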
>
> +--------+                         +--------+
> | Node A | ----\             /---- | Node C |
> +--------+      \           /      +--------+
> +--------+      /           \      +--------+
> | Node B | ----/             \---- | Node D |
> +--------+                         +--------+
>   ES cluster                      ES cluster
>
> When the network between nodes A/B and C/D is operative again, the single
> cluster is not restored (the split brain persists).
>
> It made no difference whether unicast or multicast Zen discovery is used.
>
> Another issue is that operating system keepalive settings affect the time
> after which ES detects a node failure. Keepalive timeout settings (e.g.
> net.ipv4.tcp_keepalive_time/probes/intvl) directly influence node
> failure detection.
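>
> For reference, the knobs involved are roughly these (example values only,
> not recommendations), plus the Zen fault-detection settings in
> elasticsearch.yml:
>
>   sysctl -w net.ipv4.tcp_keepalive_time=60
>   sysctl -w net.ipv4.tcp_keepalive_intvl=10
>   sysctl -w net.ipv4.tcp_keepalive_probes=3
>
>   # elasticsearch.yml (defaults shown)
>   discovery.zen.fd.ping_interval: 1s
>   discovery.zen.fd.ping_timeout: 30s
>   discovery.zen.fd.ping_retries: 3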
>
> There should be some task that regularly polls the "alive" status of all
> other known nodes.
>
> Tested with ES 1.0.0 (and an older 0.90.3).
>
> -----------------------
>
> David Pilato: "Did you try to set minimum_master_nodes to 3? See
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-discovery-zen.html#master-election
> "
>
> -----------------------
>
> Me: "Setting minimum_master_nodes to 3 is not an option. If I understand
> correctly, it would force all 4 nodes to stop working during a 2/2 split -
> meaning: no service at all. It also wouldn't cover the case that two nodes
> are taken down for maintenance work. And what if there are three DCs (each
> with 2 nodes)? A setting of minimum_master_nodes=5 would only allow one node
> to fail before ES stops working. IMHO there should be a regular job inside
> ES that checks for the existence of other nodes (either via unicast or via
> multicast) and triggers (re-)discovery if necessary - the split brain
> situation must be resolved."
>
> -----------------------
>
> David Pilato: "Exactly. The cluster will stop working until the network
> connection is up again.
> What would you expect? Which part of the cluster should hold the master in
> case of a network outage?
>
> Cross data center replication is not supported yet, and you should consider:
>
>    - use the great snapshot and restore feature to snapshot from one DC and
>    restore it in the other one
>    - index into both DCs (so two distinct clusters) from the client level
>    - use the Tribe node feature to search or index on multiple clusters (a
>    rough config sketch follows below)
>
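> A rough sketch of the Tribe node option (cluster names are made up), in the
> tribe node's elasticsearch.yml:
>
>   tribe:
>       dc1:
>           cluster.name: cluster_dc1
>       dc2:
>           cluster.name: cluster_dc2
>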
> I think we should move this conversation to the mailing list."
>
