Bill Burcham created GEODE-9822:
-----------------------------------
Summary: Split-brain Possible During Network Partition in
Two-Locator Cluster
Key: GEODE-9822
URL: https://issues.apache.org/jira/browse/GEODE-9822
Project: Geode
Issue Type: Bug
Components: membership
Reporter: Bill Burcham

In a two-locator cluster with default member weights and the default setting
(true) of enable-network-partition-detection, a long-lived network partition
that separates the two members will produce a split-brain: there will be two
coordinators at the same time.

The reason for this can be found in the GMSJoinLeave.isNetworkPartition()
method. That method's name is misleading; a name like majorityLost() would
probably be more apt. It needs to return true if and only if the weight of
"crashed" members (in the prospective view) is greater than or equal to 50% of
the total weight (of all members in the current view).

What the method actually does is return true if and only if the weight of
"crashed" members is strictly greater than 51% of the total weight. With two
members of equal weight, a single "crashed" member accounts for exactly 50% of
the total weight, which does not exceed 51%, so the method returns false. As a
result, if the coordinator sees that the non-coordinator is "crashed", the
coordinator will keep running. If a network partition is happening, and the
non-coordinator is still running, then it will become a coordinator and start
producing views. Now we'll have two coordinators producing views concurrently.
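
To make the arithmetic concrete, here is a minimal sketch that compares the
two threshold checks described above. It is not the code in GMSJoinLeave: the
class name, both method names, and the member weight of 3 are assumptions
chosen for illustration, and the only property that matters is that the two
members have equal weight.

```java
public class QuorumCheckSketch {

    // Check as described in this report: true only when the crashed weight
    // is strictly greater than 51% of the total weight.
    // (Hypothetical helper, not the Geode implementation.)
    static boolean exceeds51Percent(int crashedWeight, int totalWeight) {
        return crashedWeight * 100 > totalWeight * 51;
    }

    // Check the report asks for: true when the crashed weight is at least
    // 50% of the total weight. (Hypothetical helper.)
    static boolean majorityLost(int crashedWeight, int totalWeight) {
        return crashedWeight * 2 >= totalWeight;
    }

    public static void main(String[] args) {
        int memberWeight = 3;            // assumed weight; any equal weights behave the same
        int totalWeight = 2 * memberWeight;

        // Each side of the partition sees the other member as "crashed".
        System.out.println("described check: " + exceeds51Percent(memberWeight, totalWeight));
        System.out.println("requested check: " + majorityLost(memberWeight, totalWeight));
    }
}
```

With two equal-weight members the described check evaluates to false, so the
surviving member on each side keeps running, while the requested check
evaluates to true and would flag the loss of majority.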
--
This message was sent by Atlassian Jira
(v8.20.1#820001)