[ 
https://issues.apache.org/jira/browse/GEODE-9680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dan Smith updated GEODE-9680:
-----------------------------
    Labels:   (was: needsTriage)

> Newly Started/Restarted Locators are Susceptible to Split-Brains
> ----------------------------------------------------------------
>
>                 Key: GEODE-9680
>                 URL: https://issues.apache.org/jira/browse/GEODE-9680
>             Project: Geode
>          Issue Type: Bug
>          Components: membership
>    Affects Versions: 1.15.0
>            Reporter: Bill Burcham
>            Priority: Major
>
> The issues described here are present in all versions of Geode (this is not 
> new to 1.15.0)…
> Geode is built on the assumption that views progress linearly in a sequence. 
> If that sequence ever forks into two or more parallel lines then we have a 
> "split brain".
> In a split brain condition, each of the parallel views is independent. It's 
> as if you have more than one system running concurrently. It's possible e.g. 
> for some clients to connect to members of one view and other clients to 
> connect to members of another view. Updates to members in one view are not 
> seen by members of a parallel view.
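
One way to picture the "fork" described above is a toy model in which every view carries a sequence number and the name of the coordinator that produced it. The sketch below is only an illustration under that assumption: `ViewForkSketch`, `View`, and `hasFork` are hypothetical names, not Geode classes, and this is not how Geode itself detects a split brain.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model; none of these names are Geode APIs.
final class ViewForkSketch {

    /** A view reduced to its sequence number and the coordinator that produced it. */
    static final class View {
        final int id;
        final String coordinator;

        View(int id, String coordinator) {
            this.id = id;
            this.coordinator = coordinator;
        }
    }

    /**
     * Returns true if two different coordinators produced a view with the
     * same sequence number, i.e. the sequence is no longer a single line.
     */
    static boolean hasFork(List<View> observed) {
        Map<Integer, String> coordinatorById = new HashMap<>();
        for (View view : observed) {
            String previous = coordinatorById.putIfAbsent(view.id, view.coordinator);
            if (previous != null && !previous.equals(view.coordinator)) {
                return true;
            }
        }
        return false;
    }
}
```

In this model a single coordinator can never produce a fork, which matches the ticket's point that the problem only arises once a second coordinator starts producing views.
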
> Geode views are produced by a coordinator. As long as only a single 
> coordinator is running, there is no possibility of a split brain. Split brain 
> arises when more than one coordinator is producing views at the same time.
> Each Geode member (peer) is started with the {{locators}} configuration 
> parameter. That parameter specifies locator(s) to use to find the (already 
> running!) coordinator (member) to join with.
> When a locator (member) starts, it goes through this sequence to find the 
> coordinator:
>  # it first tries to find the coordinator through one of the (other) 
> configured locators
>  # if it can't contact any of those, it tries contacting non-locator (cache 
> server) members it has retrieved from the "view persistence" ({{.dat}}) file
> If it hasn't found a coordinator to join with, then the locator may _become_ 
> a coordinator.
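
The startup sequence above can be sketched roughly as follows. This is a simplified illustration, not Geode's implementation: `LocatorStartupSketch`, `Outcome`, and the `findCoordinator` callback are hypothetical names, and the sketch always becomes coordinator when nothing is reachable, whereas the ticket only says the locator "may" do so.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical sketch; none of these names are Geode APIs.
final class LocatorStartupSketch {

    enum Outcome { JOINED_VIA_LOCATOR, JOINED_VIA_PERSISTED_VIEW, BECAME_COORDINATOR }

    /**
     * @param configuredLocators addresses of the other locators from the `locators` parameter
     * @param persistedMembers   non-locator member addresses recovered from the .dat file
     * @param findCoordinator    asks one address for the coordinator; empty if unreachable
     */
    static Outcome start(List<String> configuredLocators,
                         List<String> persistedMembers,
                         Function<String, Optional<String>> findCoordinator) {
        // Step 1: try the other configured locators.
        for (String locator : configuredLocators) {
            if (findCoordinator.apply(locator).isPresent()) {
                return Outcome.JOINED_VIA_LOCATOR;
            }
        }
        // Step 2: fall back to members remembered in the view persistence file.
        for (String member : persistedMembers) {
            if (findCoordinator.apply(member).isPresent()) {
                return Outcome.JOINED_VIA_PERSISTED_VIEW;
            }
        }
        // Nobody answered. "Unreachable" is indistinguishable from "not running",
        // so becoming coordinator here is exactly where a split brain can start.
        return Outcome.BECAME_COORDINATOR;
    }
}
```

Note that an empty or stale `persistedMembers` list and an unreachable set of locators both fall through to the last line, which is why the scenarios listed in the ticket all end the same way.
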
> Sometimes this is ok. If no other coordinator is currently running then this 
> behavior is fine. An example is when an [administrator is starting up a brand 
> new 
> cluster|https://geode.apache.org/docs/guide/114/configuring/running/running_the_locator.html].
>  In that case we want the very first locator we start to become the 
> coordinator.
> But there are a number of situations where there may already be another 
> coordinator running but it cannot be reached:
>  * if the administrator/operator is starting up a brand new cluster with 
> multiple locators and…
>  ** maybe Geode is running in a managed environment like Kubernetes and the 
> locators' hostnames are not (yet) resolvable in DNS
>  ** maybe there is a network partition between the starting locators so they 
> can't communicate
>  ** maybe the existing locators or coordinator are running very slowly or the 
> network is degraded. This is effectively the same as the network partition 
> just mentioned
>  * if a cluster is already running and the administrator/operator wants to 
> scale it up by starting/adding a new locator, Geode is susceptible to that 
> same network partition issue
>  * if a cluster is already running and the administrator/operator needs to 
> restart a locator, e.g. for a rolling upgrade, if none of the locators in the 
> {{locators}} configuration parameter are reachable (maybe they are not 
> running, or maybe there is a network partition) and…
>  ** if the "view persistence" {{.dat}} file is missing or deleted
>  ** or if the current set of running Geode members has evolved so far that 
> the coordinates (host+port) in the {{.dat}} file are completely out of date
> In each of those cases, the newly starting locator will become a coordinator 
> and will start producing views. Now we'll have the old coordinator producing 
> views at the same time as the new one.
> *When this ticket is complete*, Geode will offer a locator startup mode (via 
> TBD {{LocatorLauncher}} startup parameter) that prevents that locator from 
> becoming a coordinator. With that mode, it will be possible for an 
> administrator/operator to avoid many of the problematic scenarios mentioned 
> above, while retaining the ability to start a first locator which is allowed 
> to become a coordinator.
> For purposes of discussion we'll call the startup mode that allows the 
> locator to become a coordinator "seed" mode, and we'll call the new startup 
> mode that prevents the locator from becoming a coordinator before first 
> joining, "join-only" mode.
> To start a brand new cluster, an administrator/operator starts the first 
> locator in "seed" mode. After that the operator starts all subsequent 
> locators in "join-only" mode. If network partitions occur during startup, 
> those newly started nodes will exit with a failure status, but will not 
> become coordinators.
> To add a locator to a running cluster, an operator starts it in "join-only" 
> mode. The new member will similarly either join with an existing coordinator 
> or exit with a failure status, thereby avoiding split brains.
> When an operator restarts a locator, e.g. during a rolling upgrade, they will 
> restart it in "join-only" mode. If a network partition is encountered, or the 
> {{.dat}} file is missing or stale, the new locator will exit with a failure 
> status and split brain will be avoided.
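
The difference between the two proposed modes comes down to what happens when no coordinator can be found. The sketch below illustrates that decision only. The startup parameter is still TBD in the ticket, so `StartupMode`, `Result`, and `resolve` are hypothetical names, and the choice to have a seed-mode locator join an existing coordinator is an assumption (the ticket leaves that case open).

```java
import java.util.Optional;

// Hypothetical sketch of the proposed modes; not a Geode API.
final class LocatorModeSketch {

    enum StartupMode { SEED, JOIN_ONLY }

    enum Result { JOINED, BECAME_COORDINATOR, EXIT_FAILURE }

    /**
     * @param mode        the mode the operator chose for this locator
     * @param coordinator the coordinator found during startup, or empty if none was reachable
     */
    static Result resolve(StartupMode mode, Optional<String> coordinator) {
        if (coordinator.isPresent()) {
            // Assumption: either mode simply joins a coordinator it can see.
            return Result.JOINED;
        }
        // Only a seed locator may start producing views on its own;
        // a join-only locator exits with a failure status instead.
        return mode == StartupMode.SEED ? Result.BECAME_COORDINATOR : Result.EXIT_FAILURE;
    }
}
```

Under this sketch, a partition during a join-only start yields `EXIT_FAILURE` rather than a second coordinator, which is the trade-off the ticket proposes: a failed start the operator can retry instead of a silent split brain.
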
> h2. FAQ
> Q: What should happen if a locator is started in seed mode, but it can see 
> another view member is already acting as coordinator?
> A: TBD
>  
> Q: How long will join-only mode wait before giving up and exiting? 
> A: TBD



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
