On Thursday, June 20, 2013, Bo wrote:
>
> Howdy!
>
> Loving working with ceph; learning a lot. :)
>
> I am curious about the quorum process because I seem to get conflicting 
> information from "experts". Those that I report to need a clear answer from 
> me which I am currently unable to give.
>
> Ceph needs an odd number of monitors in any given cluster (3, 5, 7) to avoid 
> split-brain syndrome. So what happens whenever I have 3 monitors, 1 dies, and 
> I have 2 left?
>
> The information regarding this situation that I have gathered over the past 
> few months all falls within these three categories:
> A) commonly "stated"--nothing is said. period.
> B) rarely stated--this is a bad situation (possibly split-brain).
> C) rarely stated--each monitor has a "rank", so the highest ranking monitor 
> is the boss, thus quorum.
>
> Does anyone know with absolute certainty what ceph's quorum logic will do 
> with an even number of (specifically 2) monitors left?
>
> You may say, "well, take down one of your monitors", to which I respectfully 
> state that my testing is not an authoritative answer on what ceph is designed 
> to do and what it does in production. My testing cannot cover the vast 
> majority of cases covered by the hundreds/thousands who have had a monitor 
> die.
>
> Thank you for your time and brain juice,
> -bo


This is often misunderstood, but the answers to your questions are
pretty simple. :)

There is no risk of split brain in Ceph (so, not in the monitors either).
The mantra to use an odd number of monitors is *not* a system
requirement; it is a deployment recommendation. This is due to how the
cluster avoids split brain: it uses a Paxos variant in which a strict
majority of the monitors need to agree on everything. Using one
monitor, you can make forward progress if it's running; using two
monitors, you cannot afford for either of them to die (because then you
only have 50% up); using three monitors you can lose one; using four
you can lose one; using five you can lose two; etc. So in your case,
with three monitors and one dead, the remaining two are still a strict
majority (2 of 3) and the cluster keeps making progress. Using an even
number of monitors increases your odds of failure without increasing
the number of failures you can survive (in availability terms) over
the previous odd number.
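
The arithmetic above can be sketched in a few lines of Python. This is
only an illustration of the strict-majority rule, not Ceph's monitor
code; the function names (quorum_size, tolerated_failures, has_quorum)
are made up for this example.

```python
def quorum_size(total_monitors: int) -> int:
    """Smallest strict majority (more than half) of the configured monitors."""
    return total_monitors // 2 + 1


def tolerated_failures(total_monitors: int) -> int:
    """How many monitors can be down while a strict majority is still up."""
    return total_monitors - quorum_size(total_monitors)


def has_quorum(total_monitors: int, alive_monitors: int) -> bool:
    """True if the monitors still running form a strict majority."""
    return alive_monitors >= quorum_size(total_monitors)


if __name__ == "__main__":
    # Hypothetical cluster sizes, chosen only to mirror the list above.
    for n in range(1, 8):
        print(f"{n} monitors: quorum needs {quorum_size(n)}, "
              f"can lose {tolerated_failures(n)}")
```

With three configured monitors and one down, has_quorum(3, 2) is True;
with two configured monitors and one down, has_quorum(2, 1) is False,
which is the 50% case described above.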
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
