On Thursday, June 20, 2013, Bo wrote:
>
> Howdy!
>
> Loving working with ceph; learning a lot. :)
>
> I am curious about the quorum process because I seem to get conflicting
> information from "experts". Those that I report to need a clear answer from
> me which I am currently unable to give.
>
> Ceph needs an odd number of monitors in any given cluster (3, 5, 7) to avoid
> split-brain syndrome. So what happens whenever I have 3 monitors, 1 dies, and
> I have 2 left?
>
> The information regarding this situation that I have gathered over the past
> few months all falls within these three categories:
> A) commonly "stated"--nothing is said. period.
> B) rarely stated--this is a bad situation (possibly split-brain).
> C) rarely stated--each monitor has a "rank", so the highest ranking monitor
> is the boss, thus quorum.
>
> Does anyone know with absolute certainty what ceph's quorum logic will do
> with an even number of (specifically 2) monitors left?
>
> You may say, "well, take down one of your monitors", to which I respectfully
> state that my testing is not an authoritative answer on what ceph is designed
> to do and what it does in production. My testing cannot cover the vast
> majority of cases covered by the hundreds/thousands who have had a monitor
> die.
>
> Thank you for your time and brain juice,
> -bo

This is often misunderstood, but the answers to your questions are pretty
simple. :)

There is no risk of split brain in Ceph (so, not in the monitors either).
The mantra to use an odd number of monitors is *not* a system requirement;
it is a deployment recommendation. This follows from how the cluster avoids
split brain: it uses a Paxos variant in which a strict majority of the
monitors need to agree on everything.

With one monitor, you can make forward progress if it's running; with two
monitors, you can't afford for either of them to die (because then you only
have 50% up); with three monitors you can lose one; with four you can lose
one; with five you can lose two; etc. So using an even number of monitors
increases your odds of a failure without letting you survive any more
failures (in availability terms) than the previous odd number.

-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
_______________________________________________
ceph-users mailing list
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
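
The strict-majority rule from the reply can be sketched in a few lines of
Python. This is only an illustration of the arithmetic, not Ceph's monitor
code: the names quorum_size, tolerable_failures and has_quorum are made up
for this example, and it assumes the majority is counted against the total
number of configured monitors rather than the ones still running.

```python
def quorum_size(total_monitors: int) -> int:
    """Smallest strict majority (more than half) of the configured monitors."""
    return total_monitors // 2 + 1


def tolerable_failures(total_monitors: int) -> int:
    """How many monitors can be down while a strict majority is still up."""
    return total_monitors - quorum_size(total_monitors)


def has_quorum(total_monitors: int, alive_monitors: int) -> bool:
    """True if the surviving monitors form a strict majority of the total."""
    return alive_monitors >= quorum_size(total_monitors)


# Hypothetical illustration: survivable failures for clusters of 1..5 monitors.
table = {n: tolerable_failures(n) for n in range(1, 6)}
print(table)             # {1: 0, 2: 0, 3: 1, 4: 1, 5: 2}

# The case from the question: 3 monitors configured, 1 dies, 2 remain.
print(has_quorum(3, 2))  # True: 2 of 3 is a strict majority

# A 2-monitor cluster that loses one has only 50% up.
print(has_quorum(2, 1))  # False
```

Under these assumptions, going from 3 to 4 monitors (or from 1 to 2) adds
another machine that can fail without raising the number of failures the
cluster can ride out, which is the reasoning behind the odd-number
recommendation.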
