On 05/11/2012 11:39 AM, Geoff Galitz wrote:
>> On 05/11/2012 01:58 AM, Geoff Galitz wrote:
>>> Are we more likely to avoid a split brain scenario if we have more than 2
>>> bricks total?  IOW, one brick on at least three servers or more?
>>
>> Yes.
> 
> One of my guys asked if using the quorum feature in 3.3 makes any  
> difference in this scenario compared to 3.2.x.

Heh.  I knew there'd be a follow-up.  I seriously couldn't think of anything
more useful to say before, and couldn't resist the opportunity to balance my
usual verbosity with a dose of brevity, but I'll be glad to address further
questions as best I can.

Having more than two bricks will enhance your ability to repair a split brain
after it has happened.  Using the quorum-enforcement feature will make it less
likely that you'll get split-brain in the first place.  Having both is ideal,
because quorum enforcement with R=2 is really a bit of a hack.  I'm the guy who
implemented it, so I can say that.  ;)  The problem is that with R=2 a single
failure denies true quorum to either side.  Because it's a common configuration
we handle it anyway, by deciding ties in favor of the first brick in the
configured list, but personally I'd rather not see that become common.  In
fact, some of us are having a very detailed and somewhat heated discussion
about ways to avoid using that tie-breaker at least in the case where the total
number of servers is greater than two.
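
The tie-breaker rule described above can be sketched in a few lines of Python.
This is only an illustration of the decision as stated in this message, not the
actual GlusterFS code; the function name has_quorum and its arguments are
hypothetical.

```python
def has_quorum(bricks, visible):
    """Decide whether a client may modify data in one replica set.

    bricks:  replica bricks in their configured order (hypothetical names).
    visible: set of bricks this client can currently reach.
    """
    up = sum(1 for b in bricks if b in visible)
    total = len(bricks)
    if 2 * up > total:
        # Strict majority: a true quorum.
        return True
    if 2 * up == total:
        # Exact tie (e.g. R=2 with one brick down): the side that can
        # still see the first configured brick wins.
        return bricks[0] in visible
    return False
```

With two bricks, a client that sees only the first brick keeps writing and a
client that sees only the second does not.  With three bricks, any client that
sees two of them has a true majority and the tie-breaker never comes into play.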

Getting back to the original question, combining R>2 with quorum enforcement is
quite ideal, because in that case a single failure still leaves one side with a
true quorum.  The downside is that writes (or other modifying operations) on a
client that can't see a quorum of the bricks will get EROFS errors.  If your
application isn't prepared to handle that, it might not help much, but in most
situations it's better than split brain.  Just keep telling yourself that every
EROFS you see is a potential split-brain disaster that you've avoided.  The
insidious thing about split brain is that it can cause errors to remain latent
in your system for a long time - and they usually seem to manifest at the most
inconvenient times.  Having to deal with EROFS instead is far preferable, and I
have the scars to prove it.
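
For an application that wants to be prepared for EROFS, one possible approach
is to treat it as a transient condition and retry.  The sketch below is a
minimal example under that assumption; the helper write_with_quorum_retry and
its parameters are hypothetical, and whether retrying is appropriate depends on
the application.

```python
import errno
import time


def write_with_quorum_retry(do_write, attempts=3, delay=0.0):
    """Call do_write(); retry if it fails with EROFS.

    Returns True on success, False if every attempt hit EROFS.
    Any other OSError is re-raised unchanged.
    """
    for attempt in range(attempts):
        try:
            do_write()
            return True
        except OSError as exc:
            if exc.errno != errno.EROFS:
                raise
            # Quorum is missing right now; wait before the next attempt.
            if attempt + 1 < attempts:
                time.sleep(delay)
    return False
```

A caller would pass a function that performs the write on the mounted volume
and, on a False return, report the failure instead of assuming the data landed.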
_______________________________________________
Gluster-users mailing list
[email protected]
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
