Re: Replication using totem protocol

Jules Gosnell Tue, 17 Jan 2006 00:42:18 -0800


just when you thought that this thread would die :-)


So, Guglielmo,

in an earlier posting on this thread you said: "BTW, how does AC defend against the problem of a split-brain cluster? Shared SCSI disk? Majority voting? Curious."

So, I am wondering how I might use e.g. a shared disc or majority voting in this situation? To decide which fragment was the original cluster and which was the piece that had broken off? But then what would the piece that had broken off do? Shut down?
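
To make the majority-voting option concrete, here is a minimal sketch of how a fragment might decide for itself after a partition. It assumes every node knows the configured cluster size in advance; the names has_quorum and on_partition are hypothetical and are not taken from AC or from any totem implementation.

```python
def has_quorum(visible_members, configured_size):
    """True if this fragment sees a strict majority of the configured cluster."""
    return len(visible_members) > configured_size // 2


def on_partition(visible_members, configured_size):
    """Decide what a fragment does once it notices that members are missing.

    Assumption: the fragment with a strict majority carries on as "the"
    cluster, and any fragment without one stops serving so that its state
    cannot diverge.
    """
    if has_quorum(visible_members, configured_size):
        return "continue"
    return "shutdown"


# A five-node cluster splits 3/2: only the larger fragment keeps going.
print(on_partition({"a", "b", "c"}, 5))
print(on_partition({"d", "e"}, 5))
```

With a strict majority, an even split (2 of 4, say) leaves neither fragment with quorum, so both would stop. That is the usual argument for an odd number of voters or for a tie-breaker such as a shared disc.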

Do you think that we need to worry about situations where a piece of state has more than one client, so that a network partition may result in two copies diverging in different and incompatible directions, rather than only one diverging? I can imagine this happening in an Entity Bean (but we should be able to use the DB to resolve this) or an application POJO. I haven't considered the latter case and it looks pretty hopeless to me, unless you have some alternative route over which the two fragments can communicate... but then, if you did, would you not pair it with your original network, so that the one failed over to the other or replicated its activity, and you never perceived a split in the first place? Is this a common solution, or do people use other mechanisms here?


thanks again for your time,


Jules


lichtner wrote:

On Tue, 17 Jan 2006, Jules Gosnell wrote:

I believe that if you put some spare capacity in your cluster you will get
good availability. For example, if your minimum R is 2 and the normal
operating value is 4, when a node fails you will not be frantically doing
state transfer.

OK - so your system is a little more relaxed about the exact number of
replicants. You specify upper and lower bounds rather than an absolute
number, then you move towards the upper bound when you have the capacity?


That's the idea. It's a bit like having hot spares, but all nodes are
treated on the same footing.
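
As a rough illustration of the bounds idea, the sketch below plans how many extra copies of a piece of state to create. It uses the figures from the example (minimum R of 2, normal operating value of 4); the function name plan_replication and the notion of counting "spare nodes" are assumptions made for the illustration, not part of any existing API.

```python
def plan_replication(current, spare_nodes, r_min=2, r_target=4):
    """Return (copies_to_add, urgent) for one piece of replicated state.

    current     -- number of live replicas right now
    spare_nodes -- nodes that could host a new replica (hypothetical measure)
    r_min       -- lower bound; falling below it makes state transfer urgent
    r_target    -- normal operating value to drift back towards
    """
    deficit = max(0, r_target - current)
    copies_to_add = min(deficit, spare_nodes)  # never ask for more than we can host
    urgent = current < r_min                   # only frantic below the lower bound
    return copies_to_add, urgent


# One node of four fails: top up by one copy, but there is no rush.
print(plan_replication(current=3, spare_nodes=5))
```

The point of the gap between r_min and r_target is that a single failure leaves the state above the lower bound, so the top-up can happen in the background.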

I would also just send a redirect. I don't think it's worth relocating a
session.

If you can communicate the session's location to the load-balancer, then
I agree, but some load-balancers are pretty dumb :-)


I see... I was hoping nobody was going to say that. Even so, it
depends on the latency of the request when it is actually made. After all,
this only happens after a failure. But no matter, you can also move the
session over.

Guglielmo



--
"Open Source is a self-assembling organism. You dangle a piece of
string into a super-saturated solution and a whole operating-system
crystallises out around it."

/**********************************
* Jules Gosnell
* Partner
* Core Developers Network (Europe)
*
*    www.coredevelopers.net
*
* Open Source Training & Support.
**********************************/
