Just when you thought that this thread would die :-)

So, Guglielmo,

in an earlier posting on this thread you said "BTW, how does AC defend against the problem of a split-brain cluster?
Shared scsi disk? Majority voting? Curious."

So, I am wondering how I might use e.g. a shared disc or majority voting in this situation. Would it be in order to decide which fragment was the original cluster and which was the piece that had broken off? But then what would the piece that had broken off do? Shut down?
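
To make the majority-voting option concrete, here is a minimal sketch of the check each fragment could run after a partition. It assumes every node knows the configured cluster size and can count the peers it still sees; the class and method names are hypothetical and are not part of AC or any other library.

```java
/** Hypothetical helper: decides whether a cluster fragment may keep serving. */
final class QuorumCheck {
    private QuorumCheck() {}

    /**
     * Returns true when the members visible to this node (including itself)
     * form a strict majority of the configured cluster.
     */
    static boolean hasQuorum(int visibleMembers, int configuredMembers) {
        if (configuredMembers <= 0
                || visibleMembers < 0
                || visibleMembers > configuredMembers) {
            throw new IllegalArgumentException("invalid member counts");
        }
        // Strict majority: at most one fragment can ever satisfy this.
        return visibleMembers * 2 > configuredMembers;
    }
}
```

A fragment for which hasQuorum(...) is false would stop accepting writes (or shut down) rather than diverge. Note that an even split, e.g. 2 of 4, leaves neither side with a majority, which is where a tie-breaker such as a shared disc would come in.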

Do you think that we need to worry about situations where a piece of state has more than one client, so that a network partition may result in two copies diverging in different and incompatible directions, rather than only one diverging? I can imagine this happening in an Entity Bean (but we should be able to use the DB to resolve this) or an application POJO. I haven't considered the latter case and it looks pretty hopeless to me, unless you have some alternative route over which the two fragments can communicate... but then, if you did, would you not pair it with your original network, so that the one failed over to the other or replicated its activity, and you never perceived a split in the first place? Is this a common solution, or do people use other mechanisms here?

thanks again for your time,


Jules


lichtner wrote:

On Tue, 17 Jan 2006, Jules Gosnell wrote:

I believe that if you put some spare capacity in your cluster you will get
good availability. For example, if your minimum R is 2 and the normal
operating value is 4, when a node fails you will not be frantically doing
state transfer.


OK - so your system is a little more relaxed about the exact number of
replicants. You specify upper and lower bounds rather than an absolute
number, then you move towards the upper bound when you have the capacity?

That's the idea. It's a bit like having hot spares, but all nodes are
treated on the same footing.
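
As a rough illustration of the bounds described above (minimum R of 2, normal operating value of 4), the decision could look something like the sketch below. The class, enum and method names are hypothetical; this is only one way to express the idea, not how any particular cluster implements it.

```java
/** Hypothetical policy: replica count is kept between a minimum and a target. */
final class ReplicaPolicy {
    enum Action { NONE, REPLICATE_IN_BACKGROUND, REPLICATE_URGENTLY }

    private final int min;     // below this, state is at risk
    private final int target;  // normal operating value

    ReplicaPolicy(int min, int target) {
        if (min < 1 || target < min) {
            throw new IllegalArgumentException("need 1 <= min <= target");
        }
        this.min = min;
        this.target = target;
    }

    /** Decides what to do given the number of live replicas of some state. */
    Action onReplicaCount(int liveReplicas) {
        if (liveReplicas < min) {
            return Action.REPLICATE_URGENTLY;
        }
        if (liveReplicas < target) {
            // Spare capacity absorbs the failure; top up when convenient.
            return Action.REPLICATE_IN_BACKGROUND;
        }
        return Action.NONE;
    }
}
```

With new ReplicaPolicy(2, 4), losing one node takes a piece of state from 4 replicas to 3, which is still above the minimum, so no frantic state transfer is needed.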

I would also just send a redirect. I don't think it's worth relocating a
session.

If you can communicate the session's location to the load-balancer, then
I agree, but some load-balancers are pretty dumb :-)

I see .. I was hoping somebody was not going to say that. Even so, it
depends on the latency of the request when it actually request. After all,
this only happens after a failure. But no matter, you can also move the
session over.

Guglielmo


--
"Open Source is a self-assembling organism. You dangle a piece of
string into a super-saturated solution and a whole operating-system
crystallises out around it."

/**********************************
* Jules Gosnell
* Partner
* Core Developers Network (Europe)
*
*    www.coredevelopers.net
*
* Open Source Training & Support.
**********************************/
