Re: Replication using totem protocol

Jules Gosnell Wed, 18 Jan 2006 01:27:05 -0800

lichtner wrote:

On Tue, 17 Jan 2006, Jules Gosnell wrote:

just when you thought that this thread would die :-)


I think Jeff Genender wanted a discussion to be sparked, and it worked.

So, I am wondering how I might use e.g. a shared disc or majority voting
in this situation, in order to decide which fragment was the original
cluster and which was the piece that had broken off? But then what
would the piece that had broken off do? Shut down?


Wait to rejoin the cluster. Since it is not "the" cluster, it waits. It is
not safe to make any updates.

_How_ a group decides it is "the" cluster can be done in several ways.
A shared-disk cluster can do it by a locking operation on a disk (I would have
to research the details on this); a cluster with a database can get a lock
from the database (and keep the connection open). And one way to do this
in a shared-nothing cluster is to use a quorum of N/2 + 1, where N is the
maximum number of nodes. Clearly it has to be the majority or else you can
have a split-brain cluster.
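
As a rough illustration of the N/2 + 1 rule just described, here is a minimal Python sketch. The function names are hypothetical and not taken from any of the libraries discussed in this thread, and it assumes "N/2" means integer division.

```python
def quorum_size(max_nodes: int) -> int:
    """Smallest strict majority of max_nodes, i.e. N/2 + 1 with integer division."""
    if max_nodes < 1:
        raise ValueError("max_nodes must be positive")
    return max_nodes // 2 + 1


def has_quorum(visible_nodes: int, max_nodes: int) -> bool:
    """True if a fragment that can see visible_nodes members holds a majority."""
    return visible_nodes >= quorum_size(max_nodes)
```

With N = 4 the quorum is 3, so when the cluster splits into two halves of 2 neither fragment has quorum and both must wait. Two disjoint fragments can never both reach N/2 + 1, which is what rules out split-brain.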

I haven't been able to convince myself to take the quorum approach because...


shared-something approach:

- the shared something is a Single Point of Failure (SPoF) - although you could use an HA something.

- If the node holding the lock 'goes crazy', but does not die, the rest of the cluster becomes a fragment - so it becomes an SPoF as well.

- used in isolation, it does not take into account that the lock may be held by the smallest cluster fragment


shared-nothing approach:

- I prefer this approach, but, as you have stated, if the two halves are equally sized...

- What if there are two concurrent fractures (does this happen?)

- ActiveCluster notifies you of one membership change at a time - so you would have to decide on an algorithm for 'chunking' node loss, so that you could decide when a fragmentation had occurred...
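
One possible way to 'chunk' single-node loss notifications, sketched in Python under loud assumptions: the `(timestamp, node)` event shape, the `chunk_losses` name, and the fixed quiet-period window are all hypothetical, and none of this is ActiveCluster's API. Losses that arrive close together are treated as one fragmentation event.

```python
def chunk_losses(events, window):
    """Group (timestamp, node) loss events into batches.

    A new batch starts whenever the gap since the previous loss
    exceeds `window` seconds; otherwise the loss joins the current batch.
    """
    batches = []
    last_ts = None
    for ts, node in sorted(events):
        if last_ts is None or ts - last_ts > window:
            batches.append(set())  # quiet period elapsed: start a new chunk
        batches[-1].add(node)
        last_ts = ts
    return batches
```

For example, losses of "a" and "b" 0.1 seconds apart followed by "c" five seconds later, with a one-second window, come out as two chunks: one containing "a" and "b", and one containing "c". Choosing the window is the hard part and is not addressed here.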

Perhaps a hybrid of the two would be able to cover more bases... - shared-nothing falling back to shared-something if your fragment is sized N/2.
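
The hybrid idea could be sketched roughly as follows, in Python. Everything here is an assumption made for illustration: `decide_role`, the `Role` enum, and the `try_shared_lock` callback (standing in for whatever shared disk or database lock is available) are hypothetical names.

```python
from enum import Enum


class Role(Enum):
    PRIMARY = "primary"  # this fragment is "the" cluster and may accept updates
    PAUSED = "paused"    # this fragment waits to rejoin and makes no updates


def decide_role(fragment_size, max_nodes, try_shared_lock):
    """Majority wins outright; an exact N/2 tie falls back to the shared lock."""
    if 2 * fragment_size > max_nodes:
        return Role.PRIMARY  # strict majority, no shared resource consulted
    if 2 * fragment_size == max_nodes and try_shared_lock():
        return Role.PRIMARY  # tie broken by the shared-something
    return Role.PAUSED
```

The shared lock is only consulted on an exact tie, so it stops being a single point of failure in the common case, and a minority fragment can never become primary just because it happens to hold the lock.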

As far as my plans for WADI, I think I am happy to stick with the 'rely on affinity and keep going' approach.

As far as situations where a distributed object may have more than one client, I can see that quorum offers the hope of a solution, but, without some very careful thought, I would still be hesitant to stake my shirt on it :-) for the reasons given above...

I hadn't really considered 'pausing' a cluster fragment, so this is a useful idea. I guess that I have been thinking more in terms of long-lived fractures, rather than short-lived ones. If the latter are that much more common, then this is great input and I need to take it into account.

The issue about 'chunking' node loss interests me... I see that the EVS4J Listener returns a set of members, so it is possible to express the loss of more than one node. How is membership decided and node loss aggregated ?


Thanks again for your time,


Jules

Do you think that we need to worry about situations where a piece of
state has more than one client, so a network partition may result in two
copies diverging in different and incompatible directions, rather than
only one diverging?


If you use a quorum or quorum-resource as above you do not have this
problem. You can turn down the requests or let them block until the
cluster re-discovers the 'failed' nodes.
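
A minimal sketch of "turn down the requests or let them block", in Python. The `QuorumGate` class and its method names are hypothetical, and it assumes some membership layer calls `on_membership` with the current set of visible members. It does not close the race where quorum is lost between the check and the update.

```python
import threading


class QuorumGate:
    """Lets updates through only while this fragment holds quorum."""

    def __init__(self, max_nodes):
        self._needed = max_nodes // 2 + 1
        self._quorate = threading.Event()

    def on_membership(self, members):
        # Called by the (assumed) membership layer on every view change.
        if len(members) >= self._needed:
            self._quorate.set()
        else:
            self._quorate.clear()

    def apply_update(self, update, timeout=0.0):
        """timeout=0.0 rejects at once; timeout=None blocks until quorum returns."""
        if not self._quorate.wait(timeout):
            return False  # not quorate: the request is turned down
        update()
        return True
```

A caller that prefers blocking passes `timeout=None` and is released when the membership layer reports a majority again.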

I can imagine this happening in an Entity Bean (but
we should be able to use the DB to resolve this) or an application POJO.
I haven't considered the latter case and it looks pretty hopeless to me,
unless you have some alternative route over which the two fragments can
communicate... but then, if you did, would you not pair it with your
original network, so that the one failed over to the other or replicated
its activity, so that you never perceived a split in the first place ?
Is this a common solution, or do people use other mechanisms here ?


I do believe that membership and quorum is all you need.

Guglielmo



--
"Open Source is a self-assembling organism. You dangle a piece of
string into a super-saturated solution and a whole operating-system
crystallises out around it."

/**********************************
* Jules Gosnell
* Partner
* Core Developers Network (Europe)
*
*    www.coredevelopers.net
*
* Open Source Training & Support.
**********************************/
