lichtner wrote:

On Tue, 17 Jan 2006, Jules Gosnell wrote:

just when you thought that this thread would die :-)

I think Jeff Genender wanted a discussion to be sparked, and it worked.

So, I am wondering how might I use e.g. a shared disc or majority voting
in this situation ? In order to decide which fragment was the original
cluster and which was the piece that had broken off ? but then what
would the piece that had broken off do ? shutdown ?

Wait to rejoin the cluster. Since it is not "the" cluster, it waits. It is
not safe to make any updates.

_How_ a group decides it is "the" cluster can be done in several ways.
A shared-disk cluster can do it with a locking operation on a disk (I would
have to research the details on this); a cluster with a database can get a
lock from the database (and keep the connection open). And one way to do
this in a shared-nothing cluster is to use a quorum of N/2 + 1, where N is
the maximum number of nodes. Clearly it has to be a majority, or else you
can have a split-brain cluster.
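
The N/2 + 1 rule can be sketched in a few lines. This is only an illustration, assuming N is a fixed, configured maximum cluster size and that each fragment can count the nodes it currently sees; the function names are made up for the example and do not come from WADI, ActiveCluster or EVS4J.

```python
def quorum_size(max_nodes: int) -> int:
    """Smallest strict majority of a cluster with max_nodes configured members."""
    return max_nodes // 2 + 1


def is_primary_fragment(visible_nodes: int, max_nodes: int) -> bool:
    """True if this fragment holds a majority and may call itself "the" cluster."""
    return visible_nodes >= quorum_size(max_nodes)


# With N = 4 the quorum is 3, so a 2/2 split leaves neither half in charge.
print(quorum_size(4), is_primary_fragment(2, 4), is_primary_fragment(3, 4))
```

Because two disjoint fragments cannot both contain more than half of N nodes, at most one of them can pass this test, which is what rules out split-brain.
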
I haven't been able to convince myself to take the quorum approach because...

shared-something approach:
- the shared something is a Single Point of Failure (SPoF) - although you could use an HA something.
- If the node holding the lock 'goes crazy', but does not die, the rest of the cluster becomes a fragment - so it becomes an SPoF as well.
- used in isolation, it does not take into account that the lock may be held by the smallest cluster fragment

shared-nothing approach:
- I prefer this approach, but, as you have stated, if the two halves are equally sized...
- What if there are two concurrent fractures (does this happen?)
- ActiveCluster notifies you of one membership change at a time - so you would have to decide on an algorithm for 'chunking' node loss, so that you could decide when a fragmentation had occurred...
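
One possible way to 'chunk' single-node notifications is to collect them until no further loss has been reported for a settle period, and only then treat the batch as one event. The sketch below is an assumption about how that might look, not ActiveCluster's API: the class, its methods and the settle window are all hypothetical, and timestamps are passed in explicitly so the behaviour is easy to follow.

```python
class LossChunker:
    """Aggregates one-at-a-time node-loss notifications into a single chunk."""

    def __init__(self, settle_seconds: float):
        self.settle_seconds = settle_seconds  # quiet time required before reporting
        self._pending = set()                 # nodes lost since the last report
        self._last_event = None               # time of the most recent loss

    def node_lost(self, node_id: str, now: float) -> None:
        """Record one membership change (a single node leaving)."""
        self._pending.add(node_id)
        self._last_event = now

    def poll(self, now: float):
        """Return the set of lost nodes once things have been quiet, else None."""
        if self._last_event is None:
            return None
        if now - self._last_event < self.settle_seconds:
            return None  # losses may still be arriving; keep collecting
        chunk = frozenset(self._pending)
        self._pending.clear()
        self._last_event = None
        return chunk


chunker = LossChunker(settle_seconds=2.0)
chunker.node_lost("a", now=0.0)
chunker.node_lost("b", now=0.5)
print(chunker.poll(now=1.0))   # still settling
print(chunker.poll(now=2.5))   # both losses reported together
```

The size of the returned chunk, compared with the previous membership, is what a fragmentation test would then look at. Choosing the settle period is the hard part and is left open here.
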

perhaps a hybrid of the two would be able to cover more bases...
- shared-nothing falling back to shared-something if your fragment is sized N/2.
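
A rough sketch of that hybrid, assuming N is the configured maximum cluster size and that `try_shared_lock` is some caller-supplied way of grabbing the shared something (a disk lock, a database lock); both the function name and the callback are hypothetical.

```python
from typing import Callable


def may_accept_updates(visible: int, max_nodes: int,
                       try_shared_lock: Callable[[], bool]) -> bool:
    """Decide whether this fragment may carry on as "the" cluster."""
    if 2 * visible > max_nodes:
        return True                  # strict majority: no tie-breaker needed
    if 2 * visible == max_nodes:
        return try_shared_lock()     # exact half: fall back to the shared lock
    return False                     # minority: wait to rejoin


# A 2/2 split of a 4-node cluster is settled by whoever gets the lock.
print(may_accept_updates(2, 4, lambda: True))
print(may_accept_updates(2, 4, lambda: False))
```

The shared resource is only consulted in the tie case, so it stops being a single point of failure for every other kind of split.
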

As far as my plans for WADI go, I think I am happy to stick with the 'rely on affinity and keep going' approach.

As far as situations where a distributed object may have more than one client, I can see that quorum offers the hope of a solution, but, without some very careful thought, I would still be hesitant to stake my shirt on it :-) for the reasons given above...

I hadn't really considered 'pausing' a cluster fragment, so this is a useful idea. I guess that I have been thinking more in terms of long-lived fractures, rather than short-lived ones. If the latter are that much more common, then this is great input and I need to take it into account.

The issue about 'chunking' node loss interests me... I see that the EVS4J Listener returns a set of members, so it is possible to express the loss of more than one node. How is membership decided and node loss aggregated ?

Thanks again for your time,


Jules

Do you think that we need to worry about situations where a piece of
state has more than one client, so a network partition may result in two
copies diverging in different and incompatible directions, rather than
only one diverging ?

If you use a quorum or quorum-resource as above you do not have this
problem. You can turn down the requests or let them block until the
cluster re-discovers the 'failed' nodes.
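
Both options, turning requests down or letting them block, can be expressed with a simple gate in front of the update path. This is a minimal sketch that assumes some membership layer calls `set_quorum` whenever the fragment gains or loses its majority; the class and method names are invented for the example.

```python
import threading


class UpdateGate:
    """Lets updates through only while this fragment holds quorum."""

    def __init__(self):
        self._has_quorum = threading.Event()  # starts cleared: no quorum yet

    def set_quorum(self, has_quorum: bool) -> None:
        """Called by the (assumed) membership layer on every view change."""
        if has_quorum:
            self._has_quorum.set()
        else:
            self._has_quorum.clear()

    def try_enter(self) -> bool:
        """Non-blocking: the caller turns the request down if this is False."""
        return self._has_quorum.is_set()

    def wait_to_enter(self, timeout=None) -> bool:
        """Blocking: wait until quorum returns, or give up after timeout seconds."""
        return self._has_quorum.wait(timeout)


gate = UpdateGate()
print(gate.try_enter())   # no quorum yet, so the request would be refused
gate.set_quorum(True)
print(gate.try_enter())   # quorum regained, updates may proceed
```
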

I can imagine this happening in an Entity Bean (but
we should be able to use the DB to resolve this) or an application POJO.
I haven't considered the latter case and it looks pretty hopeless to me,
unless you have some alternative route over which the two fragments can
communicate... but then, if you did, would you not pair it with your
original network, so that the one failed over to the other or replicated
its activity, so that you never perceived a split in the first place ?
Is this a common solution, or do people use other mechanisms here ?

I do believe that membership and quorum is all you need.

Guglielmo


--
"Open Source is a self-assembling organism. You dangle a piece of
string into a super-saturated solution and a whole operating-system
crystallises out around it."

/**********************************
* Jules Gosnell
* Partner
* Core Developers Network (Europe)
*
*    www.coredevelopers.net
*
* Open Source Training & Support.
**********************************/
