On Tue, 17 Jan 2006, Jules Gosnell wrote:
> just when you thought that this thread would die :-)

I think Jeff Genender wanted a discussion to be sparked, and it worked.

> So, I am wondering how might I use e.g. a shared disc or majority voting
> in this situation ? In order to decide which fragment was the original
> cluster and which was the piece that had broken off ? but then what
> would the piece that had broken off do ? shutdown ?

Wait to rejoin the cluster. Since it is not "the" cluster, it waits. It
is not safe for it to make any updates.

_How_ a group decides it is "the" cluster can be done in several ways. A
shared-disk cluster can do it with a locking operation on a disk (I would
have to research the details on this). A cluster with a database can get
a lock from the database (and keep the connection open). And one way to
do this in a shared-nothing cluster is to use a quorum of N/2 + 1, where
N is the maximum number of nodes. Clearly it has to be a majority, or
else two fragments could each claim to be "the" cluster and you would
have a split-brain cluster.

> Do you think that we need to worry about situations where a piece of
> state has more than one client, so a network partition may result in two
> copies diverging in different and incompatible directions, rather than
> only one diverging.

If you use a quorum or quorum resource as above, you do not have this
problem. The fragment without quorum can turn down requests or let them
block until the cluster re-discovers the 'failed' nodes.

> I can imagine this happening in an Entity Bean (but
> we should be able to use the DB to resolve this) or an application POJO.
> I haven't considered the latter case and it looks pretty hopeless to me,
> unless you have some alternative route over which the two fragments can
> communicate... but then, if you did, would you not pair it with your
> original network, so that the one failed over to the other or replicated
> its activity, so that you never perceived a split in the first place ?
> Is this a common solution, or do people use other mechanisms here ?

I do believe that membership and quorum are all you need.

Guglielmo
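
P.S. To make the N/2 + 1 rule concrete, here is a minimal sketch in Python. The function names (quorum_size, may_accept_updates) are made up for illustration and do not come from any existing clustering library, and I am assuming N/2 means integer division.

```python
def quorum_size(max_nodes: int) -> int:
    """Smallest strict majority of a cluster configured for max_nodes members."""
    if max_nodes < 1:
        raise ValueError("max_nodes must be at least 1")
    # Integer division: N/2 + 1 is always more than half of N.
    return max_nodes // 2 + 1


def may_accept_updates(visible_nodes: int, max_nodes: int) -> bool:
    """True if this fragment holds a quorum and may act as "the" cluster."""
    return visible_nodes >= quorum_size(max_nodes)


# A 5-node cluster splits 3/2: only the 3-node fragment keeps taking updates.
print(may_accept_updates(3, 5))  # True
print(may_accept_updates(2, 5))  # False
# A 4-node cluster splits 2/2: neither side has a quorum, so both wait.
print(may_accept_updates(2, 4))  # False
```

Because two disjoint fragments cannot both contain more than half of N nodes, at most one of them can pass this check. Note that the count is against the configured maximum N, not against however many nodes happened to be up before the split.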
