The problem you are overlooking is that without a reliable way to prevent split-brain, you cannot ensure that the services you are trying to make resilient will handle failure without resource clashes.

If you have a suggestion on how to make that viable, I'm sure it will be listened to. But I cannot see how you can logically prevent resource clash (or worse, in case of a shared file system, data corruption) without a reliable fencing method.
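
To make the point concrete, here is a toy model of a two-node cluster after the link between the nodes is cut. It is only a sketch under stated assumptions: `Node`, `failover`, `power_fence` and `unreachable_fence` are names made up for this illustration and are not part of any cluster software. It shows that a survivor which merely assumes its peer is dead ends up sharing the resource with it, while a survivor that insists on confirmed fencing either owns the resource alone or blocks.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    active: bool = False   # True while this node runs the service
    powered: bool = True


def failover(survivor, peer, fence=None):
    """Survivor has lost contact with peer and wants to take over.

    fence is a hypothetical callable returning True only when the peer is
    confirmed to be off. With fence=None the survivor simply assumes the
    peer is dead, which is the "no fencing" case.
    """
    if fence is not None and not fence(peer):
        return False           # cannot confirm the peer is down: block
    survivor.active = True
    return True


def power_fence(peer):
    # Models a power switch that is still reachable: the peer is cut off.
    peer.powered = False
    peer.active = False
    return True


def unreachable_fence(peer):
    # Models a fence device on the far side of the failed link.
    return False


def owners(nodes):
    # Names of the nodes currently running the service.
    return [n.name for n in nodes if n.active and n.powered]


def run(fence):
    # Node "a" runs the service, the link fails, node "b" tries to take over.
    a, b = Node("a", active=True), Node("b")
    failover(b, a, fence)
    return owners([a, b])


print(run(None))               # no fencing
print(run(power_fence))        # fencing succeeds
print(run(unreachable_fence))  # fencing cannot be carried out
```

With no fencing both nodes end up running the service, which is the split-brain case. With a working fence only the survivor runs it. With an unreachable fence device the survivor refuses to take over, which is the "stuck" behaviour complained about in the quoted message below, and is the price paid for never having two owners.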

If all you want to do is fail over some floating IPs, then fair enough, you might be able to get away without fencing to some extent (you can always manually get into the nodes via their fixed IPs to rectify any issues). But for anything more complex, I don't see how you can make do without reliable fencing.

Gordan

On 18/06/2010 00:31, Jankowski, Chris wrote:
Jim,

You have hit an architectural limitation that is specific to the Linux Cluster 
design and that other clusters tend not to have.

Linux Cluster assumes that you will *always* be able to execute fencing of 
*all* other nodes.  In fact, this is a stated *prerequisite* for correct 
operation of the cluster.

This is all very well when you have two PCs under your desk and a power switch.

However, this model completely fails when any network more complex than a power 
switch is present. Your network fails and you have a partitioned cluster that 
cannot fence. It all gets stuck. From a practical, operational IT point of 
view, this is a disaster worse than not having a cluster.

Having come to Linux Cluster with a TruCluster background, I always had a 
problem with the STONITH approach used by Linux Cluster. I deem it harmful. But 
I see no inclination anywhere in the Linux Cluster world to remove it.

I believe that there is a major philosophical chasm between the design stance of Linux 
Cluster and that of other clusters. Linux Cluster seems to be saying "A node is the centre 
of the world and can control it".  Other clusters take the opposite stance: "A node is a 
part of the world, cannot control it and may have a very limited visibility of the world 
in some circumstances."

Regards,

Chris Jankowski



-----Original Message-----
From: [email protected] 
[mailto:[email protected]] On Behalf Of jimbob palmer
Sent: Friday, 18 June 2010 01:59
To: [email protected]
Subject: [Linux-cluster] qdisk WITHOUT fencing

Dear distinguished linux-cluster members!

I have two data centers linked by physical fibre. Everything goes over this 
physical route: everything.

I would like to set up a high availability NFS server with DRBD:
* drbd to replicate storage
* nfsd running
* floating ip

If the physical link between the two data centers is lost, I would like the 
primary data center to win.

I've set up a qdisk, and this works well: the node which can access the qdisk 
wins, i.e. the primary data center, which is where the SAN holding the qdisk 
also lives, wins.

Unfortunately for me, I get pages and pages of errors about being unable to 
fence the secondary node.

The docs tell me that I absolutely must use power fencing, but in this case fencing makes 
no sense: it won't work when the link between the data centers is severed. The network 
and the qdisk are what decide who "wins".

So what should I do?

Many thanks in advance.

--
Linux-cluster mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/linux-cluster
