Re: [Linux-cluster] Working of a two-node cluster

Jatin Davey Mon, 27 Apr 2015 01:24:43 -0700

On 4/27/2015 1:28 PM, Vasil Valchev wrote:

Hi,
I would advise you to use quorum disk _only_ as a last resort - it's better to first get a solid understanding of the clustering solution before adding additional complexity.
An amazingly thorough and well described tutorial you can find here:
https://alteeve.ca/w/AN!Cluster_Tutorial_2

[Jatin] Thank you very much for sharing this tutorial. I will surely go through it and gain more understanding.

Especially useful are the first chapters - the theory.
What I suspect is happening in your case is that your cluster communication and fencing are over the same network, which is not fault tolerant.

[Jatin]

My cluster communication happens over one network while fencing happens over another network. I use two separate VLANs for this purpose. Secondly, when the cluster communication fails due to a network outage, fencing happens over the other VLAN and both nodes get fenced.

So what happens if this network fails? Your 2 nodes can't see each other, so they send fence requests, but the fence devices are unreachable too, so those requests fail.
They are retried a few times I think, but if all fail, the fence agent returns failed and your cluster is stuck in "recovering" or stopped state.
Other times the network outage is shorter and the fence succeeds, resulting in both nodes going down - this is solved with the delay parameter.
The first issue is an architectural one: it is the expected behavior of the cluster to stop (or "freeze") all resources if it can't guarantee the state of all members.
Read the article above, it's really very useful.
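
The two outcomes described above (a stuck cluster when the fence devices are unreachable, and both nodes going down when the fence succeeds on both sides) can be illustrated with a small toy model. This is only a sketch under simplifying assumptions: the function name, node names, and delay values are hypothetical, the time a fence operation itself takes is ignored, and the delay is modelled as the number of seconds a fence device waits before powering off its node.

```python
def fence_race(fence_delay, fence_devices_reachable=True):
    """Toy model of a two-node fence race after the cluster link fails.

    fence_delay maps each node name to the seconds its fence device waits
    before powering that node off (hypothetical values, for illustration).
    Returns a tuple (cluster_state, surviving_nodes).
    """
    if len(fence_delay) != 2:
        raise ValueError("this sketch only models a two-node cluster")

    if not fence_devices_reachable:
        # Every fence request fails, so neither node can confirm the state
        # of its peer and the cluster blocks instead of recovering.
        return ("blocked", set(fence_delay))

    (node_a, delay_a), (node_b, delay_b) = fence_delay.items()
    if delay_a == delay_b:
        # Both fence actions fire at the same moment: both nodes go down.
        return ("down", set())

    # The node whose own fencing is delayed longer gets a head start,
    # fences its peer first, and stays up.
    survivor = node_a if delay_a > delay_b else node_b
    return ("running", {survivor})
```

With equal delays on both nodes the model reports both nodes down; giving one node a longer delay (for example 15 seconds against 0) leaves that node running, which is the effect the delay parameter is meant to have.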

Cheers!


Thanks
Jatin

-- 
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster
