Hi,

On Mon, Feb 11, 2008 at 06:42:52PM +0200, Atanas Dyulgerov wrote:
> Hi all,
> 
> I have a 5-node cluster. I've set up the cluster to stop the
> resources on any subcluster that does not have quorum.
> However, 2 of my 5 nodes are at a geographically separate
> site. I need a way to distinguish a network failure between
> the two sites from the loss of a whole site in a disaster.
> Currently, if the network fails, the subcluster at the first
> site continues to serve because it has quorum (3 of 5 nodes).
> The subcluster at the other site (2 nodes) will not start any
> resources: no quorum, 2/5.
> 
> If disaster strikes the first site, the machines at the second
> site will not take over either (no quorum). Therefore I've set
> up a quorum server located at a different site. In case of
> disaster at the first site, the quorum server will grant quorum
> to the second-site subcluster because it can still connect to
> the remaining 2 nodes.
> 
> Nice isn't it?
> 
> I configured the quorum server as described in
> http://www.linux-ha.org/QuorumServerGuide. The certificates are
> valid and the nodes connect to the quorum server. The problem
> is that once I configured the nodes to use the quorum server,
> my resources stopped everywhere. None of the nodes can run
> resources anymore. I see these in the logs:
> 
> crmd: [9691]: info: crmd_ccm_msg_callback: Quorum lost after event=INVALID (id=4)
> crmd: [9691]: ERROR: do_ccm_update_cache: Plurality w/o Quorum (5/5 nodes)
> crmd: [9691]: info: ccm_event_detail: INVALID: trans=4, nodes=5, new=1, lost=0 n_idx=0, new_idx=5, old_idx=10
> ...
> WARN: cluster_status: We do not have quorum - fencing and resource management disabled
> 
> 
> I can only guess that the quorum server does not return any
> quorum notifications or they are invalid? What can be the
> problem in my case?
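
For reference, the quorum arithmetic described above can be sketched
in a few lines of Python. This is only an illustrative model, not how
heartbeat or quorumd decide quorum internally. The function has_quorum
and its arbiter_grant parameter are hypothetical names, and the
assumption is that a partition has quorum either with a strict
majority of the configured nodes or when an external quorum server
grants it.

```python
def has_quorum(visible_nodes: int, total_nodes: int,
               arbiter_grant: bool = False) -> bool:
    """Return True if a partition may run resources.

    Assumed model (not quorumd's actual algorithm): a partition has
    quorum with a strict majority of the configured nodes, or when an
    external arbiter (a quorum server) explicitly grants it.
    """
    if not 0 <= visible_nodes <= total_nodes:
        raise ValueError("visible_nodes must be between 0 and total_nodes")
    if visible_nodes == 0:
        return False  # an empty partition can never hold quorum
    majority = 2 * visible_nodes > total_nodes
    return majority or arbiter_grant


# Network split between the sites: 3 nodes on one side, 2 on the other.
first_site = has_quorum(3, 5)
second_site = has_quorum(2, 5)

# First site lost: the quorum server grants quorum to the 2 survivors.
second_site_after_disaster = has_quorum(2, 5, arbiter_grant=True)
```

In this model the logged state "Plurality w/o Quorum (5/5 nodes)"
would not occur from node counts alone, which suggests the decision
is coming from the quorum server side rather than from membership.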

It's hard to say. Unfortunately, we're currently low on quorumd
resources, so to speak. So far, those who managed to configure it
reported that it works, but that there were problems
reconnecting. Search this list's archives.

Alternatively, you may try pingd to manage resources in case
of connectivity problems.

> Any help is highly appreciated!

Sorry that I can't help more here.

Thanks,

Dejan

> 
> Regards,
> Atanas 
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
