Hi all,

I have a 5-node cluster. I've set up the cluster to stop the resources on any subcluster that does not have quorum. However, 2 of my 5 nodes are in a geographically separate site, and I need a way to distinguish between a network failure between the two sites and the loss of a whole site in a disaster. Currently, if the network fails, the subcluster in the first site continues to serve because it has quorum (3 of 5 nodes). The subcluster in the other site (2 nodes) will not start any resources because it has no quorum (2/5).
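
To make the arithmetic above concrete, here is a minimal sketch of a strict-majority quorum check. It assumes quorum simply means "more than half of all configured nodes are visible"; the function name `has_quorum` is made up for illustration and is not part of Heartbeat or any quorum plugin, which may apply additional rules.

```python
def has_quorum(visible_nodes: int, total_nodes: int) -> bool:
    """Assumed strict-majority rule: a partition has quorum only if it
    can see more than half of all configured nodes."""
    return 2 * visible_nodes > total_nodes


TOTAL_NODES = 5  # cluster size from the post

# The two partitions that form when the link between the sites fails.
partitions = [("first site", 3), ("second site", 2)]

for site, visible in partitions:
    state = "quorum" if has_quorum(visible, TOTAL_NODES) else "no quorum"
    print(f"{site}: {visible}/{TOTAL_NODES} nodes -> {state}")
```

Under this rule the 2-node site can never win on its own, whether the first site is merely unreachable or completely gone, which is why an external tie-breaker is needed.
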
In case of a disaster in the first site, the machines in the second site will not take over either (no quorum). Therefore I've set up a quorum server located in a third site. If the first site is lost, the quorum server should grant quorum to the second-site subcluster, because it can still connect to the remaining 2 nodes. Nice, isn't it?

I configured the quorum server as described in http://www.linux-ha.org/QuorumServerGuide. The certificates are valid and the nodes connect to the quorum server.

The problem is that once I configured the nodes to use the quorum server, my resources stopped everywhere. None of the nodes can run resources anymore, even though all 5 nodes are up. I see these messages in the logs:

crmd: [9691]: info: crmd_ccm_msg_callback: Quorum lost after event=INVALID (id=4)
crmd: [9691]: ERROR: do_ccm_update_cache: Plurality w/o Quorum (5/5 nodes)
crmd: [9691]: info: ccm_event_detail: INVALID: trans=4, nodes=5, new=1, lost=0 n_idx=0, new_idx=5, old_idx=10
...
WARN: cluster_status: We do not have quorum - fencing and resource management disabled

I can only guess that the quorum server either does not return any quorum notifications or returns invalid ones. What could be the problem in my case?

Any help is highly appreciated!

Regards,
Atanas
_______________________________________________
Linux-HA mailing list
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
