Hi, I have a 5-node cluster. Two of the nodes are located at a geographically separated site. Those two are passive and wait to take over the service on any failure. I've set up the cluster to stop resources when nodes do not have quorum.
If the network between the two sites fails, the subcluster at the first site (3 nodes) has quorum, so the resources continue to run there. The second subcluster (2 nodes) does not have quorum and therefore does not start any resources. This behavior is exactly what I want.

Here comes the problem: if the entire first site fails in a disaster, the second subcluster does not fail over any resources, because again it does not have quorum (2 nodes). I guess I need a quorum server in this case, located at a third site. The quorum server would then have no connectivity to the first site, so it would grant quorum to the second-site subcluster. Am I right?

I've set up a quorum server following the steps at http://www.linux-ha.org/QuorumServerGuide. But when I start Heartbeat, resources do not start anywhere. I see these errors in the logs:

crmd: [9691]: info: crmd_ccm_msg_callback: Quorum lost after event=INVALID (id=4)
crmd: [9691]: ERROR: do_ccm_update_cache: Plurality w/o Quorum (5/5 nodes)
crmd: [9691]: info: ccm_event_detail: INVALID: trans=4, nodes=5, new=1, lost=0 n_idx=0, new_idx=5, old_idx=10
...
WARN: cluster_status: We do not have quorum - fencing and resource management disabled

Running tcpdump shows there is communication between the DC and the quorum server. I can only guess that the quorum server does not return any quorum notifications, or that they are invalid? What can the problem be in my case?
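The quorum reasoning above can be sketched as a toy model (an illustration only, not Heartbeat's implementation; `has_quorum` and the arbitrator flag are hypothetical names):

```python
# Toy model: a partition has quorum if it holds a strict majority of the
# configured votes, or if an external arbitrator (quorum server) grants
# quorum to it when no partition holds a majority.

def has_quorum(votes_present: int, total_votes: int,
               granted_by_arbitrator: bool = False) -> bool:
    return votes_present * 2 > total_votes or granted_by_arbitrator

# 5-node cluster, inter-site link cut:
print(has_quorum(3, 5))   # primary 3-node site: True (majority)
print(has_quorum(2, 5))   # 2-node DR site alone: False
# Whole primary site lost, quorum server reachable only from the DR site:
print(has_quorum(2, 5, granted_by_arbitrator=True))  # True
```

This matches the desired behavior: the DR site never wins a simple network split, but can be granted quorum by the third-site arbitrator when the primary site is down.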
Heartbeat v2.1.3 / quorumd 2.1.3. Certificates are OK - tested with openssl.

/etc/ha.d/quorumd.conf:

cluster mycluster
version 2_0_8
interval 1000
timeout 5000
takeover 3000
giveup 2000

/etc/ha.d/ha.cf:

autojoin any
crm yes
logfacility local0
keepalive 2
warntime 10
deadtime 30
initdead 30
deadping 5
ucast eth0 10.191.21.31
ucast eth0 10.191.21.32
ucast eth0 10.191.21.33
ucast eth0 10.191.21.34
ucast eth0 10.191.21.35
node iscsi1.ha3.postpath.com
node iscsi2.ha3.postpath.com
node node1.ha3.postpath.com
node node2.ha3.postpath.com
node node3.ha3.postpath.com
debug 0
cluster mycluster
quorum_server quorumsrv

Quorum server logs:

quorumd: [6944]: info: G_main_add_SignalHandler: Added signal handler for signal 15
quorumd: [6944]: info: G_main_add_SignalHandler: Added signal handler for signal 10
quorumd: [6944]: info: G_main_add_SignalHandler: Added signal handler for signal 12
quorumd: [6944]: info: G_main_add_SignalHandler: Added signal handler for signal 1
quorumd: [6944]: info: Started.
quorumd: [6944]: info: load config file /etc/ha.d/quorumd.conf

(that's all)

Please help me solve my problem! Any additional information on dealing with a DR cluster would also help; I cannot find much on Heartbeat's disaster recovery capabilities. How can I ping nodes and change quorum depending on ping connectivity?

Best regards,
Atanas Dyulgerov

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
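On the final question about ping connectivity: in Heartbeat v2 the usual approach is not to change quorum itself but to steer resource placement with ping nodes plus the pingd attribute daemon. A hedged sketch, assuming a pingable gateway at 10.191.21.1 and a resource named my_resource (both hypothetical):

```
# /etc/ha.d/ha.cf additions (10.191.21.1 is a hypothetical pingable gateway):
ping 10.191.21.1
respawn hacluster /usr/lib/heartbeat/pingd -m 100 -d 5s

# CIB location constraint (XML) keeping "my_resource" (hypothetical name)
# off nodes whose pingd attribute is undefined or zero:
<rsc_location id="my_resource_connectivity" rsc="my_resource">
  <rule id="my_resource_no_ping" score="-INFINITY" boolean_op="or">
    <expression id="no_ping_undef" attribute="pingd" operation="not_defined"/>
    <expression id="no_ping_zero" attribute="pingd" operation="lte" value="0"/>
  </rule>
</rsc_location>
```

Note that pingd influences where resources are allowed to run; it does not grant or revoke quorum, so it complements rather than replaces the quorum server.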
