Hi, I have a 5-node cluster. Two of the nodes are located at a geographically separated site. Those two are passive and wait to take over the service on any failure. I've set up the cluster to stop resources when nodes do not have quorum.
If the network between the two sites fails, the subcluster at the first site (3 nodes) has quorum, so the resources continue to run there. The second subcluster (2 nodes) does not have quorum and therefore does not start any resources. This behavior is exactly what I want.

Here comes the problem: if the entire first site fails in a disaster, the second subcluster does not fail over any resources, because again it does not have quorum (2 nodes). I guess I need a quorum server in this case, located at a third site. The quorum server would then have no connectivity to the first site, so it would grant quorum to the second-site subcluster. Am I right?

I've set up a quorum server following the steps at http://www.linux-ha.org/QuorumServerGuide. But when I start Heartbeat, resources do not start anywhere. I see these errors in the logs:

crmd: [9691]: info: crmd_ccm_msg_callback: Quorum lost after event=INVALID (id=4)
crmd: [9691]: ERROR: do_ccm_update_cache: Plurality w/o Quorum (5/5 nodes)
crmd: [9691]: info: ccm_event_detail: INVALID: trans=4, nodes=5, new=1, lost=0 n_idx=0, new_idx=5, old_idx=10
...
WARN: cluster_status: We do not have quorum - fencing and resource management disabled

Running tcpdump shows there is communication between the DC and the quorum server. I can only guess that the quorum server does not return any quorum notifications, or that they are invalid? What can the problem be in my case?
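The quorum reasoning above can be sketched as a toy model (an illustration only, not Heartbeat's implementation; `has_quorum` and the arbitrator flag are hypothetical names):

```python
# Toy model: a partition has quorum if it holds a strict majority of the
# configured votes, or if an external arbitrator (quorum server) grants
# quorum to it when no partition holds a majority.

def has_quorum(votes_present: int, total_votes: int,
               granted_by_arbitrator: bool = False) -> bool:
    return votes_present * 2 > total_votes or granted_by_arbitrator

# 5-node cluster, inter-site link cut:
print(has_quorum(3, 5))   # primary 3-node site: True (majority)
print(has_quorum(2, 5))   # 2-node DR site alone: False
# Whole primary site lost, quorum server reachable only from the DR site:
print(has_quorum(2, 5, granted_by_arbitrator=True))  # True
```

This matches the desired behavior: the DR site never wins a simple network split, but can be granted quorum by the third-site arbitrator when the primary site is down.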
Heartbeat v2.1.3 / quorumd 2.1.3. Certificates are OK - tested with openssl.

/etc/ha.d/quorumd.conf:

cluster mycluster
version 2_0_8
interval 1000
timeout 5000
takeover 3000
giveup 2000

/etc/ha.d/ha.cf:

autojoin any
crm yes
logfacility local0
keepalive 2
warntime 10
deadtime 30
initdead 30
deadping 5
ucast eth0 10.191.21.31
ucast eth0 10.191.21.32
ucast eth0 10.191.21.33
ucast eth0 10.191.21.34
ucast eth0 10.191.21.35
node iscsi1.ha3.postpath.com
node iscsi2.ha3.postpath.com
node node1.ha3.postpath.com
node node2.ha3.postpath.com
node node3.ha3.postpath.com
debug 0
cluster mycluster
quorum_server quorumsrv

Quorum server logs:

quorumd: [6944]: info: G_main_add_SignalHandler: Added signal handler for signal 15
quorumd: [6944]: info: G_main_add_SignalHandler: Added signal handler for signal 10
quorumd: [6944]: info: G_main_add_SignalHandler: Added signal handler for signal 12
quorumd: [6944]: info: G_main_add_SignalHandler: Added signal handler for signal 1
quorumd: [6944]: info: Started.
quorumd: [6944]: info: load config file /etc/ha.d/quorumd.conf

(that's all)

Please help me solve my problem! Any additional information on dealing with a DR cluster would also help; I cannot find much on Heartbeat's disaster recovery capabilities. How can I ping nodes and change quorum depending on ping connectivity?

Best regards,
Atanas Dyulgerov

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
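On the final question about ping connectivity: in Heartbeat v2 the usual approach is not to change quorum itself but to steer resource placement with ping nodes plus the pingd attribute daemon. A hedged sketch, assuming a pingable gateway at 10.191.21.1 and a resource named my_resource (both hypothetical):

```
# /etc/ha.d/ha.cf additions (10.191.21.1 is a hypothetical pingable gateway):
ping 10.191.21.1
respawn hacluster /usr/lib/heartbeat/pingd -m 100 -d 5s

# CIB location constraint (XML) keeping "my_resource" (hypothetical name)
# off nodes whose pingd attribute is undefined or zero:
<rsc_location id="my_resource_connectivity" rsc="my_resource">
  <rule id="my_resource_no_ping" score="-INFINITY" boolean_op="or">
    <expression id="no_ping_undef" attribute="pingd" operation="not_defined"/>
    <expression id="no_ping_zero" attribute="pingd" operation="lte" value="0"/>
  </rule>
</rsc_location>
```

Note that pingd influences where resources are allowed to run; it does not grant or revoke quorum, so it complements rather than replaces the quorum server.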
