On 2013-09-24T20:55:40, AvatarSmith <[email protected]> wrote:

> I'm having a bit of an issue under CentOS 6.4 x64. I have two duplicate
> hardware systems (RAID arrays, 10G NICs, etc.) configured identically, and
> DRBD replication is working fine in the cluster between the two. When I
> started doing failover testing via power/communications interruptions, I
> found that I could not reliably shift the resources from cluster1 to
> cluster2, even though they are identical in every aspect. I AM able (by
> starting pacemaker first on one or the other) to get the cluster up on
> either node. I was told that this is a problem for a non-STONITH two-node
> cluster, and to add a third server to provide the quorum vote to tell the
> survivor to host the cluster resources.
The best bet, in my humble opinion, is truly to set up a quorum node in this case - but not as a full member of the cluster.

This is not in CentOS, but if you use "sbd" as a fencing mechanism and enable the pacemaker integration (see https://github.com/l-mb/sbd/blob/master/man/sbd.8.pod), that allows you to use an iSCSI target as a "quorum node". (Think of it as a quorum node implemented on top of a standard storage protocol.)

That means you can either install a small iSCSI server somewhere (easily done under Linux - export a 1MB LUN from a VM or something), or utilize existing storage servers to provide that.

I don't currently have a build for CentOS, but I'd welcome patches to the specfile to make it work there ;-)

> hardware) I still get messages to the effect that the p_drbd_r0 monitor
> won't load because it's not installed... well, duh, it's not installed, but
> it's not supposed to be running on node3 anyway. Why is it trying to
> monitor on a node it's not installed on or permitted to run on?

monitor_0 is the initial probe that makes sure the service really is not running where the configuration forbids it to run. This is expected behaviour.

> flags can I toggle to help point out the way, or does PCS / crm_xxxxx
> provide a better interface for configuring/debugging this?

This behaviour has nothing to do with how you configure pacemaker; it is core to pacemaker itself. Though I think that later versions may have learned (or will learn) to hide ERR_INSTALLED results in crm_mon if there's a -inf location rule, the backend remains the same.

> Lastly, during my failover testing and configuration testing, I found the
> only surefire way to apply a new cluster config is cibadmin -f -E, then
> cutting and pasting in a new one, followed by a reboot... what a pain. You
> can sometimes get away with restarting pacemaker on all nodes, bringing up
> your intended primary first and the others later.

This clearly isn't good and is worth debugging.
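For the iSCSI-backed sbd arrangement described above, a rough sketch of the steps might look like the following. This is an assumption-laden outline, not a tested recipe: the target name, device path, and config file location (/etc/sysconfig/sbd, as used on SUSE) are placeholders you would need to adapt, particularly on CentOS.

```shell
# Hypothetical sketch -- IQN, device path, and file locations are placeholders.

# 1. Log in to the small iSCSI LUN from both cluster nodes:
iscsiadm -m discovery -t st -p storage.example.com
iscsiadm -m node -T iqn.2013-09.com.example:quorum -l

# 2. Initialise the sbd metadata on the shared LUN (run once, from one node):
sbd -d /dev/disk/by-id/scsi-<lun-id> create

# 3. Point sbd at the device and enable the pacemaker integration
#    (SBD_PACEMAKER is honoured by the sbd init integration on SUSE;
#    the equivalent wiring on CentOS would have to be done by hand):
cat >> /etc/sysconfig/sbd <<'EOF'
SBD_DEVICE="/dev/disk/by-id/scsi-<lun-id>"
SBD_PACEMAKER="yes"
EOF

# 4. Add the fencing resource and turn STONITH on:
crm configure primitive stonith-sbd stonith:external/sbd \
    params sbd_device="/dev/disk/by-id/scsi-<lun-id>"
crm configure property stonith-enabled=true
```

With pacemaker integration enabled, sbd treats the shared disk plus the quorum state as its arbiter, which is what lets a two-node cluster survive the loss of either peer without a full third member.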
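To make the probe behaviour concrete: a -inf location rule like the one mentioned above forbids the resource from running on a node, but pacemaker still issues the one-off monitor_0 probe there to verify the resource is not already active. A sketch, reusing the resource and node names from the thread (the constraint id is made up):

```shell
# Keep p_drbd_r0 off node3 with a -inf score; the monitor_0 probe on node3
# still runs once, and its ERR_INSTALLED result is expected and harmless:
crm configure location loc-drbd-not-node3 p_drbd_r0 -inf: node3

# Later pacemaker releases (1.1.12+) can skip the probe entirely by setting
# resource-discovery=never on the constraint, e.g. via pcs:
pcs constraint location add loc-drbd-not-node3 p_drbd_r0 node3 -INFINITY \
    resource-discovery=never
```

Note the resource-discovery option did not exist in the pacemaker versions shipping at the time of this thread; on those, the probe (and its log noise) simply has to be tolerated.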
Changes to the CIB ought to take effect immediately, without a restart.

Regards,
    Lars

-- 
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
