On 2013-09-24T20:55:40, AvatarSmith <[email protected]> wrote:

> I'm having a bit of an issue under CentOS 6.4 x64. I have two duplicate
> hardware systems (RAID arrays, 10G NICs, etc.) configured identically, and
> DRBD replication is working fine in the cluster between the two. When I
> started doing failover testing via power/communications interruptions, I
> found that I could not reliably shift the resources from cluster1 to
> cluster2 even though they are identical in every aspect. I AM able (by
> starting pacemaker first on one or the other) to get the cluster up on
> either node. I was told that this is a problem for a non-STONITH 2-node
> cluster, and to add a third server to provide the quorum vote to tell the
> survivor to host the cluster resources.

The best bet, in my humble opinion, in this case is truly to set up a
quorum node - but not as a full member of the cluster.

sbd isn't packaged in CentOS, but if you use "sbd" as the fencing
mechanism and enable the Pacemaker integration (see
https://github.com/l-mb/sbd/blob/master/man/sbd.8.pod), it allows you
to use an iSCSI target as a "quorum node".

(Think of it as a quorum node implemented on top of a standard storage
protocol.)

That means you can either install a small iSCSI server somewhere (easily
done under Linux; export a 1MB LUN from a VM or something), or use
existing storage servers to provide that.
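
A minimal sketch of what that could look like, assuming the shared LUN
shows up on both cluster nodes as /dev/disk/by-id/scsi-quorum (a
hypothetical name), an /etc/sysconfig/sbd-style configuration, and an sbd
build that has the -P (Pacemaker integration) option:

  # Initialise the sbd message slots on the shared iSCSI LUN (run once, from one node)
  sbd -d /dev/disk/by-id/scsi-quorum create

  # Verify the header and slots are readable from both nodes
  sbd -d /dev/disk/by-id/scsi-quorum dump

  # /etc/sysconfig/sbd on both cluster nodes; -P enables the Pacemaker integration
  #   SBD_DEVICE="/dev/disk/by-id/scsi-quorum"
  #   SBD_OPTS="-P"

  # Define the fencing resource and turn STONITH back on, e.g. via the crm shell
  crm configure primitive stonith-sbd stonith:external/sbd
  crm configure property stonith-enabled=true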

I don't currently have a build for CentOS, but I'd welcome patches to
the specfile to make it work there ;-)

> hardware) I still get messages to the effect that the p_drbd_r0 monitor won't
> load because it's not installed... well duh, it's not installed, but it's not
> supposed to be running on node3 anyway, so why is it trying to monitor on a
> node it's not installed on or permitted to run on?

monitor_0 is the initial probe that makes sure the service really isn't
running on nodes where the configuration forbids it to run. This is
expected behaviour.
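
As a concrete (hypothetical) example: even with a -inf location rule like
the one below, Pacemaker still issues a one-off monitor_0 probe on node3
to verify that the resource really is stopped there:

  # Keep p_drbd_r0 off node3 entirely; the initial probe still runs once
  crm configure location loc_p_drbd_r0_never_on_node3 p_drbd_r0 -inf: node3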


> flags can I toggle to help point out the way, or does PCS / crm_xxxxx provide
> a better interface for configuring/debugging this?

This behaviour has nothing to do with how you configure Pacemaker; it is
core to Pacemaker itself. Though I think that later versions may have
learned (or will learn) to hide ERR_INSTALLED results in crm_mon if
there's a -inf location rule, the backend behaviour remains the same.
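
If the stale "not installed" results clutter the crm_mon output, they can
be cleared manually once the probes have run; a sketch, reusing the
resource and node names from above:

  # Clear the recorded probe result for p_drbd_r0 on node3 from the status section
  crm_resource --cleanup --resource p_drbd_r0 --node node3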

> Lastly, during my failover testing and configuration testing, I found the
> only surefire way to apply a new cluster config is to cibadmin -f -E and
> cut and paste in a new one, followed by a reboot... what a pain. You can
> sometimes get away with restarting pacemaker on all nodes, bringing up your
> intended primary first and then the others later.

This clearly isn't good and is worth debugging. Changes to the CIB ought
to take effect immediately, without a restart.
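
For comparison, incremental changes should normally go live immediately;
a sketch (with a hypothetical update file) via the crm shell or cibadmin,
without erasing and re-pasting the whole CIB:

  # Merge changes from an edited snippet into the running configuration
  crm configure load update my-changes.crm

  # Or replace just the configuration section, live, from an XML file
  cibadmin --replace --scope configuration --xml-file new-config.xml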



Regards,
    Lars

-- 
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 
21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde

