Thank you Lars, I'll take it under consideration for an iSCSI bridge/fencing mechanism and see how far I can get.

On the last part, "Failover" reconfiguring and restarting pacemaker: I'm not sure it's fair to expect crmsh/pacemaker to act accordingly. For instance, say I have raw primitives, add them to a group, then decide to move them to a different group (subject to load order). I can't just remove one from the group and put it in a different one; I have to remove it first, commit the change, then add it to the new group and commit the change again. It seems the backend is very smart and attempts to shut down the running config in the most minimal fashion possible before moving to the new config. If elements of the old config are not present in the new config, you're stuck (e.g. removing a group), as you now have to manually shut down the old resources so the new ones can take effect.
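For illustration, the two-commit dance I mean looks roughly like this in an interactive crm session (p_foo, g_old, g_new are hypothetical names; a sketch, not a verified procedure):

```shell
# Sketch of the two-commit group-move workflow (hypothetical names).
crm configure
  crm(live)configure# edit g_old    # remove p_foo from the old group's member list
  crm(live)configure# commit        # first commit: old group drops the member
  crm(live)configure# edit g_new    # append p_foo to the new group
  crm(live)configure# commit        # second commit: p_foo starts in its new home
  crm(live)configure# quit
```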

I don't know if my "fat-fingering" and rapid-fire re-configuration constitutes a legitimate use case. I'm thinking of direct CIB manipulation during these sessions (with the cluster fully down), but I really prefer the cleanliness of crmsh vs. XML.
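For what it's worth, one way to do wholesale edits while keeping the crm syntax rather than raw XML would be something like this (a sketch only; the file name is a placeholder):

```shell
# Sketch: round-trip the whole configuration through a crm-syntax file
# instead of hand-editing XML. File name is hypothetical.
crm configure save /tmp/cluster.crm          # dump current config in crm syntax
vi /tmp/cluster.crm                          # edit with the nicer syntax
crm configure load replace /tmp/cluster.crm  # push the edited config back wholesale
```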


-Chuck
On 9/25/2013 12:22 AM, Lars Marowsky-Bree wrote:
On 2013-09-24T20:55:40, AvatarSmith <[email protected]> wrote:

I'm having a bit of an issue under CentOS 6.4 x64. I have two duplicate
hardware systems (RAID arrays, 10G NICs, etc.) configured identically, and
DRBD replication is working fine in the cluster between the two. When I
started doing failover testing via power/communications interruptions, I
found that I could not reliably shift the resources from cluster1 to
cluster2 even though they are identical in every aspect. I AM able (by
starting pacemaker first on one or the other) to get the cluster up on
either node. I was told that this is a problem for a non-STONITH 2-node
cluster, and to add a third server to provide the quorum vote to tell the
survivor to host the cluster resources.
The best bet, in my humble opinion, in this case is truly to set up a
quorum node - but not as a full member of the cluster.

This is not in CentOS, but if you use "sbd" as a fencing mechanism and
enable the pacemaker integration (see
https://github.com/l-mb/sbd/blob/master/man/sbd.8.pod), that allows you
to use an iSCSI target as a "quorum node".

(Think of it as a quorum node implemented on top of a standard storage
protocol.)

That means you can either install a small iSCSI server somewhere (easily
done under Linux, export a 1MB LUN from a VM or something), or utilize
existing storage servers to provide that.

I don't currently have a build for CentOS, but I'd welcome patches to
the specfile to make it work there ;-)
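The rough shape of that setup is as follows (a sketch only; the device path is a placeholder for your iSCSI LUN, the sysconfig path varies by distro, and the sbd.8 man page above is authoritative):

```shell
# Sketch: sbd on a small iSCSI LUN acting as the "quorum node".
SBD_DEV=/dev/disk/by-id/your-iscsi-lun   # placeholder device path
sbd -d "$SBD_DEV" create                 # initialize the sbd header/slots on the LUN
sbd -d "$SBD_DEV" list                   # verify the message slots are readable
# Point the sbd daemon at the device (file location varies by distro):
echo "SBD_DEVICE=$SBD_DEV" >> /etc/sysconfig/sbd
# Then define the fencing resource in pacemaker:
crm configure primitive stonith-sbd stonith:external/sbd
crm configure property stonith-enabled=true
```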

hardware) I still get messages to the effect that the p_drbd_r0 monitor won't
load because it's not installed.... well duh, it's not installed, but it's not
supposed to be running on node3 anyway, so why is it trying to monitor on a node
it's not installed on or permitted to run on?
monitor_0 is the initial probe that makes sure the service really is not
running where the configuration forbids it to run. This is expected
behaviour.
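In other words, even with a constraint like the following (names taken from your mail; a sketch), the one-time probe still runs on node3:

```shell
# A -inf location rule does not suppress the initial probe.
crm configure location loc-no-drbd-node3 p_drbd_r0 -inf: node3
# Pacemaker still runs p_drbd_r0_monitor_0 once on node3 at startup to
# confirm the resource is stopped there; on node3 that probe reports
# "not installed" -- expected, and harmless.
```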


flags can I toggle to help point out the way, or does PCS / crm_xxxxx provide
a better interface for configuring/debugging this?
This behaviour has nothing to do with how you configure pacemaker; it is
core to pacemaker itself. Though I think that later versions may have
learned (or will learn) to hide ERR_INSTALLED results in crm_mon when
there's a -inf location rule, the backend remains the same.

Lastly, during my failover testing and configuration testing, I found the
only surefire way to apply a new cluster config is to cibadmin -f -E and
cut and paste in a new one, followed by a reboot.... what a pain. You can
sometimes get away with restarting pacemaker on all nodes, bringing up your
intended primary first, then the others later
This clearly isn't good and is worth debugging. Changes to the CIB ought to
take effect immediately without a restart.
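As a safer way to stage a batch of changes (and debug what they will do) without editing the live CIB, a shadow CIB may help; roughly (a sketch - check crm_shadow(8) for your version):

```shell
# Sketch: stage changes in a shadow CIB, then apply them in one atomic commit.
crm_shadow --create staging    # work against a copy of the live CIB
crm configure edit             # edits land in the shadow, not the cluster
crm_shadow --commit staging    # replace the live CIB with the shadow contents
crm_shadow --delete staging    # clean up the shadow copy
```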



Regards,
     Lars


_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
