For some strange reason, I cannot use rules in the same fashion described in that link. I attempted numerous times to follow the template, but it did not work. However, by following the syntax given in the CRM CLI itself, and with a little experimentation, I was able to get it working properly. Thanks for your help!
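In case it helps anyone searching the archives later: constraints of this general shape are what the DRBD HowTo is pointing at with "(role) Master and (action) promote". This is a simplified single-device sketch, not my exact config — the constraint names here are made up, and the real configuration repeats the same pattern for drbd1 and drbd2:

```
# Simplified sketch for one DRBD device (drbd0); constraint names are
# illustrative only. Resource names match the config quoted below.

# Run the target group only where ms-drbd0 holds the Master role, and
# only start it after the promote has actually happened:
colocation G_Target-with-drbd0-master inf: G_Target ms-drbd0:Master
order drbd0-promote-first inf: ms-drbd0:promote G_Target:start

# Keep the DRBD resource off nodes that have lost connectivity, using
# the pingd attribute maintained by the pingd clone:
location ms-drbd0-connected ms-drbd0 \
        rule -inf: not_defined pingd or pingd lte 0
```

The key difference from plain location scores is that the colocation is against the Master *role* (not just the resource) and the order is against the promote *action* (not just start), so the group waits for the promotion rather than for the slave coming up.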
Andreas Kurz-2 wrote:
>
> On Wed, March 11, 2009 18:19, Ethan Bannister wrote:
>>
>> Hello,
>>
>> I have been working on a complete fail-over SAN for some time now and
>> almost have everything working the way it should. However, there have
>> been some drawbacks. I am using the most up-to-date versions of
>> Heartbeat and Pacemaker, and I have been modifying and testing
>> everything through the CRM CLI. So far I have not done much testing
>> beyond putting each machine into standby mode. Here is the topology of
>> the fail-over system:
>> http://www.nabble.com/file/p22460063/SAN.jpg
>>
>> And here is my configuration from the CRM CLI:
>>
>> crm(live)configure# show
>> primitive R_IP_Target ocf:heartbeat:IPaddr2 \
>>     params ip="192.168.3.137" nic="eth0" iflabel="1" \
>>     op monitor interval="30s"
>> primitive R_tgtd ocf:acs:tgtd \
>>     op monitor interval="30s"
>> primitive R_IP_Init ocf:heartbeat:IPaddr2 \
>>     params ip="192.168.3.133" nic="eth0" iflabel="1" \
>>     op monitor interval="30s"
>> primitive R_iscsi ocf:heartbeat:iscsi \
>>     params target="target1.acsacc.com" portal="192.168.3.137" \
>>     op monitor interval="30s" \
>>     op start interval="0" timeout="60s"
>> primitive R_LVM ocf:heartbeat:LVM \
>>     params volgrpname="VolGroup01" \
>>     op monitor interval="30s" \
>>     op start interval="0" timeout="60s"
>> primitive R_Filesystem ocf:heartbeat:Filesystem \
>>     params device="/dev/VolGroup01/LogVol00" \
>>         directory="/san_targets/www" fstype="ext3" \
>>     op monitor interval="30s" \
>>     op start interval="0" timeout="60s"
>> primitive R_NFS ocf:heartbeat:nfsserver \
>>     params nfs_init_script="/etc/init.d/nfs" \
>>         nfs_notify_cmd="/sbin/rpc.statd" \
>>         nfs_shared_infodir="/san_targets/www/nfsinfo" \
>>         nfs_ip="192.168.3.133" \
>>     op monitor interval="30s"
>> primitive drbd0 ocf:heartbeat:drbd \
>>     params drbd_resource="drbd0" \
>>     op monitor interval="29s" role="Master" timeout="30s" \
>>     op monitor interval="30s" role="Slave" timeout="30s"
>> primitive drbd1 ocf:heartbeat:drbd \
>>     params drbd_resource="drbd1" \
>>     op monitor interval="29s" role="Master" timeout="30s" \
>>     op monitor interval="30s" role="Slave" timeout="30s"
>> primitive drbd2 ocf:heartbeat:drbd \
>>     params drbd_resource="drbd2" \
>>     op monitor interval="29s" role="Master" timeout="30s" \
>>     op monitor interval="30s" role="Slave" timeout="30s"
>> primitive R_pingd ocf:pacemaker:pingd
>> group G_Target R_IP_Target R_tgtd \
>>     meta target-role="Started"
>> group G_Init R_IP_Init R_iscsi R_LVM R_Filesystem R_NFS \
>>     meta target-role="Started"
>> ms ms-drbd0 drbd0 \
>>     meta clone-max="2" notify="true" globally-unique="false" target-role="Started"
>> ms ms-drbd1 drbd1 \
>>     meta clone-max="2" notify="true" globally-unique="false" target-role="Started"
>> ms ms-drbd2 drbd2 \
>>     meta clone-max="2" notify="true" globally-unique="false" target-role="Started"
>> clone pingd R_pingd \
>>     meta target-role="Started"
>> location ms-drbd0-pref-1 ms-drbd0 200: san1.acsacc.com
>> location ms-drbd0-pref-2 ms-drbd0 100: san2.acsacc.com
>> location ms-drbd1-pref-1 ms-drbd1 200: san1.acsacc.com
>> location ms-drbd1-pref-2 ms-drbd1 100: san2.acsacc.com
>> location ms-drbd2-pref-1 ms-drbd2 200: san1.acsacc.com
>> location ms-drbd2-pref-2 ms-drbd2 100: san2.acsacc.com
>> location G_Target-pref-1 G_Target 200: san1.acsacc.com
>> location G_Target-pref-2 G_Target 100: san2.acsacc.com
>> location G_Init-pref-1 G_Init 200: init1.acsacc.com
>> location G_Init-pref-2 G_Init 100: init2.acsacc.com
>> location ms-drbd0-not-on-1 ms-drbd0 -inf: init1.acsacc.com
>> location ms-drbd0-not-on-2 ms-drbd0 -inf: init2.acsacc.com
>> location ms-drbd1-not-on-1 ms-drbd1 -inf: init1.acsacc.com
>> location ms-drbd1-not-on-2 ms-drbd1 -inf: init2.acsacc.com
>> location ms-drbd2-not-on-1 ms-drbd2 -inf: init1.acsacc.com
>> location ms-drbd2-not-on-2 ms-drbd2 -inf: init2.acsacc.com
>> location G_Target-not-on-1 G_Target -inf: init1.acsacc.com
>> location G_Target-not-on-2 G_Target -inf: init2.acsacc.com
>> location G_Init-not-on-1 G_Init -inf: san1.acsacc.com
>> location G_Init-not-on-2 G_Init -inf: san2.acsacc.com
>> location pingd-node-1 pingd 500: init1.acsacc.com
>> location pingd-node-2 pingd 500: init2.acsacc.com
>> location pingd-node-3 pingd 500: san1.acsacc.com
>> location pingd-node-4 pingd 500: san2.acsacc.com
>> property $id="cib-bootstrap-options" \
>>     dc-version="1.0.2-c02b459053bfa44d509a2a0e0247b291d93662b7" \
>>     stonith-enabled="false" \
>>     stonith-action="reboot" \
>>     stop-orphan-resources="true" \
>>     stop-orphan-actions="true" \
>>     symmetric-cluster="false" \
>>     last-lrm-refresh="1236720670"
>>
>> I have three DRBD devices that are set up to replicate between the two
>> targets (san1 & san2) and need to fail over quickly. For the most part
>> they do. However, I think my constraints need some adjustment so that
>> DRBD is promoted on the other machine, and demoted on the machine that
>> was just placed into standby, and to fix a few more issues as well.
>> This is what happens when I put each preferred machine into standby
>> mode:
>>
>> Init1:
>> - Switches over to init2 with no issues; flawless and quick.
>> - When init1 is placed back online, the resources begin to switch back
>>   to init1 but fail while attempting to start the LVM (R_LVM)
>>   resource. The resources then revert to init2. I can get all the
>>   resources back onto init1, but that requires placing init2 into
>>   standby and doing a cleanup of R_LVM on init1, and even that may not
>>   work and may require some fixing elsewhere.
>> - After fixing the last issue by hand, I placed init1 back into
>>   standby to test again. This time R_LVM came back up with no issues,
>>   but R_NFS failed and all resources were moved back onto init2 as in
>>   the first test. After a cleanup of R_NFS, I noticed in crm_mon that
>>   it tried to start on san1 and san2!
>> Looking at my constraints, I don't see why it would try to do that. I
>> cannot seem to place all the resources back onto init1 after this
>> point, which usually means I would need to take the whole system down
>> to correct the situation. That, obviously, cannot happen.
>>
>> San1:
>> - If I place san1 into standby mode, everything fails. The cluster
>>   attempts to switch san2 to master and san1 to slave for the DRBD
>>   devices, but fails, which also stops the R_NFS, R_Filesystem and
>>   R_LVM resources on the initiator.
>>
>> Is there something missing from my configuration that would remedy
>> this? I was thinking that some sort of delay would be needed for each
>> resource affected by the node change, but unfortunately I cannot find
>> any good documentation on how to do this in the CRM CLI. Also, could
>> someone please take a look at my constraints? I have a feeling that
>> most of my problems lie within the constraints, and if anything sticks
>> out it would be great to know :-D
>>
>> Any help would be greatly appreciated!
>
> Have a look at http://www.clusterlabs.org/wiki/DRBD_HowTo_1.0 ... watch
> out for (role) Master and (action) promote
>
> Regards,
> Andreas
>
> --
> : Andreas Kurz
> : LINBIT | Your Way to High Availability
> : Tel +43-1-8178292-64, Fax +43-1-8178292-82
> :
> : http://www.linbit.com
--
View this message in context: http://www.nabble.com/Fencing-trouble%21--Need-some-help%21-tp22460063p22565980.html
Sent from the Linux-HA mailing list archive at Nabble.com.

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
