For some strange reason, I cannot use rules in the same fashion that was
described in that link.  I attempted numerous times to follow the
template, but it did not work.  I will say that once I followed the syntax
given in the CRM CLI itself, a little experimentation was enough to get it
working properly.  Thanks for your help!
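
For anyone searching the archives later, here is a minimal sketch of the
kind of rule syntax the CRM CLI accepts.  The resource and attribute names
are illustrative only, not copied from my running config:

```
# Hypothetical example: keep G_Target away from nodes where connectivity
# (the pingd attribute) is undefined or zero.
location G_Target-connected G_Target \
    rule -inf: not_defined pingd or pingd lte 0
```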


Andreas Kurz-2 wrote:
> 
> 
> On Wed, March 11, 2009 18:19, Ethan Bannister wrote:
>>
> 
>> Hello,
>>
>>
>> I have been working on a complete fail-over SAN for some time now and
>> almost have everything working the way it should.  However, there have
>> been some drawbacks.  I am using the most up-to-date versions of
>> Heartbeat and Pacemaker, and I have been modifying and testing
>> everything through the CRM CLI.  First off, I have not done much
>> testing past putting each machine into standby mode.  Here is the
>> topology of the fail-over system:
>> http://www.nabble.com/file/p22460063/SAN.jpg
>>
>>
>> And here is my configuration when I go into the CRM CLI:
>>
>>
>> crm(live)configure# show
>>
>> primitive R_IP_Target ocf:heartbeat:IPaddr2 \
>>     params ip="192.168.3.137" \
>>     params nic="eth0" \
>>     params iflabel="1" \
>>     op monitor interval="30s"
>> primitive R_tgtd ocf:acs:tgtd \
>>     op monitor interval="30s"
>> primitive R_IP_Init ocf:heartbeat:IPaddr2 \
>>     params ip="192.168.3.133" \
>>     params nic="eth0" \
>>     params iflabel="1" \
>>     op monitor interval="30s"
>> primitive R_iscsi ocf:heartbeat:iscsi \
>>     params target="target1.acsacc.com" \
>>     params portal="192.168.3.137" \
>>     op monitor interval="30s" \
>>     op start interval="0" timeout="60s"
>> primitive R_LVM ocf:heartbeat:LVM \
>>     params volgrpname="VolGroup01" \
>>     op monitor interval="30s" \
>>     op start interval="0" timeout="60s"
>> primitive R_Filesystem ocf:heartbeat:Filesystem \
>>     params device="/dev/VolGroup01/LogVol00" \
>>     params directory="/san_targets/www" \
>>     params fstype="ext3" \
>>     op monitor interval="30s" \
>>     op start interval="0" timeout="60s"
>> primitive R_NFS ocf:heartbeat:nfsserver \
>>     params nfs_init_script="/etc/init.d/nfs" \
>>     params nfs_notify_cmd="/sbin/rpc.statd" \
>>     params nfs_shared_infodir="/san_targets/www/nfsinfo" \
>>     params nfs_ip="192.168.3.133" \
>>     op monitor interval="30s"
>> primitive drbd0 ocf:heartbeat:drbd \
>>     params drbd_resource="drbd0" \
>>     op monitor interval="29s" role="Master" timeout="30s" \
>>     op monitor interval="30s" role="Slave" timeout="30s"
>> primitive drbd1 ocf:heartbeat:drbd \
>>     params drbd_resource="drbd1" \
>>     op monitor interval="29s" role="Master" timeout="30s" \
>>     op monitor interval="30s" role="Slave" timeout="30s"
>> primitive drbd2 ocf:heartbeat:drbd \
>>     params drbd_resource="drbd2" \
>>     op monitor interval="29s" role="Master" timeout="30s" \
>>     op monitor interval="30s" role="Slave" timeout="30s"
>> primitive R_pingd ocf:pacemaker:pingd
>> group G_Target R_IP_Target R_tgtd \
>>     meta target-role="Started"
>> group G_Init R_IP_Init R_iscsi R_LVM R_Filesystem R_NFS \
>>     meta target-role="Started"
>> ms ms-drbd0 drbd0 \
>>     meta clone-max="2" notify="true" globally-unique="false" target-role="Started"
>> ms ms-drbd1 drbd1 \
>>     meta clone-max="2" notify="true" globally-unique="false" target-role="Started"
>> ms ms-drbd2 drbd2 \
>>     meta clone-max="2" notify="true" globally-unique="false" target-role="Started"
>> clone pingd R_pingd \
>>     meta target-role="Started"
>> location ms-drbd0-pref-1 ms-drbd0 200: san1.acsacc.com
>> location ms-drbd0-pref-2 ms-drbd0 100: san2.acsacc.com
>> location ms-drbd1-pref-1 ms-drbd1 200: san1.acsacc.com
>> location ms-drbd1-pref-2 ms-drbd1 100: san2.acsacc.com
>> location ms-drbd2-pref-1 ms-drbd2 200: san1.acsacc.com
>> location ms-drbd2-pref-2 ms-drbd2 100: san2.acsacc.com
>> location G_Target-pref-1 G_Target 200: san1.acsacc.com
>> location G_Target-pref-2 G_Target 100: san2.acsacc.com
>> location G_Init-pref-1 G_Init 200: init1.acsacc.com
>> location G_Init-pref-2 G_Init 100: init2.acsacc.com
>> location ms-drbd0-not-on-1 ms-drbd0 -inf: init1.acsacc.com
>> location ms-drbd0-not-on-2 ms-drbd0 -inf: init2.acsacc.com
>> location ms-drbd1-not-on-1 ms-drbd1 -inf: init1.acsacc.com
>> location ms-drbd1-not-on-2 ms-drbd1 -inf: init2.acsacc.com
>> location ms-drbd2-not-on-1 ms-drbd2 -inf: init1.acsacc.com
>> location ms-drbd2-not-on-2 ms-drbd2 -inf: init2.acsacc.com
>> location G_Target-not-on-1 G_Target -inf: init1.acsacc.com
>> location G_Target-not-on-2 G_Target -inf: init2.acsacc.com
>> location G_Init-not-on-1 G_Init -inf: san1.acsacc.com
>> location G_Init-not-on-2 G_Init -inf: san2.acsacc.com
>> location pingd-node-1 pingd 500: init1.acsacc.com
>> location pingd-node-2 pingd 500: init2.acsacc.com
>> location pingd-node-3 pingd 500: san1.acsacc.com
>> location pingd-node-4 pingd 500: san2.acsacc.com
>> property $id="cib-bootstrap-options" \
>>     dc-version="1.0.2-c02b459053bfa44d509a2a0e0247b291d93662b7" \
>>     stonith-enabled="false" \
>>     stonith-action="reboot" \
>>     stop-orphan-resources="true" \
>>     stop-orphan-actions="true" \
>>     symmetric-cluster="false" \
>>     last-lrm-refresh="1236720670"
>>
>>
>> I have three drbd devices that are set up to replicate between the two
>> targets (san1 & san2) and need to fail over quickly.  For the most
>> part, they do.  However, I think my constraints need some adjustment so
>> that drbd is promoted on the other machine and demoted on the machine
>> that was just placed into standby, and to fix a few more issues as
>> well.  This is what happens when I put each preferred machine into
>> standby mode:
>>
>> Init1:
>> - Switches over to init2 with no issues, flawless and quick.
>> - When init1 is placed back into online mode, the resources begin to
>>   switch back to init1, but fail while attempting to start the LVM
>>   (R_LVM) resource.  Resources then revert back to init2.  I can get
>>   all the resources to switch back over to init1, but that requires
>>   placing init2 into standby mode and a cleanup of R_LVM on init1.  And
>>   even that may not work and may require some fixing elsewhere.
>> - After fixing the last issue by hand, I attempted to place init1 back
>>   into standby mode to test again.  This time, R_LVM came back up with
>>   no issues, but R_NFS failed and then all resources were placed back
>>   onto init2 as in the first test.  After applying a cleanup to R_NFS,
>>   I notice in crm_mon that it tries to start on san1 and san2!  Looking
>>   at my constraints, I don't see why it would try to do that.  I cannot
>>   seem to place all the resources back onto init1 after this point.
>>   This usually means that I would need to take the system (as a whole)
>>   down to correct the situation, which obviously cannot happen.
>>
>> San1:
>> -If I place san1 into standby mode, everything fails.  It attempts to
>> switch san2 to master for the drbd devices, and san1 to slave, but fails,
>> thus also stopping the R_NFS, R_Filesystem and R_LVM resources on the
>> initiator.
>>
>> Are there some things that I am missing in my configuration that would
>> remedy this?  I was thinking that a delay of some sort would need to be
>> given for each resource that is affected by the node change.
>> Unfortunately, I cannot find any good documentation on how to do this
>> in the CRM CLI.  Also, could someone please take a look at my
>> constraints?  I have a feeling that most of my problems lie within the
>> constraints, and if anything sticks out, it would be great to know :-D
>>
>> Any help would be greatly appreciated!
> 
> Have a look at http://www.clusterlabs.org/wiki/DRBD_HowTo_1.0 ... watch
> out for (role) Master and (action) promote
> 
> Regards,
> Andreas
> 
> -- 
> : Andreas Kurz
> : LINBIT | Your Way to High Availability
> : Tel +43-1-8178292-64, Fax +43-1-8178292-82
> :
> : http://www.linbit.com
> 
> DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
> 
> This e-mail is solely for use by the intended recipient(s). Information
> contained in this e-mail and its attachments may be confidential,
> privileged or copyrighted. If you are not the intended recipient you are
> hereby formally notified that any use, copying, disclosure or
> distribution of the contents of this e-mail, in whole or in part, is
> prohibited. Also please notify immediately the sender by return e-mail
> and delete this e-mail from your system. Thank you for your co-operation.
> 
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
> 
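
For the archives: Andreas's hint about (role) Master and (action) promote
translates into colocation and order constraints along these lines.  This
is only a sketch using the resource names from the config above; the
constraint IDs are made up and the scores may need adjusting:

```
# Hypothetical sketch: run the target group only where drbd0 is Master,
# and start it only after the promotion has completed.
colocation G_Target-with-drbd0-master inf: G_Target ms-drbd0:Master
order drbd0-promote-before-G_Target inf: ms-drbd0:promote G_Target:start
```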
> 
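On the delay question quoted above: operations in the CIB accept a
start-delay attribute, so a per-resource delay can be sketched roughly as
follows.  The 10s value is an arbitrary assumption, and you should check
whether your Pacemaker version honours start-delay on start operations:

```
# Hypothetical sketch: delay the LVM start after a failover.
primitive R_LVM ocf:heartbeat:LVM \
    params volgrpname="VolGroup01" \
    op monitor interval="30s" \
    op start interval="0" timeout="60s" start-delay="10s"
```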

-- 
View this message in context: 
http://www.nabble.com/Fencing-trouble%21--Need-some-help%21-tp22460063p22565980.html
Sent from the Linux-HA mailing list archive at Nabble.com.

