And finally my ha.cf

[EMAIL PROTECTED] ha.d]# egrep -v "^#|^$" ha.cf
keepalive 2
deadtime 30
warntime 10
initdead 120
udpport 695
bcast eth0 eth2   # Linux
auto_failback off
node dtbaims
node itbaims
debug 1
use_logd yes
conn_logd_time 60
compression bz2
crm respawn
Is there anything which can be added to this to ensure failover on complete power loss to the primary server?

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Alex Strachan
Sent: Wednesday, 29 October 2008 4:52 PM
To: 'General Linux-HA mailing list'
Subject: RE: [Linux-HA] Stonith, 2 node cluster - on loss of power to primary node; failover to secondary didn't happen.

It looks like it may be possible to power the card via a separate power adapter - this still doesn't help in the case of a complete power failure.

The stonith seems to be working fine. I have a Filesystem resource set to 'fence' on failure. I triggered this on the primary server and stonith kicked in from the secondary and reset the primary, then started running the resources - fantastic!

Hmmm - that only leaves how to recover from a complete power loss where the RSA card is not available. I have attached my cib. Any pointers would be great.

Thanks
Alex

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Alex Strachan
Sent: Wednesday, 29 October 2008 4:07 PM
To: 'General Linux-HA mailing list'
Subject: RE: [Linux-HA] Stonith, 2 node cluster - on loss of power to primary node; failover to secondary didn't happen.

When power was restored, the resources restarted on the ex-primary dtbaims.
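Nothing in ha.cf itself can fix this case: with the RSA card powered from the host it fences, a total power loss means the surviving node can never get a positive confirmation that the peer is dead, so the CRM blocks the failover rather than risk split-brain. One era-appropriate workaround would be a fallback fencing device that does not depend on the peer's power, such as the meatware plugin shipped with heartbeat, which simply blocks until an operator confirms the node is really off. The sketch below is an assumption, not a tested config: the ids are invented, and the hostlist parameter name should be checked against `stonith -t meatware -n` on your build.

<primitive id="r_stonith-manual" class="stonith" type="meatware">
  <operations>
    <op name="monitor" interval="60" id="r_stonith-manual-mon" timeout="300" requires="nothing"/>
  </operations>
  <instance_attributes id="r_stonith-manual.attrs">
    <!-- meatware does not fence anything itself: it waits until an operator
         runs meatclient -c <node> to confirm the node has been powered off -->
    <nvpair id="r_stonith-manual.hosts" name="hostlist" value="dtbaims itbaims"/>
  </instance_attributes>
</primitive>

With something like this in place, a human can unblock the cluster after a total power loss by confirming the fence manually instead of waiting for the RSA card to come back.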
Last error from crm_verify -

[EMAIL PROTECTED] ~]# crm_verify -L -V
crm_verify[31741]: 2008/10/29_16:00:20 notice: main: Required feature set: 2.0
crm_verify[31741]: 2008/10/29_16:00:20 WARN: main: Your configuration was internally updated to the latest version (pacemaker-1.0)
crm_verify[31741]: 2008/10/29_16:00:20 notice: unpack_config: On loss of CCM Quorum: Ignore
crm_verify[31741]: 2008/10/29_16:00:20 WARN: unpack_rsc_op: Processing failed op r_stonith-dtbaims_start_0 on itbaims: Error
crm_verify[31741]: 2008/10/29_16:00:20 WARN: unpack_rsc_op: Compatibility handling for failed op r_stonith-dtbaims_start_0 on itbaims
crm_verify[31741]: 2008/10/29_16:00:20 WARN: native_color: Resource r_stonith-dtbaims cannot run anywhere

Is the "WARN: unpack_rsc_op: Compatibility handling for failed op r_stonith-dtbaims_start_0 on itbaims" indicative of a more serious error in the configuration?

My stonith cib config is:

<primitive id="r_stonith-dtbaims" class="stonith" type="external/ibmrsa-telnet">
  <operations>
    <op name="monitor" interval="60" id="r_stonith-dtbaims-mon" timeout="300" requires="nothing"/>
    <op name="start" interval="0" id="r_stonith-dtbaims-start" timeout="180"/>
    <op name="stop" interval="0" id="r_stonith-dtbaims-stop" timeout="180"/>
  </operations>
  <instance_attributes id="instance_attributes.id49828">
    <nvpair id="nvpair.id49835" name="nodename" value="dtbaims"/>
    <nvpair id="nvpair.id49844" name="ip_address" value="192.168.201.37"/>
    <nvpair id="nvpair.id49853" name="username" value="########"/>
    <nvpair id="nvpair.id49862" name="password" value="########"/>
  </instance_attributes>
  <meta_attributes id="primitive-r_stonith-dtbaims.meta">
    <nvpair id="resource_stickiness.meta.auto-7" name="resource-stickiness" value="INFINITY"/>
  </meta_attributes>
</primitive>

<primitive id="r_stonith-itbaims" class="stonith" type="external/ibmrsa-telnet">
  <operations>
    <op name="monitor" interval="60" id="r_stonith-itbaims-mon" timeout="300" requires="nothing"/>
    <op name="start" interval="0" id="r_stonith-itbaims-start" timeout="180"/>
    <op name="stop" interval="0" id="r_stonith-itbaims-stop" timeout="180"/>
  </operations>
  <instance_attributes id="instance_attributes.id49921">
    <nvpair id="nvpair.id49928" name="nodename" value="itbaims"/>
    <nvpair id="nvpair.id49937" name="ip_address" value="192.168.201.38"/>
    <nvpair id="nvpair.id49946" name="username" value="########"/>
    <nvpair id="nvpair.id49955" name="password" value="########"/>
  </instance_attributes>
  <meta_attributes id="primitive-r_stonith-itbaims.meta">
    <nvpair id="resource_stickiness.meta.auto-33" name="resource-stickiness" value="INFINITY"/>
  </meta_attributes>
</primitive>

And constraints (I use a non-symmetrical cluster):

<rsc_location id="r_stonith-dtbaims_hates_dtbaims" rsc="r_stonith-dtbaims">
  <rule id="r_stonith-dtbaims_hates_dtbaims_rule" score="-INFINITY">
    <expression attribute="#uname" id="expression.id49985" operation="eq" value="dtbaims"/>
  </rule>
</rsc_location>
<rsc_location id="r_stonith-dtbaims_loves_itbaims" rsc="r_stonith-dtbaims">
  <rule id="r_stonith-dtbaims_loves_itbaims_rule" score="INFINITY">
    <expression attribute="#uname" id="expression.id50013" operation="eq" value="itbaims"/>
  </rule>
</rsc_location>
<rsc_location id="r_stonith-itbaims_hates_itbaims" rsc="r_stonith-itbaims">
  <rule id="r_stonith-itbaims_hates_itbaims_rule" score="-INFINITY">
    <expression attribute="#uname" id="expression.id50012" operation="eq" value="itbaims"/>
  </rule>
</rsc_location>
<rsc_location id="r_stonith-itbaims_loves_dtbaims" rsc="r_stonith-itbaims">
  <rule id="r_stonith-itbaims_loves_dtbaims_rule" score="INFINITY">
    <expression attribute="#uname" id="expression.id50014" operation="eq" value="dtbaims"/>
  </rule>
</rsc_location>

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Alex Strachan
Sent: Wednesday, 29 October 2008 3:26 PM
To: 'General Linux-HA mailing list'
Subject: [Linux-HA] Stonith, 2 node cluster - on loss of power to primary node; failover
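Before relying on primitives like these in anger, the plugin can be exercised outside the cluster with the stonith(8) command-line tool shipped with heartbeat. This is only a hedged sketch: the parameter string mirrors the nvpairs in the cib above (with credentials redacted), and the exact option syntax has varied between releases, so check stonith(8) on your build.

# List available plugin types, and the parameters this plugin expects
stonith -L
stonith -t external/ibmrsa-telnet -n

# Drive the RSA card by hand with the same values as the CIB nvpairs
stonith -t external/ibmrsa-telnet \
        -p "nodename=dtbaims ip_address=192.168.201.37 username=XXX password=XXX" \
        -T reset dtbaims

If this manual invocation fails the same way the clustered resource does, the problem is in the plugin/RSA-card path rather than in the CIB.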
to secondary didn't happen.

Finally configured Stonith for an HA cluster - believe me, doing this made me happy!

Versions - heartbeat 2.99.1, pacemaker 1.0, redhat 4 x86_64

I have two nodes, dtbaims and itbaims. Stonith device ibmrsa-telnet is being used; failover is fine when doing a reset via the RSA card. Complete loss of power seems to be an issue. The RSA card is powered via the host.

Status - dtbaims is primary (DRBD) and running all of the resources. itbaims is secondary.

Status before power loss:

============
Last updated: Wed Oct 29 14:43:15 2008
Current DC: dtbaims (4f1614ac-d465-49db-b847-bac60f9dac6c)
2 Nodes configured.
3 Resources configured.
============

Node: dtbaims (4f1614ac-d465-49db-b847-bac60f9dac6c): online
Node: itbaims (96595e56-e3db-42da-b13b-1e2d3a956529): online

Full list of resources:

Resource Group: group_its
    resource_its_drbd     (heartbeat:its_drbddisk):         Started dtbaims
    resource_its_fs       (ocf::heartbeat:its_Filesystem):  Started dtbaims
    resource_its_vip      (ocf::heartbeat:IPaddr):          Started dtbaims
    resource_its_oracle   (ocf::heartbeat:its_oracle):      Started dtbaims
    resource_its_oralsnr  (ocf::heartbeat:its_oralsnr):     Started dtbaims
    resource_its_aims     (lsb:its_aims):                   Started dtbaims
    resource_its_apache   (ocf::heartbeat:its_apache):      Started dtbaims
    resource_its_smb      (lsb:its_smb):                    Started dtbaims
    resource_its_dhcpd    (lsb:its_dhcpd):                  Started dtbaims
r_stonith-dtbaims  (stonith:external/ibmrsa-telnet):  Started itbaims
r_stonith-itbaims  (stonith:external/ibmrsa-telnet):  Started dtbaims

Migration summary:
* Node itbaims:
* Node dtbaims:

Status after power loss (on the secondary host). My expectation was that the DC would be transferred to itbaims (this was done) and that the resources would start on itbaims (not done?). It looks like HA is waiting on completing the Stonith action.
[EMAIL PROTECTED] ~]# /etc/init.d/drbd status
drbd driver loaded OK; device status:
version: 8.2.6 (api:88/proto:86-88)
GIT-hash: 3e69822d3bb4920a8c1bfdf7d647169eba7d2eb4 build by [EMAIL PROTECTED], 2008-06-04 16:15:48
m:res  cs            st                 ds                 p  mounted  fstype
0:r0   WFConnection  Secondary/Unknown  UpToDate/DUnknown  C

============
Last updated: Wed Oct 29 15:11:48 2008
Current DC: itbaims (96595e56-e3db-42da-b13b-1e2d3a956529)
2 Nodes configured.
3 Resources configured.
============

Node: dtbaims (4f1614ac-d465-49db-b847-bac60f9dac6c): OFFLINE
Node: itbaims (96595e56-e3db-42da-b13b-1e2d3a956529): online

Full list of resources:

Resource Group: group_its
    resource_its_drbd     (heartbeat:its_drbddisk):         Started dtbaims
    resource_its_fs       (ocf::heartbeat:its_Filesystem):  Started dtbaims
    resource_its_vip      (ocf::heartbeat:IPaddr):          Started dtbaims
    resource_its_oracle   (ocf::heartbeat:its_oracle):      Started dtbaims
    resource_its_oralsnr  (ocf::heartbeat:its_oralsnr):     Started dtbaims
    resource_its_aims     (lsb:its_aims):                   Started dtbaims
    resource_its_apache   (ocf::heartbeat:its_apache):      Started dtbaims
    resource_its_smb      (lsb:its_smb):                    Started dtbaims
    resource_its_dhcpd    (lsb:its_dhcpd):                  Started dtbaims
r_stonith-dtbaims  (stonith:external/ibmrsa-telnet):  Started itbaims FAILED
r_stonith-itbaims  (stonith:external/ibmrsa-telnet):  Started dtbaims

Migration summary:
* Node itbaims:
    r_stonith-dtbaims: migration-threshold=0 fail-count=1000000

Failed actions:
    r_stonith-dtbaims_monitor_60000 (node=itbaims, call=14, rc=14): complete
    r_stonith-dtbaims_start_0 (node=itbaims, call=17, rc=1): complete

The HA cluster doesn't start the resources until power is restored to the ex-primary host.
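That fail-count=1000000 on r_stonith-dtbaims is what pins the cluster: until it is cleared, the stonith resource "cannot run anywhere", so the fencing of dtbaims can never be confirmed and the resource group never moves. Once you have verified by other means that dtbaims really is powered off, a resource cleanup lets the CRM retry. A sketch assuming pacemaker 1.0 era option letters (these changed across releases, so check crm_resource(8) before use):

# Clear the failed start/monitor history so the stonith resource may be retried
crm_resource -C -r r_stonith-dtbaims -H itbaims

# Then watch for the transition in a one-shot status dump
crm_mon -1

The important caveat is that this is a manual override: only run it after confirming the peer is down, otherwise you are re-creating the split-brain risk that stonith exists to prevent.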
Running crm_verify -L -V just shows lots of:

crm_verify[31645]: 2008/10/29_15:19:47 notice: NoRoleChange: Move resource resource_its_dhcpd (Started dtbaims -> itbaims)
crm_verify[31645]: 2008/10/29_15:19:47 notice: StopRsc: dtbaims Stop resource_its_dhcpd
crm_verify[31645]: 2008/10/29_15:19:47 notice: StartRsc: itbaims Start resource_its_dhcpd
crm_verify[31645]: 2008/10/29_15:19:47 notice: RecurringOp: Start recurring monitor (360s) for resource_its_dhcpd on itbaims
crm_verify[31645]: 2008/10/29_15:19:47 info: native_stop_constraints: r_stonith-itbaims_stop_0 is implicit after dtbaims is fenced

It looks like it wants to start the resources but is waiting to clear the failed op. What can I do to ensure that failover occurs in the event of a complete power loss to the primary host?

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
