In the logs on ha2, at the time I ran crm node standby ha1, I see:

May 18 10:32:54 ha2.iohost.com cib: [2378]: info: write_cib_contents: Archived previous version as /var/lib/heartbeat/crm/cib-25.raw
May 18 10:32:54 ha2.iohost.com cib: [2378]: info: write_cib_contents: Wrote version 0.102.0 of the CIB to disk (digest: b445d9afde4b209981c3da08d4c24ecc)
May 18 10:32:54 ha2.iohost.com cib: [2378]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.f5FXZH (digest: /var/lib/heartbeat/crm/cib.irSIZ7)
May 18 10:32:54 ha2.iohost.com cib: [7779]: info: Managed write_cib_contents process 2378 exited with return code 0.
May 18 10:33:11 ha2.iohost.com attrd: [7782]: info: attrd_ha_callback: flush message from ha1.iohost.com
May 18 10:33:12 ha2.iohost.com attrd: [7782]: info: attrd_ha_callback: flush message from ha1.iohost.com
May 18 10:33:12 ha2.iohost.com attrd: [7782]: info: attrd_ha_callback: flush message from ha1.iohost.com
May 18 10:33:12 ha2.iohost.com attrd: [7782]: info: attrd_ha_callback: flush message from ha1.iohost.com
May 18 10:35:14 ha2.iohost.com cib: [7779]: info: cib_stats: Processed 48 operations (13125.00us average, 0% utilization) in the last 10min

And on ha1:

May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: cib:diff: - <cib admin_epoch="0" epoch="101" num_updates="23" >
May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: cib:diff: - <configuration >
May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: cib:diff: - <nodes >
May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: cib:diff: - <node id="b159178d-c19b-4473-aa8e-13e487b65e33" >
May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: cib:diff: - <instance_attributes id="nodes-b159178d-c19b-4473-aa8e-13e487b65e33" >
May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: cib:diff: - <nvpair value="off" id="nodes-b159178d-c19b-4473-aa8e-13e487b65e33-standby" />
May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: cib:diff: - </instance_attributes>
May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: cib:diff: - </node>
May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: cib:diff: - </nodes>
May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: cib:diff: - </configuration>
May 17 22:33:02 ha1.iohost.com crmd: [8656]: info: abort_transition_graph: need_abort:59 - Triggered transition abort (complete=1) : Non-status change
May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: cib:diff: - </cib>
May 17 22:33:02 ha1.iohost.com crmd: [8656]: info: need_abort: Aborting on change to admin_epoch
May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: cib:diff: + <cib admin_epoch="0" epoch="102" num_updates="1" >
May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: cib:diff: + <configuration >
May 17 22:33:02 ha1.iohost.com crmd: [8656]: info: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: cib:diff: + <nodes >
May 17 22:33:02 ha1.iohost.com crmd: [8656]: info: do_state_transition: All 2 cluster nodes are eligible to run resources.
May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: cib:diff: + <node id="b159178d-c19b-4473-aa8e-13e487b65e33" >
May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: cib:diff: + <instance_attributes id="nodes-b159178d-c19b-4473-aa8e-13e487b65e33" >
May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: cib:diff: + <nvpair value="on" id="nodes-b159178d-c19b-4473-aa8e-13e487b65e33-standby" />
May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: cib:diff: + </instance_attributes>
May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: cib:diff: + </node>
May 17 22:33:02 ha1.iohost.com crmd: [8656]: info: do_pe_invoke: Query 337: Requesting the current CIB: S_POLICY_ENGINE
May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: cib:diff: + </nodes>
May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: cib:diff: + </configuration>
May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: cib:diff: + </cib>
May 17 22:33:02 ha1.iohost.com cib: [8652]: info: cib_process_request: Operation complete: op cib_modify for section nodes (origin=local/crm_attribute/4, version=0.102.1): ok (rc=0)
May 17 22:33:02 ha1.iohost.com crmd: [8656]: info: do_pe_invoke_callback: Invoking the PE: query=337, ref=pe_calc-dc-1305696782-441, seq=2, quorate=1
May 17 22:33:02 ha1.iohost.com cib: [1591]: info: write_cib_contents: Archived previous version as /var/lib/heartbeat/crm/cib-27.raw
May 17 22:33:02 ha1.iohost.com cib: [1591]: info: write_cib_contents: Wrote version 0.102.0 of the CIB to disk (digest: 6014929506b4b9e2eccb8e741e6e2e6f)
May 17 22:33:02 ha1.iohost.com pengine: [8685]: notice: unpack_config: On loss of CCM Quorum: Ignore
May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: unpack_config: Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
May 17 22:33:02 ha1.iohost.com cib: [1591]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.vRGjiM (digest: /var/lib/heartbeat/crm/cib.iJf2S7)
May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: unpack_status: Node ha1.iohost.com is in standby-mode
May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: determine_online_status: Node ha1.iohost.com is standby
May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: unpack_status: Node ha2.iohost.com is in standby-mode
May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: determine_online_status: Node ha2.iohost.com is standby
May 17 22:33:02 ha1.iohost.com pengine: [8685]: WARN: unpack_status: Node ha1.iohost.com in status section no longer exists
May 17 22:33:02 ha1.iohost.com pengine: [8685]: notice: unpack_rsc_op: Operation ip1arp_monitor_0 found resource ip1arp active on ha1.iohost.com
May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: find_clone: Internally renamed drbd_webfs:1 on ha1.iohost.com to drbd_webfs:2 (ORPHAN)
May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: find_clone: Internally renamed drbd_mysql:1 on ha1.iohost.com to drbd_mysql:2 (ORPHAN)
May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: find_clone: Internally renamed drbd_mysql:0 on ha2.iohost.com to drbd_mysql:1
May 17 22:33:02 ha1.iohost.com cib: [8652]: info: Managed write_cib_contents process 1591 exited with return code 0.
May 17 22:33:02 ha1.iohost.com pengine: [8685]: notice: unpack_rsc_op: Operation ip1arp_monitor_0 found resource ip1arp active on ha2.iohost.com
May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: find_clone: Internally renamed drbd_webfs:0 on ha2.iohost.com to drbd_webfs:1
May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: unpack_status: Node ha1.iohost.com is unknown
May 17 22:33:02 ha1.iohost.com pengine: [8685]: notice: group_print: Resource Group: WebServices
May 17 22:33:02 ha1.iohost.com pengine: [8685]: notice: native_print: ip1 (ocf::heartbeat:IPaddr2): Started ha1.iohost.com
May 17 22:33:02 ha1.iohost.com pengine: [8685]: notice: native_print: ip1arp (ocf::heartbeat:SendArp): Started ha1.iohost.com
May 17 22:33:02 ha1.iohost.com pengine: [8685]: notice: native_print: fs_webfs (ocf::heartbeat:Filesystem): Started ha1.iohost.com
May 17 22:33:02 ha1.iohost.com pengine: [8685]: notice: native_print: fs_mysql (ocf::heartbeat:Filesystem): Started ha1.iohost.com
May 17 22:33:02 ha1.iohost.com pengine: [8685]: notice: native_print: apache2 (lsb:httpd): Started ha1.iohost.com
May 17 22:33:02 ha1.iohost.com pengine: [8685]: notice: native_print: mysql (ocf::heartbeat:mysql): Started ha1.iohost.com
May 17 22:33:02 ha1.iohost.com pengine: [8685]: notice: clone_print: Master/Slave Set: ms_drbd_mysql
May 17 22:33:02 ha1.iohost.com pengine: [8685]: notice: short_print: Masters: [ ha1.iohost.com ]
May 17 22:33:02 ha1.iohost.com pengine: [8685]: notice: short_print: Stopped: [ drbd_mysql:1 ]
May 17 22:33:02 ha1.iohost.com pengine: [8685]: notice: clone_print: Master/Slave Set: ms_drbd_webfs
May 17 22:33:02 ha1.iohost.com pengine: [8685]: notice: short_print: Masters: [ ha1.iohost.com ]
May 17 22:33:02 ha1.iohost.com pengine: [8685]: notice: short_print: Stopped: [ drbd_webfs:1 ]
May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: rsc_merge_weights: ip1arp: Rolling back scores from fs_webfs
May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: rsc_merge_weights: ip1arp: Rolling back scores from ip1
May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: native_color: Resource ip1arp cannot run anywhere
May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: rsc_merge_weights: ip1: Rolling back scores from apache2
May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: rsc_merge_weights: ip1: Rolling back scores from ip1arp
May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: native_color: Resource ip1 cannot run anywhere
May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: rsc_merge_weights: ms_drbd_webfs: Rolling back scores from apache2
May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: rsc_merge_weights: ms_drbd_webfs: Rolling back scores from fs_webfs
May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: native_color: Resource drbd_webfs:0 cannot run anywhere
May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: native_color: Resource drbd_webfs:1 cannot run anywhere
May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: rsc_merge_weights: ms_drbd_webfs: Rolling back scores from apache2
May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: rsc_merge_weights: ms_drbd_webfs: Rolling back scores from fs_webfs
May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: master_color: ms_drbd_webfs: Promoted 0 instances of a possible 1 to master
May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: rsc_merge_weights: fs_webfs: Rolling back scores from fs_mysql
May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: native_color: Resource fs_webfs cannot run anywhere
May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: rsc_merge_weights: ms_drbd_mysql: Rolling back scores from apache2
May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: rsc_merge_weights: ms_drbd_mysql: Rolling back scores from fs_mysql
May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: rsc_merge_weights: ms_drbd_mysql: Rolling back scores from fs_mysql
May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: native_color: Resource drbd_mysql:0 cannot run anywhere
May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: native_color: Resource drbd_mysql:1 cannot run anywhere
May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: rsc_merge_weights: ms_drbd_mysql: Rolling back scores from apache2

On 5/17/2011 9:28 PM, Randy Katz wrote:
> Hi,
>
> Relatively new to HA though I have been using Xen and reading
> this list here and there, now need some help:
>
> I have 2 nodes, physical, let's call node1/node2:
> In each I have VM's (Xen paravirt / ha1 & ha2). In each VM I have
> 2 LVs which are DRBD'd (r0 and r1, mysql data, and html data). There is a
> VIP between them, resolving the website, which is a simple
> Wordpress blog so to have database, and works well.
>
> When I start them (reboot VMs) they start up fine and ha1 is
> online (primary) and ha2 is standby (secondary). If I:
>
> 1. crm node standby ha1.iohost.com - sometimes ha2.iohost.com takes
> over, sometimes I am left with 2 nodes on standby, not
> sure why.
> 2. If 2 nodes are in standby and I issue: crm node online ha1.iohost.com
> sometimes ha2 will become active as it should when ha1
> went standby, sometimes ha1 will become active and sometimes, they will
> remain standby, not sure why.
>
> Question: How do I test and debug this? What parameters in which config
> file affect this behavior?
>
> Thank you in advance,
> Randy
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
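
For anyone wading through pengine output like the ha1 log above, here is a minimal sketch, assuming the syslog line format shown in this message, that pulls out which nodes the policy engine reports as standby and which resources it says cannot run anywhere. The function name summarize and the SAMPLE list are my own illustration, not part of Heartbeat or Pacemaker; the sample lines are copied from the ha1 log.

```python
import re

# Patterns written against the pengine lines quoted in this message;
# other Pacemaker versions may word these differently (assumption).
STANDBY_RE = re.compile(r"unpack_status: Node (\S+) is in standby-mode")
UNPLACED_RE = re.compile(r"native_color: Resource (\S+) cannot run anywhere")


def summarize(lines):
    """Return (sorted standby nodes, unplaced resources in first-seen order)."""
    standby = set()
    unplaced = []
    for line in lines:
        match = STANDBY_RE.search(line)
        if match:
            standby.add(match.group(1))
        match = UNPLACED_RE.search(line)
        if match and match.group(1) not in unplaced:
            unplaced.append(match.group(1))
    return sorted(standby), unplaced


# A few lines copied from the ha1 log in this message.
SAMPLE = [
    "May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: unpack_status: Node ha1.iohost.com is in standby-mode",
    "May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: unpack_status: Node ha2.iohost.com is in standby-mode",
    "May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: native_color: Resource ip1arp cannot run anywhere",
    "May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: native_color: Resource ip1 cannot run anywhere",
]

if __name__ == "__main__":
    nodes, resources = summarize(SAMPLE)
    print("standby nodes:", nodes)
    print("resources with no eligible node:", resources)
```

To run it against a real log, pass the lines of the file to summarize instead of SAMPLE. On the excerpt above it reports both ha1.iohost.com and ha2.iohost.com as standby, which matches the "cannot run anywhere" messages.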
