Re: [Linux-HA] 3 node cluster keeps failing after domU image is started

Joe Shang Sun, 27 Jun 2010 08:55:20 -0700

Saw some typos in my last config.


node fa1.box.com
node xen1.box.com
node xen2.box.com
primitive drbd_xen1 ocf:linbit:drbd \
        params drbd_resource="xen1" \
        op monitor interval="15s"
primitive drbd_xen2 ocf:linbit:drbd \
        params drbd_resource="xen2" \
        op monitor interval="15s"
primitive vm1xen1 ocf:heartbeat:Xen \
        params xmfile="/xen1/vm1" \
        op monitor interval="15s" timeout="120s" depth="0" target-role="Stopped" \
        op stop interval="0s" timeout="120s" \
        op start interval="10s" timeout="60s" \
        meta is-managed="true" \
        meta target-role="Started"
primitive vm1xen2 ocf:heartbeat:Xen \
        params xmfile="/xen2/vm1" \
        op monitor interval="15s" timeout="120s" depth="0" target-role="Stopped" \
        op stop interval="0s" timeout="120s" \
        op start interval="10s" timeout="60s" \
        meta is-managed="true" \
        meta target-role="Started"
primitive xen1_fs ocf:heartbeat:Filesystem \
        params device="/dev/drbd1" directory="/xen1" fstype="ext3" \
        op monitor start-delay="30s" interval="15s" \
        meta target-role="Started"
primitive xen2_fs ocf:heartbeat:Filesystem \
        params device="/dev/drbd2" directory="/xen2" fstype="ext3" \
        op monitor start-delay="30s" interval="15s" \
        meta target-role="Started"
ms ms_drbd_xen1 drbd_xen1 \
        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
ms ms_drbd_xen2 drbd_xen2 \
        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
location drbd-fence-by-handler-ms_drbd_xen1 ms_drbd_xen1 \
        rule $id="drbd-fence-by-handler-rule-ms_drbd_xen1" $role="Master" -inf: #uname ne fa1.box.com
location drbd-fence-by-handler-ms_drbd_xen2 ms_drbd_xen2 \
        rule $id="drbd-fence-by-handler-rule-ms_drbd_xen2" $role="Master" -inf: #uname ne xen2.box.com
colocation fs_on_drbd1 inf: xen1_fs ms_drbd_xen1:Master
colocation fs_on_drbd2 inf: xen2_fs ms_drbd_xen2:Master
colocation vm1xen1-with-xen1_fs inf: vm1xen1 xen1_fs
colocation vm1xen2-with-xen2_fs inf: vm1xen2 xen2_fs
order fs_after_drbd1 inf: ms_drbd_xen1:promote xen1_fs:start
order fs_after_drbd2 inf: ms_drbd_xen2:promote xen2_fs:start
order vm1xen1-after-xen1_fs inf: xen1_fs:start vm1xen1:start
order vm1xen2-after-xen2_fs inf: xen2_fs:start vm1xen2:start
property $id="cib-bootstrap-options" \
        dc-version="1.0.9-89bd754939df5150de7cd76835f98fe90851b677" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="3" \
        no-quorum-policy="ignore" \
        stonith-enabled="false" \
        default-resource-stickiness="1000" \
        last-lrm-refresh="1277632456"
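
One difference between the config above and Greg's example quoted further
down is that his start and stop operations use interval="0", while the
start operations here use interval="10s". Assuming that mismatch is worth
flagging, the following Python sketch scans crm configure text for start
or stop operations with a nonzero interval. The function name and the
regular expressions are illustrative only; they are not part of Pacemaker
or the crm shell.

```python
import re

# One 'op <name> ... interval="<value>"' clause on a single config line.
OP_RE = re.compile(r'\bop\s+(\w+)\s.*?\binterval="([^"]*)"')
# Values such as "0", "0s" or "0ms" all mean "not recurring".
ZERO_RE = re.compile(r'0+[a-z]*')


def nonzero_start_stop_intervals(config_text):
    """Return (primitive, op, interval) for start/stop ops with a nonzero interval."""
    findings = []
    primitive = None  # name of the primitive currently being read
    for line in config_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("primitive "):
            primitive = stripped.split()[1]
        match = OP_RE.search(stripped)
        if match and match.group(1) in ("start", "stop"):
            if not ZERO_RE.fullmatch(match.group(2)):
                findings.append((primitive, match.group(1), match.group(2)))
    return findings
```

The lines it is meant to flag in the configuration above are the two
op start interval="10s" entries, for vm1xen1 and vm1xen2.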


This is my new config, but I still see the same behavior. I have been
thinking of adding a sleep to the xm binary, since it still shows the
same behavior, but I just don't understand why other people don't face
the same issue with Pacemaker, unless it's my version of it?

[r...@xen2 ~]# rpm -qa|grep -i pace
pacemaker-libs-1.0.9.1-1.el5
pacemaker-1.0.9.1-1.el5
[r...@xen2 ~]# rpm -qa|grep -i coro
corosynclib-1.2.5-1.3.el5
corosync-1.2.5-1.3.el5


On Sun, Jun 27, 2010 at 7:57 AM, Joe Shang <[email protected]> wrote:
> Yeah I had that problem earlier. I tried the config you had, and these
> are what spits out in my error logs, xen1:
>
> Jun 27 10:51:48 xen1 crmd: [3952]: info: update_dc: Set DC to
> fa1.box.com (3.0.1)
> Jun 27 10:51:49 xen1 crmd: [3952]: info: update_attrd: Connecting to attrd...
> Jun 27 10:51:49 xen1 attrd: [3950]: info: find_hash_entry: Creating
> hash entry for terminate
> Jun 27 10:51:49 xen1 attrd: [3950]: info: find_hash_entry: Creating
> hash entry for shutdown
> Jun 27 10:51:49 xen1 crmd: [3952]: info: do_state_transition: State
> transition S_PENDING -> S_NOT_DC [ input=I_NOT_DC cause=C_HA_MESSAGE
> origin=do_cl_join_finalize_respond ]
> Jun 27 10:51:49 xen1 attrd: [3950]: info: attrd_local_callback:
> Sending full refresh (origin=crmd)
> Jun 27 10:51:49 xen1 attrd: [3950]: info: attrd_trigger_update:
> Sending flush op to all hosts for: terminate (<null>)
> Jun 27 10:51:49 xen1 attrd: [3950]: info: attrd_trigger_update:
> Sending flush op to all hosts for: shutdown (<null>)
> Jun 27 10:51:49 xen1 attrd: [3950]: info: crm_new_peer: Node
> xen2.box.com now has id: 50397194
> Jun 27 10:51:49 xen1 attrd: [3950]: info: crm_new_peer: Node 50397194
> is now known as xen2.box.com
> Jun 27 10:51:49 xen1 crmd: [3952]: info: erase_xpath_callback:
> Deletion of "//node_sta...@uname='xen1.box.com']/transient_attributes":
> ok (rc=0)
> Jun 27 10:51:49 xen1 attrd: [3950]: info: crm_new_peer: Node
> fa1.box.com now has id: 16842762
> Jun 27 10:51:49 xen1 attrd: [3950]: info: crm_new_peer: Node 16842762
> is now known as fa1.box.com
> Jun 27 10:51:49 xen1 cib: [3963]: info: write_cib_contents: Archived
> previous version as /var/lib/heartbeat/crm/cib-62.raw
> Jun 27 10:51:49 xen1 cib: [3963]: info: write_cib_contents: Wrote
> version 0.79.0 of the CIB to disk (digest:
> 53669bbde5238efd1608e412ed8bdd37)
> Jun 27 10:51:49 xen1 cib: [3963]: info: retrieveCib: Reading cluster
> configuration from: /var/lib/heartbeat/crm/cib.FMAqCU (digest:
> /var/lib/heartbeat/crm/cib.ecNEdF)
> Jun 27 10:51:49 xen1 crmd: [3952]: info: do_lrm_rsc_op: Performing
> key=4:0:7:e8d1e361-2956-49c9-9c49-9babc795edc6
> op=drbd_xen1:0_monitor_0 )
> Jun 27 10:51:49 xen1 lrmd: [3949]: info: rsc:drbd_xen1:0:2: probe
> Jun 27 10:51:49 xen1 crmd: [3952]: info: do_lrm_rsc_op: Performing
> key=5:0:7:e8d1e361-2956-49c9-9c49-9babc795edc6
> op=drbd_xen2:1_monitor_0 )
> Jun 27 10:51:49 xen1 lrmd: [3949]: info: rsc:drbd_xen2:1:3: probe
> Jun 27 10:51:49 xen1 crmd: [3952]: info: do_lrm_rsc_op: Performing
> key=6:0:7:e8d1e361-2956-49c9-9c49-9babc795edc6 op=xen1_fs_monitor_0 )
> Jun 27 10:51:49 xen1 lrmd: [3949]: info: rsc:xen1_fs:4: probe
> Jun 27 10:51:49 xen1 crmd: [3952]: info: do_lrm_rsc_op: Performing
> key=7:0:7:e8d1e361-2956-49c9-9c49-9babc795edc6 op=xen2_fs_monitor_0 )
> Jun 27 10:51:49 xen1 lrmd: [3949]: info: rsc:xen2_fs:5: probe
> Jun 27 10:51:49 xen1 crmd: [3952]: info: do_lrm_rsc_op: Performing
> key=8:0:7:e8d1e361-2956-49c9-9c49-9babc795edc6 op=vm1xen1_monitor_0 )
> Jun 27 10:51:49 xen1 lrmd: [3949]: info: RA output:
> (drbd_xen1:0:probe:stderr) DRBD module version: 8.3.8    userland
> version: 8.3.6 you should upgrade your drbd tools!
> Jun 27 10:51:49 xen1 lrmd: [3949]: info: RA output:
> (drbd_xen2:1:probe:stderr) DRBD module version: 8.3.8    userland
> version: 8.3.6 you should upgrade your drbd tools!
> Jun 27 10:51:49 xen1 lrmd: [3949]: info: RA output:
> (drbd_xen2:1:probe:stderr) 'xen2' not defined in your config.
> Jun 27 10:51:49 xen1 crmd: [3952]: info: process_lrm_event: LRM
> operation drbd_xen2:1_monitor_0 (call=3, rc=7, cib-update=8,
> confirmed=true) not running
> Jun 27 10:51:49 xen1 attrd: [3950]: info: find_hash_entry: Creating
> hash entry for master-drbd_xen2:0
> Jun 27 10:51:49 xen1 Filesystem[3971]: WARNING: Couldn't find device
> [/dev/drbd2]. Expected /dev/??? to exist
> Jun 27 10:51:49 xen1 crmd: [3952]: info: process_lrm_event: LRM
> operation xen1_fs_monitor_0 (call=4, rc=7, cib-update=9,
> confirmed=true) not running
> Jun 27 10:51:49 xen1 attrd: [3950]: info: find_hash_entry: Creating
> hash entry for master-drbd_xen1:0
> Jun 27 10:51:49 xen1 attrd: [3950]: info: attrd_trigger_update:
> Sending flush op to all hosts for: master-drbd_xen1:0 (10000)
> Jun 27 10:51:49 xen1 attrd: [3950]: info: attrd_perform_update: Sent
> update 13: master-drbd_xen1:0=10000
> Jun 27 10:51:49 xen1 crmd: [3952]: info: do_lrm_rsc_op: Performing
> key=9:0:7:e8d1e361-2956-49c9-9c49-9babc795edc6 op=vm1xen2_monitor_0 )
> Jun 27 10:51:49 xen1 crmd: [3952]: info: process_lrm_event: LRM
> operation drbd_xen1:0_monitor_0 (call=2, rc=0, cib-update=10,
> confirmed=true) ok
> Jun 27 10:51:49 xen1 crmd: [3952]: info: process_lrm_event: LRM
> operation xen2_fs_monitor_0 (call=5, rc=7, cib-update=11,
> confirmed=true) not running
> Jun 27 10:51:49 xen1 attrd: [3950]: info: find_hash_entry: Creating
> hash entry for master-drbd_xen1:1
> Jun 27 10:51:54 xen1 lrmd: [3949]: info: rsc:vm1xen1:6: probe
> Jun 27 10:51:54 xen1 lrmd: [3949]: info: rsc:vm1xen2:7: probe
> Jun 27 10:51:55 xen1 crmd: [3952]: info: process_lrm_event: LRM
> operation vm1xen1_monitor_0 (call=6, rc=7, cib-update=12,
> confirmed=true) not running
> Jun 27 10:51:55 xen1 crmd: [3952]: info: process_lrm_event: LRM
> operation vm1xen2_monitor_0 (call=7, rc=7, cib-update=13,
> confirmed=true) not running
> Jun 27 10:51:55 xen1 attrd: [3950]: info: find_hash_entry: Creating
> hash entry for probe_complete
> Jun 27 10:51:55 xen1 attrd: [3950]: info: attrd_trigger_update:
> Sending flush op to all hosts for: probe_complete (true)
> Jun 27 10:51:55 xen1 attrd: [3950]: info: attrd_perform_update: Sent
> update 18: probe_complete=true
> Jun 27 10:51:55 xen1 crmd: [3952]: info: do_lrm_rsc_op: Performing
> key=92:2:0:e8d1e361-2956-49c9-9c49-9babc795edc6
> op=drbd_xen1:0_notify_0 )
> Jun 27 10:51:55 xen1 lrmd: [3949]: info: rsc:drbd_xen1:0:8: notify
> Jun 27 10:51:55 xen1 lrmd: [3949]: info: RA output:
> (drbd_xen1:0:notify:stderr) DRBD module version: 8.3.8    userland
> version: 8.3.6 you should upgrade your drbd tools!
> Jun 27 10:51:55 xen1 crmd: [3952]: info: process_lrm_event: LRM
> operation drbd_xen1:0_notify_0 (call=8, rc=0, cib-update=14,
> confirmed=true) ok
> Jun 27 10:51:55 xen1 crmd: [3952]: info: do_lrm_rsc_op: Performing
> key=9:2:0:e8d1e361-2956-49c9-9c49-9babc795edc6
> op=drbd_xen1:0_promote_0 )
> Jun 27 10:51:55 xen1 lrmd: [3949]: info: rsc:drbd_xen1:0:9: promote
> Jun 27 10:51:55 xen1 lrmd: [3949]: info: RA output:
> (drbd_xen1:0:promote:stderr) DRBD module version: 8.3.8    userland
> version: 8.3.6 you should upgrade your drbd tools!
> Jun 27 10:51:55 xen1 lrmd: [3949]: info: RA output:
> (drbd_xen1:0:promote:stderr) DRBD module version: 8.3.8    userland
> version: 8.3.6 you should upgrade your drbd tools!
> Jun 27 10:51:55 xen1 kernel: block drbd1: role( Secondary -> Primary )
> Jun 27 10:51:55 xen1 lrmd: [3949]: info: RA output: 
> (drbd_xen1:0:promote:stdout)
> Jun 27 10:51:55 xen1 crmd: [3952]: info: process_lrm_event: LRM
> operation drbd_xen1:0_promote_0 (call=9, rc=0, cib-update=15,
> confirmed=true) ok
> Jun 27 10:51:55 xen1 crmd: [3952]: info: do_lrm_rsc_op: Performing
> key=93:2:0:e8d1e361-2956-49c9-9c49-9babc795edc6
> op=drbd_xen1:0_notify_0 )
> Jun 27 10:51:55 xen1 lrmd: [3949]: info: rsc:drbd_xen1:0:10: notify
> Jun 27 10:51:55 xen1 lrmd: [3949]: info: RA output:
> (drbd_xen1:0:notify:stderr) DRBD module version: 8.3.8    userland
> version: 8.3.6 you should upgrade your drbd tools!
> Jun 27 10:51:55 xen1 lrmd: [3949]: info: RA output: 
> (drbd_xen1:0:notify:stdout)
> Jun 27 10:51:55 xen1 crmd: [3952]: info: process_lrm_event: LRM
> operation drbd_xen1:0_notify_0 (call=10, rc=0, cib-update=16,
> confirmed=true) ok
> Jun 27 10:51:55 xen1 crmd: [3952]: info: do_lrm_rsc_op: Performing
> key=70:2:0:e8d1e361-2956-49c9-9c49-9babc795edc6 op=xen1_fs_start_0 )
> Jun 27 10:51:55 xen1 lrmd: [3949]: info: rsc:xen1_fs:11: start
> Jun 27 10:51:55 xen1 Filesystem[4224]: INFO: Running start for
> /dev/drbd1 on /xen1
> Jun 27 10:51:55 xen1 kernel: kjournald starting.  Commit interval 5 seconds
> Jun 27 10:51:55 xen1 kernel: EXT3 FS on drbd1, internal journal
> Jun 27 10:51:55 xen1 kernel: EXT3-fs: mounted filesystem with ordered data 
> mode.
> Jun 27 10:51:55 xen1 crmd: [3952]: info: process_lrm_event: LRM
> operation xen1_fs_start_0 (call=11, rc=0, cib-update=17,
> confirmed=true) ok
> Jun 27 10:51:55 xen1 crmd: [3952]: info: do_lrm_rsc_op: Performing
> key=71:2:0:e8d1e361-2956-49c9-9c49-9babc795edc6
> op=xen1_fs_monitor_10000 )
> Jun 27 10:51:55 xen1 crmd: [3952]: info: do_lrm_rsc_op: Performing
> key=74:2:0:e8d1e361-2956-49c9-9c49-9babc795edc6 op=vm1xen1_start_0 )
> Jun 27 10:51:55 xen1 lrmd: [3949]: info: rsc:vm1xen1:13: start
> Jun 27 10:51:56 xen1 lrmd: [3949]: info: RA output:
> (vm1xen1:start:stderr) Error: Unable to open config file: /xen1/vm1
> Jun 27 10:51:56 xen1 lrmd: [3949]: info: RA output:
> (vm1xen1:start:stdout) Usage: xm create <ConfigFile> [options] [vars]
> Create a domain based on <ConfigFile>.  Options:  -h, --help
> Print this help. --help_config        Print the available
> configuration variables (vars)                      for the
> configuration script. -q, --quiet          Quiet. --path=PATH
> Search path for configuration scripts. The value of
>  PATH is a colon-separated directory list. -f=FILE, --defconfig=FILE
>                    Use the given Python configuration script.The
>                configuration script is loaded after arguments have
>                  been processed. Each command-line option sets a
>                configuration variable named after its long option
>                 name, and these variables are placed in the
>           environment of the script before it is loaded.
>        Variables for options that may be repeated have list
> Jun 27 10:51:56 xen1 crmd: [3952]: info: process_lrm_event: LRM
> operation vm1xen1_start_0 (call=13, rc=1, cib-update=18,
> confirmed=true) unknown error
> Jun 27 10:51:56 xen1 attrd: [3950]: info: attrd_ais_dispatch: Update
> relayed from fa1.box.com
> Jun 27 10:51:56 xen1 attrd: [3950]: info: find_hash_entry: Creating
> hash entry for fail-count-vm1xen1
> Jun 27 10:51:56 xen1 attrd: [3950]: info: attrd_trigger_update:
> Sending flush op to all hosts for: fail-count-vm1xen1 (INFINITY)
> Jun 27 10:51:56 xen1 attrd: [3950]: info: attrd_perform_update: Sent
> update 25: fail-count-vm1xen1=INFINITY
> Jun 27 10:51:56 xen1 attrd: [3950]: info: attrd_ais_dispatch: Update
> relayed from fa1.box.com
> Jun 27 10:51:56 xen1 attrd: [3950]: info: find_hash_entry: Creating
> hash entry for last-failure-vm1xen1
> Jun 27 10:51:56 xen1 attrd: [3950]: info: attrd_trigger_update:
> Sending flush op to all hosts for: last-failure-vm1xen1 (1277650316)
> Jun 27 10:51:56 xen1 attrd: [3950]: info: attrd_perform_update: Sent
> update 28: last-failure-vm1xen1=1277650316
> Jun 27 10:52:25 xen1 lrmd: [3949]: info: rsc:xen1_fs:12: monitor
> Jun 27 10:52:25 xen1 crmd: [3952]: info: process_lrm_event: LRM
> operation xen1_fs_monitor_10000 (call=12, rc=0, cib-update=19,
> confirmed=false) ok
> Jun 27 10:52:25 xen1 crmd: [3952]: info: do_lrm_rsc_op: Performing
> key=2:3:0:e8d1e361-2956-49c9-9c49-9babc795edc6 op=vm1xen1_stop_0 )
> Jun 27 10:52:25 xen1 lrmd: [3949]: info: rsc:vm1xen1:14: stop
> Jun 27 10:52:26 xen1 crmd: [3952]: info: do_lrm_rsc_op: Performing
> key=96:3:0:e8d1e361-2956-49c9-9c49-9babc795edc6
> op=drbd_xen1:0_notify_0 )
> Jun 27 10:52:26 xen1 lrmd: [3949]: info: rsc:drbd_xen1:0:15: notify
> Jun 27 10:52:26 xen1 lrmd: [3949]: info: RA output:
> (drbd_xen1:0:notify:stderr) DRBD module version: 8.3.8    userland
> version: 8.3.6 you should upgrade your drbd tools!
> Jun 27 10:52:26 xen1 crmd: [3952]: info: process_lrm_event: LRM
> operation drbd_xen1:0_notify_0 (call=15, rc=0, cib-update=20,
> confirmed=true) ok
> Jun 27 10:52:26 xen1 attrd: [3950]: info: find_hash_entry: Creating
> hash entry for fail-count-vm1xen2
> Jun 27 10:52:26 xen1 attrd: [3950]: info: find_hash_entry: Creating
> hash entry for last-failure-vm1xen2
> Jun 27 10:52:26 xen1 Xen[4363]: INFO: Xen domain vm1 already stopped.
> Jun 27 10:52:26 xen1 crmd: [3952]: info: process_lrm_event: LRM
> operation vm1xen1_stop_0 (call=14, rc=0, cib-update=21,
> confirmed=true) ok
> Jun 27 10:52:56 xen1 lrmd: [3949]: info: cancel_op: operation
> monitor[12] on ocf::Filesystem::xen1_fs for client 3952, its
> parameters: CRM_meta_interval=[10000] directory=[/xen1] fstype=[ext3]
> CRM_meta_start_delay=[30000] device=[/dev/drbd1]
> CRM_meta_timeout=[50000] crm_feature_set=[3.0.1]
> CRM_meta_name=[monitor]  cancelled
> Jun 27 10:52:56 xen1 crmd: [3952]: info: do_lrm_rsc_op: Performing
> key=73:4:0:e8d1e361-2956-49c9-9c49-9babc795edc6 op=xen1_fs_stop_0 )
> Jun 27 10:52:56 xen1 lrmd: [3949]: info: rsc:xen1_fs:16: stop
> Jun 27 10:52:56 xen1 crmd: [3952]: info: do_lrm_rsc_op: Performing
> key=95:4:0:e8d1e361-2956-49c9-9c49-9babc795edc6
> op=drbd_xen1:0_notify_0 )
> Jun 27 10:52:56 xen1 lrmd: [3949]: info: rsc:drbd_xen1:0:17: notify
> Jun 27 10:52:56 xen1 crmd: [3952]: info: process_lrm_event: LRM
> operation xen1_fs_monitor_10000 (call=12, status=1, cib-update=0,
> confirmed=true) Cancelled
> Jun 27 10:52:56 xen1 lrmd: [3949]: info: RA output:
> (drbd_xen1:0:notify:stderr) DRBD module version: 8.3.8    userland
> version: 8.3.6 you should upgrade your drbd tools!
> Jun 27 10:52:56 xen1 crmd: [3952]: info: process_lrm_event: LRM
> operation drbd_xen1:0_notify_0 (call=17, rc=0, cib-update=22,
> confirmed=true) ok
> Jun 27 10:52:56 xen1 Filesystem[4537]: INFO: Running stop for
> /dev/drbd1 on /xen1
> Jun 27 10:52:56 xen1 Filesystem[4537]: INFO: Trying to unmount /xen1
> Jun 27 10:52:56 xen1 Filesystem[4537]: INFO: unmounted /xen1 successfully
> Jun 27 10:52:56 xen1 crmd: [3952]: info: process_lrm_event: LRM
> operation xen1_fs_stop_0 (call=16, rc=0, cib-update=23,
> confirmed=true) ok
> Jun 27 10:52:56 xen1 crmd: [3952]: info: do_lrm_rsc_op: Performing
> key=10:4:0:e8d1e361-2956-49c9-9c49-9babc795edc6
> op=drbd_xen1:0_demote_0 )
> Jun 27 10:52:56 xen1 lrmd: [3949]: info: rsc:drbd_xen1:0:18: demote
> Jun 27 10:52:56 xen1 lrmd: [3949]: info: RA output:
> (drbd_xen1:0:demote:stderr) DRBD module version: 8.3.8    userland
> version: 8.3.6 you should upgrade your drbd tools!
> Jun 27 10:52:56 xen1 lrmd: [3949]: info: RA output:
> (drbd_xen1:0:demote:stderr) DRBD module version: 8.3.8    userland
> version: 8.3.6 you should upgrade your drbd tools!
> Jun 27 10:52:56 xen1 kernel: block drbd1: role( Primary -> Secondary )
> Jun 27 10:52:56 xen1 lrmd: [3949]: info: RA output: 
> (drbd_xen1:0:demote:stdout)
> Jun 27 10:52:56 xen1 crmd: [3952]: info: process_lrm_event: LRM
> operation drbd_xen1:0_demote_0 (call=18, rc=0, cib-update=24,
> confirmed=true) ok
> Jun 27 10:52:56 xen1 crmd: [3952]: info: do_lrm_rsc_op: Performing
> key=96:4:0:e8d1e361-2956-49c9-9c49-9babc795edc6
> op=drbd_xen1:0_notify_0 )
> Jun 27 10:52:56 xen1 lrmd: [3949]: info: rsc:drbd_xen1:0:19: notify
> Jun 27 10:52:56 xen1 lrmd: [3949]: info: RA output:
> (drbd_xen1:0:notify:stderr) DRBD module version: 8.3.8    userland
> version: 8.3.6 you should upgrade your drbd tools!
> Jun 27 10:52:56 xen1 lrmd: [3949]: info: RA output: 
> (drbd_xen1:0:notify:stdout)
> Jun 27 10:52:56 xen1 crmd: [3952]: info: process_lrm_event: LRM
> operation drbd_xen1:0_notify_0 (call=19, rc=0, cib-update=25,
> confirmed=true) ok
> Jun 27 10:52:56 xen1 crmd: [3952]: info: do_lrm_rsc_op: Performing
> key=88:4:0:e8d1e361-2956-49c9-9c49-9babc795edc6
> op=drbd_xen1:0_notify_0 )
> Jun 27 10:52:56 xen1 lrmd: [3949]: info: rsc:drbd_xen1:0:20: notify
> Jun 27 10:52:56 xen1 lrmd: [3949]: info: RA output:
> (drbd_xen1:0:notify:stderr) DRBD module version: 8.3.8    userland
> version: 8.3.6 you should upgrade your drbd tools!
> Jun 27 10:52:56 xen1 crmd: [3952]: info: process_lrm_event: LRM
> operation drbd_xen1:0_notify_0 (call=20, rc=0, cib-update=26,
> confirmed=true) ok
> Jun 27 10:52:56 xen1 crmd: [3952]: info: do_lrm_rsc_op: Performing
> key=11:4:0:e8d1e361-2956-49c9-9c49-9babc795edc6 op=drbd_xen1:0_stop_0
> )
> Jun 27 10:52:56 xen1 lrmd: [3949]: info: rsc:drbd_xen1:0:21: stop
> Jun 27 10:52:56 xen1 lrmd: [3949]: info: RA output:
> (drbd_xen1:0:stop:stderr) DRBD module version: 8.3.8    userland
> version: 8.3.6 you should upgrade your drbd tools!
> Jun 27 10:52:56 xen1 lrmd: [3949]: info: RA output:
> (drbd_xen1:0:stop:stderr) DRBD module version: 8.3.8    userland
> version: 8.3.6 you should upgrade your drbd tools!
> Jun 27 10:52:56 xen1 kernel: block drbd1: peer( Secondary -> Unknown )
> conn( Connected -> Disconnecting ) pdsk( UpToDate -> DUnknown )
> Jun 27 10:52:56 xen1 kernel: block drbd1: short read expecting header
> on sock: r=-512
> Jun 27 10:52:56 xen1 kernel: block drbd1: asender terminated
> Jun 27 10:52:56 xen1 kernel: block drbd1: Terminating asender thread
> Jun 27 10:52:56 xen1 kernel: block drbd1: Connection closed
> Jun 27 10:52:56 xen1 kernel: block drbd1: conn( Disconnecting -> StandAlone )
> Jun 27 10:52:56 xen1 kernel: block drbd1: receiver terminated
> Jun 27 10:52:56 xen1 kernel: block drbd1: Terminating receiver thread
> Jun 27 10:52:56 xen1 kernel: block drbd1: disk( UpToDate -> Diskless )
> Jun 27 10:52:56 xen1 kernel: block drbd1: drbd_bm_resize called with
> capacity == 0
> Jun 27 10:52:56 xen1 kernel: block drbd1: worker terminated
> Jun 27 10:52:56 xen1 kernel: block drbd1: Terminating worker thread
> Jun 27 10:52:56 xen1 lrmd: [3949]: info: RA output: (drbd_xen1:0:stop:stdout)
> Jun 27 10:52:56 xen1 lrmd: [3949]: info: RA output:
> (drbd_xen1:0:stop:stderr) DRBD module version: 8.3.8    userland
> version: 8.3.6 you should upgrade your drbd tools!
> Jun 27 10:52:56 xen1 kernel: block drbd1: State change failed: Disk
> state is lower than outdated
> Jun 27 10:52:56 xen1 kernel: block drbd1:   state = { cs:StandAlone
> ro:Secondary/Unknown ds:Diskless/DUnknown r--- }
> Jun 27 10:52:56 xen1 kernel: block drbd1:  wanted = { cs:StandAlone
> ro:Secondary/Unknown ds:Outdated/DUnknown r--- }
> Jun 27 10:52:56 xen1 lrmd: [3949]: info: RA output: (drbd_xen1:0:stop:stdout)
> Jun 27 10:52:56 xen1 attrd: [3950]: info: attrd_trigger_update:
> Sending flush op to all hosts for: master-drbd_xen1:0 (-INFINITY)
> Jun 27 10:52:56 xen1 attrd: [3950]: info: attrd_perform_update: Sent
> update 32: master-drbd_xen1:0=-INFINITY
> Jun 27 10:52:56 xen1 lrmd: [3949]: info: RA output: (drbd_xen1:0:stop:stdout)
> Jun 27 10:52:56 xen1 crm_attribute: [4746]: info: Invoked:
> crm_attribute -N xen1.box.com -n master-drbd_xen1:0 -l reboot -D
> Jun 27 10:52:56 xen1 attrd: [3950]: info: attrd_trigger_update:
> Sending flush op to all hosts for: master-drbd_xen1:0 (<null>)
> Jun 27 10:52:56 xen1 attrd: [3950]: info: attrd_perform_update: Sent
> delete 34: node=xen1.box.com, attr=master-drbd_xen1:0, id=<n/a>,
> set=(null), section=status
> Jun 27 10:52:56 xen1 attrd: [3950]: info: attrd_perform_update: Sent
> delete 36: node=xen1.box.com, attr=master-drbd_xen1:0, id=<n/a>,
> set=(null), section=status
> Jun 27 10:52:56 xen1 lrmd: [3949]: info: RA output: (drbd_xen1:0:stop:stdout)
> Jun 27 10:52:56 xen1 crmd: [3952]: info: process_lrm_event: LRM
> operation drbd_xen1:0_stop_0 (call=21, rc=0, cib-update=27,
> confirmed=true) ok
> Jun 27 10:52:57 xen1 attrd: [3950]: info: find_hash_entry: Creating
> hash entry for fail-count-drbd_xen1:0
> Jun 27 10:52:57 xen1 attrd: [3950]: info: find_hash_entry: Creating
> hash entry for last-failure-drbd_xen1:0
> Jun 27 10:52:58 xen1 crmd: [3952]: info: do_lrm_rsc_op: Performing
> key=41:7:0:e8d1e361-2956-49c9-9c49-9babc795edc6 op=drbd_xen2:1_start_0
> )
> Jun 27 10:52:58 xen1 lrmd: [3949]: info: rsc:drbd_xen2:1:22: start
> Jun 27 10:52:58 xen1 lrmd: [3949]: info: RA output:
> (drbd_xen2:1:start:stderr) DRBD module version: 8.3.8    userland
> version: 8.3.6 you should upgrade your drbd tools!
> Jun 27 10:52:58 xen1 lrmd: [3949]: info: RA output:
> (drbd_xen2:1:start:stderr) 'xen2' not defined in your config.
> Jun 27 10:52:58 xen1 drbd[4747]: ERROR: DRBD resource xen2 not found
> in configuration file /etc/drbd.conf.
> Jun 27 10:52:58 xen1 crm_attribute: [4774]: info: Invoked:
> crm_attribute -N xen1.box.com -n master-drbd_xen2:1 -l reboot -D
> Jun 27 10:52:58 xen1 attrd: [3950]: info: find_hash_entry: Creating
> hash entry for master-drbd_xen2:1
> Jun 27 10:52:58 xen1 lrmd: [3949]: info: RA output: (drbd_xen2:1:start:stdout)
> Jun 27 10:52:58 xen1 crmd: [3952]: info: process_lrm_event: LRM
> operation drbd_xen2:1_start_0 (call=22, rc=5, cib-update=28,
> confirmed=true) not installed
> Jun 27 10:52:58 xen1 attrd: [3950]: info: attrd_ais_dispatch: Update
> relayed from fa1.box.com
> Jun 27 10:52:58 xen1 attrd: [3950]: info: find_hash_entry: Creating
> hash entry for fail-count-drbd_xen2:1
> Jun 27 10:52:58 xen1 attrd: [3950]: info: attrd_trigger_update:
> Sending flush op to all hosts for: fail-count-drbd_xen2:1 (INFINITY)
> Jun 27 10:52:58 xen1 attrd: [3950]: info: attrd_perform_update: Sent
> update 45: fail-count-drbd_xen2:1=INFINITY
> Jun 27 10:52:58 xen1 attrd: [3950]: info: attrd_ais_dispatch: Update
> relayed from fa1.box.com
> Jun 27 10:52:58 xen1 attrd: [3950]: info: find_hash_entry: Creating
> hash entry for last-failure-drbd_xen2:1
> Jun 27 10:52:58 xen1 attrd: [3950]: info: attrd_trigger_update:
> Sending flush op to all hosts for: last-failure-drbd_xen2:1
> (1277650378)
> Jun 27 10:52:58 xen1 crmd: [3952]: info: do_lrm_rsc_op: Performing
> key=97:7:0:e8d1e361-2956-49c9-9c49-9babc795edc6
> op=drbd_xen2:1_notify_0 )
> Jun 27 10:52:58 xen1 lrmd: [3949]: info: rsc:drbd_xen2:1:23: notify
> Jun 27 10:52:58 xen1 attrd: [3950]: info: attrd_perform_update: Sent
> update 48: last-failure-drbd_xen2:1=1277650378
> Jun 27 10:52:58 xen1 lrmd: [3949]: info: RA output:
> (drbd_xen2:1:notify:stderr) DRBD module version: 8.3.8    userland
> version: 8.3.6 you should upgrade your drbd tools!
> Jun 27 10:52:58 xen1 lrmd: [3949]: info: RA output:
> (drbd_xen2:1:notify:stderr) 'xen2' not defined in your config.
> Jun 27 10:52:58 xen1 drbd[4781]: ERROR: DRBD resource xen2 not found
> in configuration file /etc/drbd.conf.
> Jun 27 10:52:58 xen1 crm_attribute: [4808]: info: Invoked:
> crm_attribute -N xen1.box.com -n master-drbd_xen2:1 -l reboot -D
> Jun 27 10:52:58 xen1 lrmd: [3949]: info: RA output: 
> (drbd_xen2:1:notify:stdout)
> Jun 27 10:52:58 xen1 crmd: [3952]: info: process_lrm_event: LRM
> operation drbd_xen2:1_notify_0 (call=23, rc=0, cib-update=29,
> confirmed=true) ok
> Jun 27 10:53:00 xen1 cib: [4809]: info: write_cib_contents: Archived
> previous version as /var/lib/heartbeat/crm/cib-63.raw
> Jun 27 10:53:00 xen1 cib: [4809]: info: write_cib_contents: Wrote
> version 0.80.0 of the CIB to disk (digest:
> 4103d849359e1962585c9d84e56ce3f0)
> Jun 27 10:53:00 xen1 cib: [4809]: info: retrieveCib: Reading cluster
> configuration from: /var/lib/heartbeat/crm/cib.bteVor (digest:
> /var/lib/heartbeat/crm/cib.td8DMI)
> Jun 27 10:53:00 xen1 crmd: [3952]: info: do_lrm_rsc_op: Performing
> key=100:8:0:e8d1e361-2956-49c9-9c49-9babc795edc6
> op=drbd_xen2:1_notify_0 )
> Jun 27 10:53:00 xen1 lrmd: [3949]: info: rsc:drbd_xen2:1:24: notify
> Jun 27 10:53:00 xen1 lrmd: [3949]: info: RA output:
> (drbd_xen2:1:notify:stderr) DRBD module version: 8.3.8    userland
> version: 8.3.6 you should upgrade your drbd tools!
> Jun 27 10:53:00 xen1 lrmd: [3949]: info: RA output:
> (drbd_xen2:1:notify:stderr) 'xen2' not defined in your config.
> Jun 27 10:53:00 xen1 drbd[4810]: ERROR: DRBD resource xen2 not found
> in configuration file /etc/drbd.conf.
> Jun 27 10:53:00 xen1 crm_attribute: [4837]: info: Invoked:
> crm_attribute -N xen1.box.com -n master-drbd_xen2:1 -l reboot -D
> Jun 27 10:53:00 xen1 lrmd: [3949]: info: RA output: 
> (drbd_xen2:1:notify:stdout)
> Jun 27 10:53:00 xen1 crmd: [3952]: info: process_lrm_event: LRM
> operation drbd_xen2:1_notify_0 (call=24, rc=0, cib-update=30,
> confirmed=true) ok
> Jun 27 10:53:00 xen1 crmd: [3952]: info: do_lrm_rsc_op: Performing
> key=1:8:0:e8d1e361-2956-49c9-9c49-9babc795edc6 op=drbd_xen2:1_stop_0 )
> Jun 27 10:53:00 xen1 lrmd: [3949]: info: rsc:drbd_xen2:1:25: stop
> Jun 27 10:53:00 xen1 lrmd: [3949]: info: RA output:
> (drbd_xen2:1:stop:stderr) DRBD module version: 8.3.8    userland
> version: 8.3.6 you should upgrade your drbd tools!
> Jun 27 10:53:00 xen1 lrmd: [3949]: info: RA output:
> (drbd_xen2:1:stop:stderr) 'xen2' not defined in your config.
> Jun 27 10:53:00 xen1 drbd[4838]: ERROR: DRBD resource xen2 not found
> in configuration file /etc/drbd.conf.
> Jun 27 10:53:00 xen1 crm_attribute: [4865]: info: Invoked:
> crm_attribute -N xen1.box.com -n master-drbd_xen2:1 -l reboot -D
> Jun 27 10:53:00 xen1 lrmd: [3949]: info: RA output: (drbd_xen2:1:stop:stdout)
> Jun 27 10:53:00 xen1 crmd: [3952]: info: process_lrm_event: LRM
> operation drbd_xen2:1_stop_0 (call=25, rc=5, cib-update=31,
> confirmed=true) not installed
> Jun 27 10:53:00 xen1 attrd: [3950]: info: attrd_ais_dispatch: Update
> relayed from fa1.box.com
> Jun 27 10:53:00 xen1 attrd: [3950]: info: attrd_ais_dispatch: Update
> relayed from fa1.box.com
> Jun 27 10:53:00 xen1 attrd: [3950]: info: attrd_trigger_update:
> Sending flush op to all hosts for: last-failure-drbd_xen2:1
> (1277650381)
> Jun 27 10:53:00 xen1 attrd: [3950]: info: attrd_perform_update: Sent
> update 51: last-failure-drbd_xen2:1=1277650381
> Jun 27 10:53:10 xen1 ntpd[2996]: synchronized to LOCAL(0), stratum 10
> Jun 27 10:53:10 xen1 ntpd[2996]: kernel time sync enabled 0001
>
> New config is:
>
> node fa1.box.com
> node xen1.box.com
> node xen2.box.com
> primitive drbd_xen1 ocf:linbit:drbd \
>        params drbd_resource="xen1" \
>        op monitor interval="15s"
> primitive drbd_xen2 ocf:linbit:drbd \
>        params drbd_resource="xen2" \
>        op monitor interval="15s"
> primitive vm1xen1 ocf:heartbeat:Xen \
>        params xmfile="/xen1/vm1" \
>        op monitor interval="0s" timeout="60s" start-delay="5s"
> depth="0" target-role="Stoppedop" start \
>        op stop interval="0s" timeout="120s" \
>        meta is-managed="true" \
>        meta target-role="Started"
> primitive vm1xen2 ocf:heartbeat:Xen \
>        params xmfile="/xen2/vm1" \
>        op monitor interval="0s" timeout="60s" start-delay="5s"
> depth="0" target-role="Stoppedop" start \
>        op stop interval="0s" timeout="120s" \
>        meta is-managed="true" \
>        meta target-role="Started"
> primitive xen1_fs ocf:heartbeat:Filesystem \
>        params device="/dev/drbd1" directory="/xen1" fstype="ext3" \
>        op monitor start-delay="30s" interval="10s"
> primitive xen2_fs ocf:heartbeat:Filesystem \
>        params device="/dev/drbd2" directory="/xen2" fstype="ext3" \
>        op monitor start-delay="30s" interval="10s"
> ms ms_drbd_xen1 drbd_xen1 \
>        meta master-max="1" master-node-max="1" clone-max="2"
> clone-node-max="1" notify="true"
> ms ms_drbd_xen2 drbd_xen2 \
>        meta master-max="1" master-node-max="1" clone-max="2"
> clone-node-max="1" notify="true"
> location drbd-fence-by-handler-ms_drbd_xen1 ms_drbd_xen1 \
>        rule $id="drbd-fence-by-handler-rule-ms_drbd_xen1"
> $role="Master" -inf: #uname ne fa1.box.com
> location drbd-fence-by-handler-ms_drbd_xen2 ms_drbd_xen2 \
>        rule $id="drbd-fence-by-handler-rule-ms_drbd_xen2"
> $role="Master" -inf: #uname ne xen2.box.com
> colocation fs_on_drbd1 inf: xen1_fs ms_drbd_xen1:Master
> colocation fs_on_drbd2 inf: xen2_fs ms_drbd_xen2:Master
> colocation vm1xen1-with-xen1_fs inf: vm1xen1 xen1_fs
> colocation vm1xen2-with-xen2_fs inf: vm1xen2 xen2_fs
> order fs_after_drbd1 inf: ms_drbd_xen1:promote xen1_fs:start
> order fs_after_drbd2 inf: ms_drbd_xen2:promote xen2_fs:start
> order vm1xen1-after-xen1_fs inf: xen1_fs:start vm1xen1:start
> order vm1xen2-after-xen2_fs inf: xen2_fs:start vm1xen2:start
> property $id="cib-bootstrap-options" \
>        dc-version="1.0.9-89bd754939df5150de7cd76835f98fe90851b677" \
>        cluster-infrastructure="openais" \
>        expected-quorum-votes="3" \
>        no-quorum-policy="ignore" \
>        stonith-enabled="false" \
>        default-resource-stickiness="1000" \
>        last-lrm-refresh="1277632456"
>
>
> Before, it used to happen very quickly, but now it acts differently.
> What I don't like is that, as you can see from the logs, it starts the
> VPS almost immediately after mounting the partition, which is why it
> can't see the config file. If Pacemaker had given it 3-4 seconds, it
> might have worked.
>
> Kinda confused.
>
> Thanks
>
> Joe
>
>
> On Sun, Jun 27, 2010 at 7:34 AM, Greg Woods <[email protected]> wrote:
>> On Sun, 2010-06-27 at 03:02 -0700, Joe Shang wrote:
>>
>>> Failed actions:
>>>     drbd_xen2:1_start_0 (node=xen1.box.com, call=10, rc=5,
>>> status=complete): not installed
>>
>> This is one of the things that I don't like about heartbeat/pacemaker. A
>> minor error (misconfiguring a single resource) can cause major problems
>> (like a stonith death match that brings down the entire cluster).
>>
>> One thing I have seen with Xen VMs is that the default timeouts are too
>> short. That may not be your particular problem, but you probably need to
>> increase them anyway. This is an example of what I have:
>>
>>
>>
>> primitive VM-ldap ocf:heartbeat:Xen \
>>        params xmfile="/etc/xen/ldap" \
>>        op monitor interval="10" timeout="120" depth="0"
>> target-role="Stopped" \
>>        op start interval="0" timeout="60s" \
>>        op stop interval="0" timeout="120s" \
>>        meta is-managed="true" target-role="Started"
>>
>> Before I added the explicit "op start" and "op stop" timeouts, I would
>> get failed stop or start operations and any attempt to fail over would
>> start a death match.
>>
>> --Greg
>>
>>
>> _______________________________________________
>> Linux-HA mailing list
>> [email protected]
>> http://lists.linux-ha.org/mailman/listinfo/linux-ha
>> See also: http://linux-ha.org/ReportingProblems
>>
>
