Apologies for the second post on this issue; the original was blocked because of its attachment and has just been released!
Jimmy.

On 4 Jul 2013, at 00:49, Jimmy Magee <[email protected]> wrote:

> Hi all,
>
> I'm endeavouring to set up a DRBD Pacemaker cluster on two CentOS 6.4 KVM guests.
> The KVMs are running on a CentOS 6.4 host; the VMs are installed on separate
> logical volumes, with 1 GB of RAM allocated to each VM.
> DRBD starts fine manually, and promoting/demoting the device via drbdadm works.
> Pacemaker seems to start all resources without error on startup; however, the
> DRBD monitor operation fails, causing its process to time out, and Pacemaker then
> restarts all resources on the primary node again.
> Observing DRBD via "service drbd status" or /proc/drbd is fine when both nodes
> are online; however, DRBD fails to promote on the remaining node when the
> primary is put into standby mode.
> I've outlined the cluster and DRBD config settings below and attached the
> pacemaker logs, which should help explain the situation and behaviour clearly.
> I'm not sure what is causing the issue; I'd appreciate assistance with it,
> and please let me know if you require more info.
>
> Cheers,
> Jimmy.
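As an aside, the cs:/ro:/ds: fields in the /proc/drbd dumps quoted below can be pulled apart in a script for quick triage. This is only a sketch: the sample line is copied verbatim from the webtext-2 output further down, and on a live node you would read /proc/drbd itself rather than a shell variable.

```shell
# Sketch: extract connection state (cs:), roles (ro:) and disk states (ds:)
# from a DRBD 8.4 /proc/drbd status line. The sample is copied from the
# webtext-2 output below; on a live node use the line from /proc/drbd.
line='0: cs:WFConnection ro:Secondary/Unknown ds:UpToDate/Outdated C r-----'
cs=$(printf '%s\n' "$line" | sed -n 's/.*cs:\([^ ]*\).*/\1/p')
ro=$(printf '%s\n' "$line" | sed -n 's/.*ro:\([^ ]*\).*/\1/p')
ds=$(printf '%s\n' "$line" | sed -n 's/.*ds:\([^ ]*\).*/\1/p')
echo "cs=$cs ro=$ro ds=$ds"
```

cs:WFConnection with ro:Secondary/Unknown is exactly the stuck state described in the mail: the node is waiting for its peer and was never promoted.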
> # cat /proc/drbd
> version: 8.4.2 (api:1/proto:86-101)
> GIT-hash: 7ad5f850d711223713d6dcadc3dd48860321070c build by dag@Build64R6, 2012-09-06 08:16:10
>  0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
>     ns:376 nr:0 dw:376 dr:6961 al:7 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
>
> Primary node is put in standby mode
>
> # cat /proc/drbd (webtext-2)
> version: 8.4.2 (api:1/proto:86-101)
> GIT-hash: 7ad5f850d711223713d6dcadc3dd48860321070c build by dag@Build64R6, 2012-09-06 08:16:10
>  0: cs:WFConnection ro:Secondary/Unknown ds:UpToDate/Outdated C r-----
>     ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
>
> # cat /proc/drbd (webtext-1)
> version: 8.4.2 (api:1/proto:86-101)
> GIT-hash: 7ad5f850d711223713d6dcadc3dd48860321070c build by dag@Build64R6, 2012-09-06 08:16:10
>
> Jul 3 21:47:19 webtext-2 lrmd[2511]: warning: child_timeout_callback: mysql_drbd_monitor_0 process (PID 18391) timed out
> Jul 3 21:47:21 webtext-2 crmd[2514]: error: process_lrm_event: LRM operation mysql_drbd_monitor_0 (666) Timed Out (timeout=20000ms)
> Jul 3 22:51:46 webtext-2 lrmd[19204]: warning: child_timeout_callback: mysql_drbd_promote_0 process (PID 21046) timed out
> Jul 3 22:51:47 webtext-2 crmd[19207]: error: process_lrm_event: LRM operation mysql_drbd_promote_0 (189) Timed Out (timeout=20000ms)
>
> crmd extract from logs:
> Jul 03 21:18:32 [2514] webtext-2.vennetics.com crmd: notice: te_rsc_command: Initiating action 51: notify mysql_drbd_pre_notify_stop_0 on webtext-1
> Jul 03 21:18:40 [2514] webtext-2.vennetics.com crmd: notice: te_rsc_command: Initiating action 2: stop mysql_drbd_stop_0 on webtext-2 (local)
> Jul 03 21:18:46 [2514] webtext-2.vennetics.com crmd: info: process_graph_event: Detected action (74.2) mysql_drbd_stop_0.650=not installed: failed
> Jul 03 21:18:46 [2514] webtext-2.vennetics.com crmd: notice: te_rsc_command: Initiating action 52: notify mysql_drbd_post_notify_stop_0 on webtext-1
> Jul 03 21:22:42 [2514] webtext-2.vennetics.com crmd: info: process_graph_event: Detected action (61.12) mysql_drbd_monitor_130000.354=not running: failed
> Jul 03 21:22:42 [2514] webtext-2.vennetics.com crmd: notice: te_rsc_command: Initiating action 55: notify mysql_drbd_pre_notify_demote_0 on webtext-1
> Jul 03 21:22:50 [2514] webtext-2.vennetics.com crmd: notice: te_rsc_command: Initiating action 55: notify mysql_drbd_pre_notify_demote_0 on webtext-1
> Jul 03 21:22:55 [2514] webtext-2.vennetics.com crmd: notice: te_rsc_command: Initiating action 9: demote mysql_drbd_demote_0 on webtext-1
> Jul 03 21:23:00 [2514] webtext-2.vennetics.com crmd: info: process_graph_event: Detected action (77.9) mysql_drbd_demote_0.371=unknown error: failed
> Jul 03 21:23:00 [2514] webtext-2.vennetics.com crmd: notice: te_rsc_command: Initiating action 56: notify mysql_drbd_post_notify_demote_0 on webtext-1
> Jul 03 21:23:05 [2514] webtext-2.vennetics.com crmd: notice: te_rsc_command: Initiating action 48: notify mysql_drbd_pre_notify_stop_0 on webtext-1
> Jul 03 21:23:07 [2514] webtext-2.vennetics.com crmd: notice: te_rsc_command: Initiating action 2: stop mysql_drbd_stop_0 on webtext-1
> Jul 03 21:23:11 [2514] webtext-2.vennetics.com crmd: notice: te_rsc_command: Initiating action 7: start mysql_drbd_start_0 on webtext-1
> Jul 03 21:23:19 [2514] webtext-2.vennetics.com crmd: notice: te_rsc_command: Initiating action 46: notify mysql_drbd_post_notify_start_0 on webtext-1
> Jul 03 21:23:23 [2514] webtext-2.vennetics.com crmd: notice: te_rsc_command: Initiating action 53: notify mysql_drbd_pre_notify_promote_0 on webtext-1
> Jul 03 21:23:25 [2514] webtext-2.vennetics.com crmd: notice: te_rsc_command: Initiating action 9: promote mysql_drbd_promote_0 on webtext-1
> Jul 03 21:23:30 [2514] webtext-2.vennetics.com crmd: notice: te_rsc_command: Initiating action 54: notify mysql_drbd_post_notify_promote_0 on webtext-1
> Jul 03 21:23:36 [2514] webtext-2.vennetics.com crmd: notice: te_rsc_command: Initiating action 10: monitor mysql_drbd_monitor_130000 on webtext-1
> Jul 03 21:27:40 [2514] webtext-2.vennetics.com crmd: notice: te_rsc_command: Initiating action 52: notify mysql_drbd_pre_notify_demote_0 on webtext-1
> Jul 03 21:27:43 [2514] webtext-2.vennetics.com crmd: notice: te_rsc_command: Initiating action 8: demote mysql_drbd_demote_0 on webtext-1
> Jul 03 21:27:50 [2514] webtext-2.vennetics.com crmd: notice: te_rsc_command: Initiating action 53: notify mysql_drbd_post_notify_demote_0 on webtext-1
> Jul 03 21:28:04 [2514] webtext-2.vennetics.com crmd: notice: te_rsc_command: Initiating action 49: notify mysql_drbd_pre_notify_stop_0 on webtext-1
> Jul 03 21:28:21 [2514] webtext-2.vennetics.com crmd: notice: te_rsc_command: Initiating action 48: notify mysql_drbd_pre_notify_stop_0 on webtext-1
> Jul 03 21:28:40 [2514] webtext-2.vennetics.com crmd: notice: te_rsc_command: Initiating action 7: stop mysql_drbd_stop_0 on webtext-1
> Jul 03 21:45:32 [2514] webtext-2.vennetics.com crmd: notice: te_rsc_command: Initiating action 4: monitor mysql_fs_monitor_0 on webtext-2 (local)
> Jul 03 21:45:42 [2514] webtext-2.vennetics.com crmd: notice: te_rsc_command: Initiating action 3: probe_complete probe_complete on webtext-2 (local) - no waiting
> Jul 03 21:46:58 [2514] webtext-2.vennetics.com crmd: notice: te_rsc_command: Initiating action 4: monitor mysql_drbd:0_monitor_0 on webtext-2 (local)
> Jul 03 21:47:23 [2514] webtext-2.vennetics.com crmd: info: process_graph_event: Detected action (88.4) mysql_drbd_monitor_0.666=unknown error: failed
> Jul 03 21:47:23 [2514] webtext-2.vennetics.com crmd: notice: te_rsc_command: Initiating action 3: probe_complete probe_complete on webtext-2 (local) - no waiting
> Jul 03 21:47:29 [2514] webtext-2.vennetics.com crmd: notice: te_rsc_command: Initiating action 42: notify mysql_drbd_pre_notify_stop_0 on webtext-2 (local)
> Jul 03 21:47:44 [2514] webtext-2.vennetics.com crmd: notice: te_rsc_command: Initiating action 43: notify mysql_drbd_pre_notify_stop_0 on webtext-2 (local)
> Jul 03 21:47:47 [2514] webtext-2.vennetics.com crmd: notice: te_rsc_command: Initiating action 1: stop mysql_drbd_stop_0 on webtext-2 (local)
> Jul 03 22:06:16 [19207] webtext-2.vennetics.com crmd: info: services_os_action_execute: Managed Filesystem_meta-data_0 process 19326 exited with rc=0
> Jul 03 22:06:18 [19207] webtext-2.vennetics.com crmd: info: services_os_action_execute: Managed drbd_meta-data_0 process 19332 exited with rc=0
> Jul 03 22:06:54 [19207] webtext-2.vennetics.com crmd: info: services_os_action_execute: Managed drbd_meta-data_0 process 19396 exited with rc=0
> Jul 03 22:52:22 [19207] webtext-2.vennetics.com crmd: info: services_os_action_execute: Managed drbd_meta-data_0 process 21326 exited with rc=0
>
> # crm_mon --inactive --group-by-node -1
> Last updated: Wed Jul 3 23:17:32 2013
> Last change: Wed Jul 3 22:51:38 2013 via cibadmin on webtext-2
> Stack: cman
> Current DC: webtext-1 - partition with quorum
> Version: 1.1.10-1.el6-2718638
> 2 Nodes configured, unknown expected votes
> 5 Resources configured.
> Node webtext-1: standby
> Node webtext-2: online
>     mysql_drbd (ocf::linbit:drbd): Started
>
> Inactive resources:
>
> Master/Slave Set: mysql_ms [mysql_drbd]
>     Slaves: [ webtext-2 ]
>     Stopped: [ webtext-1 ]
> Resource Group: mysql
>     mysql_fs (ocf::heartbeat:Filesystem): Stopped
>     mysql_init (lsb:mysql): Stopped
>     jboss_init (lsb:jboss): Stopped
>
> Failed actions:
>     mysql_drbd_monitor_130000 (node=webtext-1, call=349, rc=1, status=Timed Out, last-rc-change=Wed Jul 3 22:44:49 2013, queued=0ms, exec=0ms): unknown error
>     mysql_drbd_promote_0 (node=webtext-2, call=189, rc=1, status=Timed Out, last-rc-change=Wed Jul 3 22:51:25 2013, queued=20980ms, exec=13ms): unknown error
>
> # crm configure show
> node webtext-1 \
>     attributes standby="on"
> node webtext-2 \
>     attributes standby="off"
> primitive jboss_init lsb:jboss \
>     op monitor interval="40" timeout="120" start-delay="320" \
>     op start interval="0" timeout="320" \
>     op stop interval="0" timeout="320" \
>     meta target-role="Started"
> primitive mysql_drbd ocf:linbit:drbd \
>     params drbd_resource="r0" \
>     op monitor interval="130" role="Master" \
>     op monitor interval="140" role="Slave" \
>     op stop interval="0" timeout="240" \
>     op start interval="0" timeout="320" \
>     meta target-role="Started"
> primitive mysql_fs ocf:heartbeat:Filesystem \
>     params device="/dev/drbd/by-res/r0" directory="/drbd0/" fstype="ext4" \
>     op stop interval="0" timeout="120" \
>     op start interval="0" timeout="120" \
>     op monitor interval="40" timeout="120" \
>     meta is-managed="true"
> primitive mysql_init lsb:mysql \
>     op stop interval="0" timeout="320" \
>     op start interval="0" timeout="320" \
>     meta is-managed="true"
> group mysql mysql_fs mysql_init \
>     meta target-role="Started"
> ms mysql_ms mysql_drbd \
>     meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" is-managed="true"
> location drbd-fence-by-handler-r0-mysql_ms mysql_ms \
>     rule $id="drbd-fence-by-handler-r0-rule-mysql_ms" $role="Master" -inf: #uname ne webtext-2.vennetics.com
> colocation jboss_with_mysql inf: jboss_init mysql
> colocation mysql_on_drbd inf: mysql mysql_ms:Master
> order jboss_after_mysql inf: mysql_init jboss_init
> order mysql_after_drbd inf: mysql_ms:promote mysql:start
> property $id="cib-bootstrap-options" \
>     dc-version="1.1.10-1.el6-2718638" \
>     cluster-infrastructure="cman" \
>     stonith-enabled="false" \
>     last-lrm-refresh="1372884412" \
>     no-quorum-policy="ignore"
>
> # vi /etc/drbd.conf
>
> global {
>     usage-count yes;
> }
>
> resource r0 {
>
>     # write IO is reported as completed if it has reached both local
>     # and remote disk
>     protocol C;
>
>     net {
>         # set up peer authentication
>         cram-hmac-alg sha1;
>         shared-secret "test";
>     }
>
>     startup {
>         # wait-for-connection timeout - boot process blocked
>         # until DRBD resources are connected
>         # ----- wfc-timeout 30;
>         # WFC timeout if peer was outdated
>         # ----- outdated-wfc-timeout 20;
>         # WFC timeout if this node was in a degraded cluster (i.e. only had
>         # one node left)
>         # ----- degr-wfc-timeout 30;
>     }
>
>     disk {
>         fencing resource-only;
>     }
>
>     handlers {
>         fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
>         after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
>     }
>
>     # first node
>     on webtext-1.vennetics.com {
>         # DRBD device
>         device /dev/drbd0;
>         # backing store device
>         disk /dev/vg_webtext1_02/lv_drbd0;
>         # IP address of node, and port to listen on
>         address 10.87.79.218:7788;
>         # use internal meta data (don't create a filesystem before
>         # you create metadata!)
>         meta-disk internal;
>     }
>     # second node
>     on webtext-2.vennetics.com {
>         # DRBD device
>         device /dev/drbd0;
>         # backing store device
>         disk /dev/vg_webtext2_02/lv_drbd0;
>         # IP address of node, and port to listen on
>         address 10.87.79.219:7788;
>         # use internal meta data (don't create a filesystem before
>         # you create metadata!)
>         meta-disk internal;
>     }
> }
>
> # vi /etc/cluster/cluster.conf
>
> <?xml version="1.0"?>
> <cluster config_version="1" name="webtext_cluster">
>   <clusternodes>
>     <clusternode name="webtext-1" nodeid="1">
>       <fence>
>         <method name="pcmk-redirect">
>           <device name="pcmk" port="webtext-1"/>
>         </method>
>       </fence>
>     </clusternode>
>     <clusternode name="webtext-2" nodeid="2">
>       <fence>
>         <method name="pcmk-redirect">
>           <device name="pcmk" port="webtext-2"/>
>         </method>
>       </fence>
>     </clusternode>
>   </clusternodes>
>   <fencedevices>
>     <fencedevice agent="fence_pcmk" name="pcmk"/>
>   </fencedevices>
>   <cman expected_votes="1" two_node="1"/>
>   <logging to_syslog="yes" to_logfile="yes" syslog_facility="daemon" syslog_priority="info" logfile_priority="info">
>     <logging_daemon name="qdiskd" logfile="/var/log/cluster/qdiskd.log" logfile_priority="debug"/>
>     <logging_daemon name="fenced" logfile="/var/log/cluster/fenced.log" logfile_priority="debug"/>
>     <logging_daemon name="dlm_controld" logfile="/var/log/cluster/dlm_controld.log" logfile_priority="debug"/>
>     <logging_daemon name="gfs_controld" logfile="/var/log/cluster/gfs_controld.log" logfile_priority="debug"/>
>     <logging_daemon name="corosync" logfile="/var/log/cluster/corosync.log" logfile_priority="debug"/>
>   </logging>
> </cluster>
>
> # vi /etc/sysconfig/pacemaker
>
> # For non-systemd based systems, prefix export to each enabled line
>
> # Turn on special handling for CMAN clusters in the init script
> # Without this, fenced (and by inference, cman) cannot reliably be made to shut down
> PCMK_STACK=cman
>
> #==#==# Variables that control logging
>
> # Enable debug logging globally or per-subsystem
> # Multiple subsystems may be listed separated by commas
> PCMK_debug=crmd,pengine,cib,stonith-ng,attrd,pacemakerd
>
> # rpm -qa | grep pacemaker
> pacemaker-cli-1.1.10-1.el6.x86_64
> pacemaker-libs-1.1.10-1.el6.x86_64
> pacemaker-cluster-libs-1.1.10-1.el6.x86_64
> pacemaker-libs-devel-1.1.10-1.el6.x86_64
> pacemaker-remote-1.1.10-1.el6.x86_64
> pacemaker-cts-1.1.10-1.el6.x86_64
> pacemaker-debuginfo-1.1.10-1.el6.x86_64
> pacemaker-1.1.10-1.el6.x86_64
>
> # rpm -qa | grep cman
> cman-3.0.12.1-49.el6.x86_64
>
> # rpm -qa | grep coro
> corosync-1.4.1-15.el6_4.1.x86_64
> corosynclib-1.4.1-15.el6_4.1.x86_64
> corosynclib-devel-1.4.1-15.el6_4.1.x86_64
>
> # rpm -qa | grep resource-agents
> resource-agents-3.9.2-21.el6.x86_64
>
> # rpm -qa | grep libqb
> libqb-devel-0.14.2-3.el6.x86_64
> libqb-0.14.2-3.el6.x86_64
>
> # rpm -qa | grep drbd
> drbd84-utils-8.4.2-1.el6.elrepo.x86_64
> kmod-drbd84-8.4.2-1.el6_3.elrepo.x86_64
>
> # ifconfig (webtext-2)
> eth0      Link encap:Ethernet  HWaddr 52:54:00:65:EC:27
>           inet addr:10.87.79.217  Bcast:10.87.79.255  Mask:255.255.255.0
>           inet6 addr: fe80::5054:ff:fe65:ec27/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:116526 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:104213 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000
>           RX bytes:19340444 (18.4 MiB)  TX bytes:53027494 (50.5 MiB)
>           Interrupt:10 Base address:0xc000
>
> eth1      Link encap:Ethernet  HWaddr 52:54:00:95:68:C1
>           inet addr:10.87.79.219  Bcast:10.87.79.255  Mask:255.255.255.0
>           inet6 addr: fe80::5054:ff:fe95:68c1/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:436 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:14 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000
>           RX bytes:25724 (25.1 KiB)  TX bytes:900 (900.0 b)
>           Interrupt:10 Base address:0xe000
>
> # ifconfig (webtext-1)
> eth0      Link encap:Ethernet  HWaddr 52:54:00:CB:9A:F4
>           inet addr:10.87.79.216  Bcast:10.87.79.255  Mask:255.255.255.0
>           inet6 addr: fe80::5054:ff:fecb:9af4/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:121593 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:107920 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000
>           RX bytes:54007733 (51.5 MiB)  TX bytes:22464750 (21.4 MiB)
>           Interrupt:10 Base address:0xc000
>
> eth1      Link encap:Ethernet  HWaddr 52:54:00:30:07:C9
>           inet addr:10.87.79.218  Bcast:10.87.79.255  Mask:255.255.255.0
>           inet6 addr: fe80::5054:ff:fe30:7c9/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:510 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:12 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000
>           RX bytes:29934 (29.2 KiB)  TX bytes:720 (720.0 b)
>           Interrupt:10 Base address:0xe000
>
> # lvdisplay (webtext-2)
>   --- Logical volume ---
>   LV Path                /dev/vg_webtext2_02/lv_drbd0
>   LV Name                lv_drbd0
>   VG Name                vg_webtext2_02
>   LV UUID                d8ATq9-XPqT-mTAZ-By3H-dEoL-SoDV-ebCJL3
>   LV Write Access        read/write
>   LV Creation host, time webtext-2.vennetics.com, 2013-06-30 10:35:10 +0100
>   LV Status              available
>   # open                 2
>   LV Size                4.00 GiB
>   Current LE             1023
>   Segments               1
>   Allocation             inherit
>   Read ahead sectors     auto
>   - currently set to     256
>   Block device           253:2
>
> # lvdisplay (webtext-1)
>   --- Logical volume ---
>   LV Path                /dev/vg_webtext1_02/lv_drbd0
>   LV Name                lv_drbd0
>   VG Name                vg_webtext1_02
>   LV UUID                3qB7lS-zH0O-WIKC-F6nl-0cuE-2Zu9-95RkF9
>   LV Write Access        read/write
>   LV Creation host, time webtext-1.vennetics.com, 2013-06-30 12:00:59 +0100
>   LV Status              available
>   # open                 0
>   LV Size                4.00 GiB
>   Current LE             1023
>   Segments               1
>   Allocation             inherit
>   Read ahead sectors     auto
>   - currently set to     256
>   Block device           253:2
>
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
