Sorry, I've had to ignore Heartbeat-based clusters for the last few weeks...
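Incidentally, a quick way to gauge how long the cluster kept rejecting
lenny1 is to count the membership rejections in the log. This throwaway
helper is my own, not a Heartbeat or Pacemaker tool:

```shell
# Count how many messages from a rejoining peer were discarded with
# "not in our membership" -- a steadily growing count after the node
# comes back points at a CCM membership problem rather than a one-off
# lost packet.  My own throwaway helper, not part of Heartbeat/Pacemaker.
count_membership_rejections() {
    grep -c 'not in our membership' "$1"
}

# usage: count_membership_rejections /var/log/syslog
```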
There may have been a problem with 1.0.2; I never tested it with
Heartbeat, but my testing this week indicates the current code should
work, so you might want to consider updating.

This looks suspicious, though:

  heartbeat[1831]: 2009/03/18_14:18:03 WARN: Message hist queue is filling up (377 messages in queue)

and would seem to indicate some sort of communications problem. I'd
suggest grabbing the latest Pacemaker code and submitting a bug if you
find it happens again.

Andrew

On Wed, Mar 18, 2009 at 18:29, Juha Heinanen <[email protected]> wrote:
> i set up the example apache cluster of document
>
>   http://www.clusterlabs.org/wiki/DRBD_HowTo_1.0
>
> but used mysql server instead of apache server. crm of my test cluster
> looks like this:
>
> node $id="8df8447f-6ecf-41a7-a131-c89fd59a120d" lenny1
> node $id="f13aff7b-6c94-43ac-9a24-b118e62d5325" lenny2
> primitive drbd0 ocf:heartbeat:drbd \
>     params drbd_resource="drbd0" \
>     op monitor interval="59s" role="Master" timeout="30s" \
>     op monitor interval="60s" role="Slave" timeout="30s"
> primitive fs0 ocf:heartbeat:Filesystem \
>     params ftype="ext3" directory="/var/lib/mysql" device="/dev/drbd0" \
>     meta target-role="Started"
> primitive mysql-server lsb:mysql \
>     op monitor interval="10s" timeout="30s" start-delay="10s"
> primitive virtual-ip ocf:heartbeat:IPaddr2 \
>     params ip="192.98.102.10" broadcast="192.98.102.255" nic="eth1" cidr_netmask="24" \
>     op monitor interval="21s" timeout="5s"
> group mysql-group fs0 mysql-server virtual-ip
> ms ms-drbd0 drbd0 \
>     meta clone-max="2" notify="true" globally-unique="false" target-role="Started"
> colocation mysql-group-on-ms-drbd0 inf: mysql-group ms-drbd0:Master
> order ms-drbd0-before-mysql-group inf: ms-drbd0:promote mysql-group:start
> property $id="cib-bootstrap-options" \
>     dc-version="1.0.2-ec6b0bbee1f3aa72c4c2559997e675db6ab39160" \
>     default-resource-stickiness="1"
>
> initially both nodes were online, lenny2 being the master.
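(Aside: once lenny1 is back up, you can check what the DC thinks of it
by looking at the status section of the CIB, e.g. in a dump taken with
"cibadmin -Q". The node_state element and its in_ccm attribute are what
the 1.0 CIB records; the grep helper itself is my own sketch, not a
cluster tool:)

```shell
# Rough check of the DC's view of a node, run against a CIB dump taken
# with "cibadmin -Q > cib.xml".  The node_state element and its in_ccm
# attribute come from the Pacemaker 1.0 status section; the helper
# itself is my own throwaway, not part of any cluster package.
node_in_ccm() {
    # $1 = saved CIB XML dump, $2 = node uname
    grep 'node_state' "$1" | grep "uname=\"$2\"" | grep -q 'in_ccm="true"'
}

# usage: node_in_ccm cib.xml lenny1 && echo "in membership"
```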
> then i tried what happens when i reboot lenny1. when lenny1 was powered
> off, cluster looked correctly like this:
>
> # crm_mon -1
>
> ============
> Last updated: Wed Mar 18 14:12:09 2009
> Current DC: lenny2 (f13aff7b-6c94-43ac-9a24-b118e62d5325)
> Version: 1.0.2-ec6b0bbee1f3aa72c4c2559997e675db6ab39160
> 2 Nodes configured.
> 2 Resources configured.
> ============
>
> Node: lenny1 (8df8447f-6ecf-41a7-a131-c89fd59a120d): OFFLINE
> Node: lenny2 (f13aff7b-6c94-43ac-9a24-b118e62d5325): online
>
> Master/Slave Set: ms-drbd0
>     drbd0:0 (ocf::heartbeat:drbd): Stopped
>     drbd0:1 (ocf::heartbeat:drbd): Master lenny2
> Resource Group: mysql-group
>     fs0 (ocf::heartbeat:Filesystem): Started lenny2
>     mysql-server (lsb:mysql): Started lenny2
>     virtual-ip (ocf::heartbeat:IPaddr2): Started lenny2
>
> when i powered lenny1 on again, i expected it to rejoin the cluster
> after it becomes online, but it was totally ignored.
>
> the log is below. versions of software are heartbeat 2.99.2 and
> pacemaker 1.0.2.
>
> any clues why lenny1 was ignored and my very first test to achieve high
> availability with heartbeat/pacemaker failed? people on the pacemaker
> list suspected ccm, which is part of heartbeat.
>
> -- juha
>
> ------------------------------------------
>
> this came to syslog when lenny1 was powered off:
>
> r...@lenny2:~# heartbeat[1831]: 2009/03/18_14:12:32 WARN: node lenny1: is dead
> heartbeat[1831]: 2009/03/18_14:12:32 info: Link lenny1:eth1 dead.
> crmd[1923]: 2009/03/18_14:12:32 notice: crmd_ha_status_callback: Status update: Node lenny1 now has status [dead] (DC=true)
> crmd[1923]: 2009/03/18_14:12:32 info: crm_update_peer_proc: lenny1.ais is now offline
> crmd[1923]: 2009/03/18_14:12:32 info: te_graph_trigger: Transition 12 is now complete
> crmd[1923]: 2009/03/18_14:12:32 info: notify_crmd: Transition 12 status: done - <null>
>
> and this when it was powered on again:
>
> heartbeat[1831]: 2009/03/18_14:12:56 info: Heartbeat restart on node lenny1
> heartbeat[1831]: 2009/03/18_14:12:56 info: Link lenny1:eth1 up.
> heartbeat[1831]: 2009/03/18_14:12:56 info: Status update for node lenny1: status init
> heartbeat[1831]: 2009/03/18_14:12:56 info: Status update for node lenny1: status up
> crmd[1923]: 2009/03/18_14:12:56 notice: crmd_ha_status_callback: Status update: Node lenny1 now has status [init] (DC=true)
> crmd[1923]: 2009/03/18_14:12:56 info: crm_update_peer_proc: lenny1.ais is now online
> crmd[1923]: 2009/03/18_14:12:56 notice: crmd_ha_status_callback: Status update: Node lenny1 now has status [up] (DC=true)
> heartbeat[1831]: 2009/03/18_14:13:26 info: Status update for node lenny1: status active
> crmd[1923]: 2009/03/18_14:13:26 notice: crmd_ha_status_callback: Status update: Node lenny1 now has status [active] (DC=true)
> cib[1919]: 2009/03/18_14:13:26 info: cib_client_status_callback: Status update: Client lenny1/cib now has status [join]
> cib[1919]: 2009/03/18_14:13:26 info: crm_update_peer_proc: lenny1.cib is now online
> heartbeat[1831]: 2009/03/18_14:13:30 WARN: 1 lost packet(s) for [lenny1] [55:57]
> heartbeat[1831]: 2009/03/18_14:13:30 info: No pkts missing from lenny1!
> crmd[1923]: 2009/03/18_14:13:30 notice: crmd_client_status_callback: Status update: Client lenny1/crmd now has status [online] (DC=true)
> crmd[1923]: 2009/03/18_14:13:30 info: crm_update_peer_proc: lenny1.crmd is now online
> heartbeat[1831]: 2009/03/18_14:13:31 WARN: 1 lost packet(s) for [lenny1] [59:61]
> heartbeat[1831]: 2009/03/18_14:13:31 info: No pkts missing from lenny1!
> crmd[1923]: 2009/03/18_14:13:33 WARN: crmd_ha_msg_callback: Ignoring HA message (op=join_announce) from lenny1: not in our membership list (size=1)
> crmd[1923]: 2009/03/18_14:13:43 WARN: crmd_ha_msg_callback: Ignoring HA message (op=vote) from lenny1: not in our membership list (size=1)
> cib[1919]: 2009/03/18_14:13:46 WARN: cib_peer_callback: Discarding cib_slave_all message (50) from lenny1: not in our membership
> cib[1919]: 2009/03/18_14:13:47 WARN: cib_peer_callback: Discarding cib_replace message (54) from lenny1: not in our membership
> cib[1919]: 2009/03/18_14:13:48 WARN: cib_peer_callback: Discarding cib_apply_diff message (58) from lenny1: not in our membership
> cib[1919]: 2009/03/18_14:13:50 WARN: cib_peer_callback: Discarding cib_apply_diff message (5c) from lenny1: not in our membership
> cib[1919]: 2009/03/18_14:13:51 WARN: cib_peer_callback: Discarding cib_apply_diff message (5e) from lenny1: not in our membership
> cib[1919]: 2009/03/18_14:13:51 WARN: cib_peer_callback: Discarding cib_apply_diff message (5f) from lenny1: not in our membership
> cib[1919]: 2009/03/18_14:13:51 WARN: cib_peer_callback: Discarding cib_apply_diff message (60) from lenny1: not in our membership
> cib[1919]: 2009/03/18_14:13:51 WARN: cib_peer_callback: Discarding cib_apply_diff message (61) from lenny1: not in our membership
> cib[1919]: 2009/03/18_14:13:51 WARN: cib_peer_callback: Discarding cib_apply_diff message (62) from lenny1: not in our membership
> heartbeat[1831]: 2009/03/18_14:16:01 info: all clients are now paused
> cib[1919]: 2009/03/18_14:16:27 info: cib_stats: Processed 32 operations (19062.00us average, 0% utilization) in the last 10min
> heartbeat[1831]: 2009/03/18_14:18:02 WARN: Message hist queue is filling up (376 messages in queue)
> heartbeat[1831]: 2009/03/18_14:18:03 WARN: Message hist queue is filling up (377 messages in queue)
> ...
>
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
