Hello, I've set up a DRBD and Heartbeat configuration in which the nodes communicate over an Internet connection rather than an internal network. The servers are running CentOS 5.4, with DRBD 8.3.2 and Heartbeat 3.0.3, both from the CentOS repository.
I start seeing these in the ha-log:

ERROR: Message hist queue is filling up (500 messages in queue)

Then I see a bunch of these:

WARN: Gmain_timeout_dispatch: Dispatch function for retransmit request took too long to execute: 20 ms (> 10 ms) (GSource: 0x1c3025c0)

And finally:

May 08 05:33:19 mysql1 heartbeat: [5536]: CRIT: Killing pid 5533 with SIGTERM
May 08 05:33:19 mysql1 heartbeat: [5536]: CRIT: Killing pid 5537 with SIGTERM
May 08 05:33:19 mysql1 heartbeat: [5536]: CRIT: Killing pid 5538 with SIGTERM
May 08 05:33:19 mysql1 heartbeat: [5536]: CRIT: Killing pid 5539 with SIGTERM
May 08 05:33:19 mysql1 heartbeat: [5536]: CRIT: Killing pid 5540 with SIGTERM
May 08 05:33:19 mysql1 heartbeat: [5536]: CRIT: Emergency Shutdown(MCP dead): Killing ourselves.

At this point, Heartbeat on mysql1 is dead, but because it died, it didn't release the resources. DRBD is still mounted on the first server, so the backup can't take over. DRBD itself has continued running, and the latency between the servers is very low (9 ms).
Here's my ha.cf:

logfacility local0
debug 1
debugfile /var/log/ha-debug
logfile /var/log/ha-log
node mysql1
node mysql2
keepalive 2
deadtime 60
initdead 120
warntime 15
udpport 694
ucast eth1 66.165.231.34
ucast eth1 67.218.128.19
auto_failback on
crm yes

Here's my CRM config:

node $id="23b44f0c-55fb-4b21-bf2e-81c15f28816d" mysql2
node $id="96c549d6-3e8c-4f7a-a644-cdc08dd99e41" mysql1
primitive drbd heartbeat:drbddisk \
        params 1="mysql" \
        op monitor interval="30s" timeout="30s"
primitive fs ocf:heartbeat:Filesystem \
        params fstype="ext3" directory="/mnt/mysql" device="/dev/drbd1" \
        op monitor interval="30s" timeout="40s"
primitive mysql ocf:heartbeat:mysql \
        params binary="/usr/bin/mysqld_safe" datadir="/mnt/mysql" \
        op monitor interval="30s" timeout="40s"
group mysql-group drbd fs mysql
location group-master mysql-group \
        rule $id="group-master-rule" 100: #uname eq mysql1
property $id="cib-bootstrap-options" \
        dc-version="1.0.8-9881a7350d6182bae9e8e557cf20a3cc5dac3ee7" \
        cluster-infrastructure="Heartbeat" \
        stonith-enabled="false" \
        last-lrm-refresh="1268950841"

What am I missing?
