[Linux-HA] Issues with Heartbeat/DRBD over Internet connection

Mike Sweetser Tue, 11 May 2010 13:35:27 -0700

Hello,

I've set up a DRBD and Heartbeat configuration communicating over an
Internet connection rather than an internal network. The servers are running
CentOS 5.4, with DRBD 8.3.2 and Heartbeat 3.0.3, out of the CentOS repository.

I start seeing these in the ha-log:

ERROR: Message hist queue is filling up (500 messages in queue)

Then I see a bunch of these:

WARN: Gmain_timeout_dispatch: Dispatch function for retransmit request took too long to execute: 20 ms (> 10 ms) (GSource: 0x1c3025c0)

And finally:

May 08 05:33:19 mysql1 heartbeat: [5536]: CRIT: Killing pid 5533 with SIGTERM
May 08 05:33:19 mysql1 heartbeat: [5536]: CRIT: Killing pid 5537 with SIGTERM
May 08 05:33:19 mysql1 heartbeat: [5536]: CRIT: Killing pid 5538 with SIGTERM
May 08 05:33:19 mysql1 heartbeat: [5536]: CRIT: Killing pid 5539 with SIGTERM
May 08 05:33:19 mysql1 heartbeat: [5536]: CRIT: Killing pid 5540 with SIGTERM
May 08 05:33:19 mysql1 heartbeat: [5536]: CRIT: Emergency Shutdown(MCP dead): Killing ourselves.
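To pull sequences like this out of a longer log, a small parser can help. This is a minimal sketch, assuming the syslog-style line format quoted above (`Mon DD HH:MM:SS host heartbeat: [pid]: LEVEL: message`). Heartbeat's own logfile may use a different prefix, so the regex would need adjusting for that. `parse_line` and `summarize` are hypothetical helper names, not part of Heartbeat.

```python
import re
from collections import Counter

# Assumed format (taken from the lines quoted in this post), e.g.
# "May 08 05:33:19 mysql1 heartbeat: [5536]: CRIT: Killing pid 5533 with SIGTERM"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3} \d{2} \d{2}:\d{2}:\d{2}) (?P<host>\S+) heartbeat: "
    r"\[(?P<pid>\d+)\]: (?P<level>[A-Z]+): (?P<msg>.*)$"
)
KILL_RE = re.compile(r"Killing pid (\d+) with (\w+)")


def parse_line(line):
    """Return the named fields of one log line, or None if it doesn't match."""
    m = LINE_RE.match(line.strip())
    return m.groupdict() if m else None


def summarize(lines):
    """Count matching lines per severity and collect the PIDs heartbeat killed."""
    levels = Counter()
    killed = []
    for line in lines:
        rec = parse_line(line)
        if rec is None:
            continue  # skip lines in any other format
        levels[rec["level"]] += 1
        k = KILL_RE.match(rec["msg"])
        if k:
            killed.append(int(k.group(1)))
    return levels, killed


if __name__ == "__main__":
    sample = [
        "May 08 05:33:19 mysql1 heartbeat: [5536]: CRIT: Killing pid 5533 with SIGTERM",
        "May 08 05:33:19 mysql1 heartbeat: [5536]: CRIT: Killing pid 5537 with SIGTERM",
        "May 08 05:33:19 mysql1 heartbeat: [5536]: CRIT: Emergency Shutdown(MCP dead): Killing ourselves.",
    ]
    levels, killed = summarize(sample)
    print(dict(levels), killed)  # {'CRIT': 3} [5533, 5537]
```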

At this point, Heartbeat on mysql1 is dead. Because it died instead of shutting
down cleanly, it never released the resources: DRBD is still mounted on the
first server, so the backup can't take over.

DRBD has continued running, and the latency between servers is very low
(9ms).

Here's my ha.cf:

logfacility     local0
debug 1
debugfile /var/log/ha-debug
logfile /var/log/ha-log
node mysql1
node mysql2
keepalive 2
deadtime 60
initdead 120
warntime 15
udpport 694
ucast eth1 66.165.231.34
ucast eth1 67.218.128.19
auto_failback on
crm yes
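The timing directives above can be sanity-checked mechanically. This is a minimal sketch that checks two commonly cited Heartbeat guidelines: keepalive < warntime < deadtime, and initdead at least twice deadtime. It also reports how many consecutive heartbeats can be missed before a peer is declared dead. It assumes plain integer second values, as in this config, and does not handle other time formats. `parse_ha_cf` and `check_timing` are hypothetical names.

```python
# The ha.cf quoted above, copied verbatim for the example.
HA_CF = """\
logfacility     local0
debug 1
debugfile /var/log/ha-debug
logfile /var/log/ha-log
node mysql1
node mysql2
keepalive 2
deadtime 60
initdead 120
warntime 15
udpport 694
ucast eth1 66.165.231.34
ucast eth1 67.218.128.19
auto_failback on
crm yes
"""


def parse_ha_cf(text):
    """Map each directive to a list of its values (directives like 'node' repeat)."""
    directives = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        parts = line.split(None, 1)
        value = parts[1].strip() if len(parts) > 1 else ""
        directives.setdefault(parts[0], []).append(value)
    return directives


def check_timing(d):
    """Return (heartbeats missed before 'dead', list of guideline violations)."""
    t = {k: int(d[k][0]) for k in ("keepalive", "warntime", "deadtime", "initdead")}
    problems = []
    if not t["keepalive"] < t["warntime"] < t["deadtime"]:
        problems.append("expected keepalive < warntime < deadtime")
    if t["initdead"] < 2 * t["deadtime"]:
        problems.append("initdead is usually recommended to be >= 2 x deadtime")
    missed = t["deadtime"] // t["keepalive"]
    return missed, problems


if __name__ == "__main__":
    print(check_timing(parse_ha_cf(HA_CF)))  # (30, [])
```

With these values nothing is flagged: deadtime 60 with keepalive 2 tolerates about 30 lost heartbeats. That suggests the timing parameters alone don't explain the emergency shutdown.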

Here's my CRM config:

node $id="23b44f0c-55fb-4b21-bf2e-81c15f28816d" mysql2
node $id="96c549d6-3e8c-4f7a-a644-cdc08dd99e41" mysql1
primitive drbd heartbeat:drbddisk \
params 1="mysql" \
op monitor interval="30s" timeout="30s"
primitive fs ocf:heartbeat:Filesystem \
params fstype="ext3" directory="/mnt/mysql" device="/dev/drbd1" \
op monitor interval="30s" timeout="40s"
primitive mysql ocf:heartbeat:mysql \
params binary="/usr/bin/mysqld_safe" datadir="/mnt/mysql" \
op monitor interval="30s" timeout="40s"
group mysql-group drbd fs mysql
location group-master mysql-group \
rule $id="group-master-rule" 100: #uname eq mysql1
property $id="cib-bootstrap-options" \
dc-version="1.0.8-9881a7350d6182bae9e8e557cf20a3cc5dac3ee7" \
cluster-infrastructure="Heartbeat" \
stonith-enabled="false" \
last-lrm-refresh="1268950841"
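Pacemaker starts the members of a group in the order they are listed and stops them in reverse. A clean stop on mysql1 would therefore stop mysql, then fs (unmounting /mnt/mysql), then drbd. This is a minimal sketch that only models that ordering for the group above. `GROUP` and `plan` are hypothetical names, not a Pacemaker API. It also illustrates why the emergency shutdown hurts: if heartbeat kills itself before this stop sequence runs, none of the steps happen, which matches DRBD staying mounted on the first server.

```python
# Members copied from "group mysql-group drbd fs mysql" above.
GROUP = ["drbd", "fs", "mysql"]


def plan(members, action):
    """List the per-resource steps for starting or stopping a group.

    Groups start in listed order and stop in reverse order.
    """
    if action == "start":
        steps = list(members)
    elif action == "stop":
        steps = list(reversed(members))
    else:
        raise ValueError("action must be 'start' or 'stop'")
    return [f"{action} {m}" for m in steps]


if __name__ == "__main__":
    print(plan(GROUP, "start"))  # ['start drbd', 'start fs', 'start mysql']
    print(plan(GROUP, "stop"))   # ['stop mysql', 'stop fs', 'stop drbd']
```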

What am I missing?
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
