On Tue, May 11, 2010 at 10:35 PM, Mike Sweetser
<[email protected]> wrote:
> Hello,
>
> I've set up a DRBD and Heartbeat configuration communicating over an
> Internet connection rather than an internal network.  The servers are
> running CentOS 5.4, with DRBD 8.3.2 and Heartbeat 3.0.3, out of the CentOS
> repository.
>
> I start seeing these in the ha-log.
>
> ERROR: Message hist queue is filling up (500 messages in queue)
>
> Then I see a bunch of these:
>
> WARN: Gmain_timeout_dispatch: Dispatch function for retransmit request took too long to execute: 20 ms (> 10 ms) (GSource: 0x1c3025c0)
>
> And finally:
>
> May 08 05:33:19 mysql1 heartbeat: [5536]: CRIT: Killing pid 5533 with SIGTERM
> May 08 05:33:19 mysql1 heartbeat: [5536]: CRIT: Killing pid 5537 with SIGTERM
> May 08 05:33:19 mysql1 heartbeat: [5536]: CRIT: Killing pid 5538 with SIGTERM
> May 08 05:33:19 mysql1 heartbeat: [5536]: CRIT: Killing pid 5539 with SIGTERM
> May 08 05:33:19 mysql1 heartbeat: [5536]: CRIT: Killing pid 5540 with SIGTERM
> May 08 05:33:19 mysql1 heartbeat: [5536]: CRIT: Emergency Shutdown(MCP dead): Killing ourselves.
>
> At this point, Heartbeat on mysql1 is dead, but because it died without
> releasing its resources, DRBD is still mounted on the first server, so the
> backup can't take over.
>
> DRBD has continued running, and the latency between servers is very low
> (9ms).
>
> Here's my ha.cf:
>
> logfacility     local0
> debug 1
> debugfile /var/log/ha-debug
> logfile /var/log/ha-log
> node mysql1
> node mysql2
> keepalive 2
> deadtime 60
> initdead 120
> warntime 15
> udpport 694
> ucast eth1 66.165.231.34
> ucast eth1 67.218.128.19
> auto_failback on
> crm yes
>
> Here's my CRM config:
>
> node $id="23b44f0c-55fb-4b21-bf2e-81c15f28816d" mysql2
> node $id="96c549d6-3e8c-4f7a-a644-cdc08dd99e41" mysql1
> primitive drbd heartbeat:drbddisk \
>     params 1="mysql" \
>     op monitor interval="30s" timeout="30s"
> primitive fs ocf:heartbeat:Filesystem \
>     params fstype="ext3" directory="/mnt/mysql" device="/dev/drbd1" \
>     op monitor interval="30s" timeout="40s"
> primitive mysql ocf:heartbeat:mysql \
>     params binary="/usr/bin/mysqld_safe" datadir="/mnt/mysql" \
>     op monitor interval="30s" timeout="40s"
> group mysql-group drbd fs mysql
> location group-master mysql-group \
>     rule $id="group-master-rule" 100: #uname eq mysql1
> property $id="cib-bootstrap-options" \
>     dc-version="1.0.8-9881a7350d6182bae9e8e557cf20a3cc5dac3ee7" \
>     cluster-infrastructure="Heartbeat" \
>     stonith-enabled="false" \
>     last-lrm-refresh="1268950841"
>
> What am I missing?

You are missing a reliable (and fast) Internet connection; the lack of one,
combined with very aggressive timeouts in ha.cf, is the problem.
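
As a rough aid for reviewing the timing directives in the ha.cf quoted above,
here is a minimal Python sketch. It is not part of Heartbeat; the function
names (to_seconds, parse_ha_cf_timings, check_timings) are hypothetical. It
assumes time values are plain seconds or carry an "ms" suffix, and it checks
two things: the initdead guideline from the ha.cf documentation (at least
twice deadtime), and an ordering check (keepalive < warntime < deadtime) that
is my own sanity assumption.

```python
# Timing directives copied from the ha.cf in the quoted message.
HA_CF = """\
keepalive 2
deadtime 60
initdead 120
warntime 15
"""

TIMING_KEYS = ("keepalive", "warntime", "deadtime", "initdead")


def to_seconds(value):
    """Convert an ha.cf time value such as '2' or '500ms' to seconds."""
    if value.endswith("ms"):
        return float(value[:-2]) / 1000.0
    return float(value)


def parse_ha_cf_timings(text):
    """Return a dict of the timing directives found in ha.cf text."""
    timings = {}
    for line in text.splitlines():
        # Drop comments and surrounding whitespace.
        parts = line.split("#", 1)[0].split()
        if len(parts) == 2 and parts[0] in TIMING_KEYS:
            timings[parts[0]] = to_seconds(parts[1])
    return timings


def check_timings(t):
    """Return a list of human-readable consistency problems (may be empty)."""
    problems = []
    # Assumed sanity ordering: warn before declaring a node dead.
    if not t["keepalive"] < t["warntime"] < t["deadtime"]:
        problems.append("expected keepalive < warntime < deadtime")
    # Guideline from the ha.cf documentation.
    if t["initdead"] < 2 * t["deadtime"]:
        problems.append("initdead should be at least 2 x deadtime")
    return problems


timings = parse_ha_cf_timings(HA_CF)
problems = check_timings(timings)
print(timings)
print(problems or "timing relationships look consistent")
```

This only checks that the directives are consistent with each other. Whether
they suit a particular Internet link depends on the packet loss and jitter
actually measured on that link, which the sketch does not look at.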
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
