Greetings,

I have been having a problem with the heartbeat 2.0.8 installation on one of
my clusters.  Approximately once a day, I was getting two late heartbeats from
my primary node, after which my backup node decided the primary was dead,
causing split brain.  I have since stopped heartbeat on the backup node, but
I am still getting the late heartbeat warnings on the primary node.  The log
from one such event is included below.  I suspect the "dispatch function"
warnings are related.  Strangely, I have other machines with nearly identical
configurations that do not have this issue.  Could the network cards or the
cable be at fault?

Both machines are running heartbeat 2.0.8 on FreeBSD 6.2-RELEASE.  They are
connected via crossover cable.

/var/log/messages during the late heartbeat occurrences:
Jul  3 09:46:48 sparky1 heartbeat: [1454]: WARN: Late heartbeat: Node sparky1.domainit.com: interval 24921 ms
Jul  3 09:46:48 sparky1 heartbeat: [1454]: WARN: Gmain_timeout_dispatch: Dispatch function for send local status took too long to execute: 19921 ms (> 2510 ms) (GSource: 0x5e2818)
Jul  3 09:46:48 sparky1 heartbeat: [1454]: WARN: Gmain_timeout_dispatch: Dispatch function for send local status was delayed 14921 ms (> 2510 ms) before being called (GSource: 0x5e2818)
Jul  3 09:46:48 sparky1 heartbeat: [1454]: WARN: Gmain_timeout_dispatch: Dispatch function for check for signals was delayed 18976 ms (> 2510 ms) before being called (GSource: 0x5e3018)
Jul  3 09:46:49 sparky1 heartbeat: [1454]: WARN: G_CH_dispatch_int: Dispatch function for read child took too long to execute: 54 ms (> 50 ms) (GSource: 0x5e2618)
Jul  3 09:46:49 sparky1 heartbeat: [1454]: WARN: Gmain_timeout_dispatch: Dispatch function for client audit was delayed 15125 ms (> 5000 ms) before being called (GSource: 0x5e2e18)
Jul  3 09:47:20 sparky1 heartbeat: [1454]: WARN: Late heartbeat: Node sparky1.domainit.com: interval 24851 ms
Jul  3 09:47:20 sparky1 heartbeat: [1454]: WARN: Gmain_timeout_dispatch: Dispatch function for send local status took too long to execute: 21843 ms (> 2510 ms) (GSource: 0x5e2818)
Jul  3 09:47:20 sparky1 heartbeat: [1454]: WARN: Gmain_timeout_dispatch: Dispatch function for send local status was delayed 16843 ms (> 2510 ms) before being called (GSource: 0x5e2818)
Jul  3 09:47:20 sparky1 heartbeat: [1454]: WARN: Gmain_timeout_dispatch: Dispatch function for check for signals was delayed 19859 ms (> 2510 ms) before being called (GSource: 0x5e3018)
Jul  3 09:47:20 sparky1 heartbeat: [1454]: WARN: Gmain_timeout_dispatch: Dispatch function for update msgfree count was delayed 21804 ms (> 20000 ms) before being called (GSource: 0x5e3218)
Jul  3 09:47:20 sparky1 heartbeat: [1454]: WARN: Gmain_timeout_dispatch: Dispatch function for client audit was delayed 13796 ms (> 5000 ms) before being called (GSource: 0x5e2e18)
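
In case it helps anyone reproduce the numbers, here is a rough Python sketch
for pulling the "Late heartbeat" intervals out of the log.  The function name
late_heartbeats and the LOG sample are just my own illustration (the sample is
copied from the two warnings above); the regular expression assumes the
message wording shown in this log and may need adjusting for other versions.

```python
import re

# Two "Late heartbeat" warnings copied from the log above.
# The lines may be wrapped by the mail client, so we re-join them first.
LOG = """\
Jul  3 09:46:48 sparky1 heartbeat: [1454]: WARN: Late heartbeat: Node
sparky1.domainit.com: interval 24921 ms
Jul  3 09:47:20 sparky1 heartbeat: [1454]: WARN: Late heartbeat: Node
sparky1.domainit.com: interval 24851 ms
"""

# Assumed message format, based only on the lines seen in this log.
LATE_RE = re.compile(r"Late heartbeat: Node (\S+): interval (\d+) ms")


def late_heartbeats(text):
    """Return (node, interval_ms) pairs for every late heartbeat warning."""
    # Collapse all whitespace so a warning split over two lines still matches.
    flat = " ".join(text.split())
    return [(node, int(ms)) for node, ms in LATE_RE.findall(flat)]


if __name__ == "__main__":
    for node, ms in late_heartbeats(LOG):
        print(node, ms)
```

Both warnings report an interval of just under 25 seconds for sparky1.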

my ha.cf:

bcast em0
logfacility local7
keepalive 5
warntime 10
deadtime 20
initdead 40
auto_failback off
node sparky1.domainit.com
node sparky2.domainit.com
respawn hacluster /usr/local/lib/heartbeat/ipfail
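
To put the timings side by side, here is a small Python sketch comparing the
logged intervals against the values in my ha.cf.  It assumes the bare numbers
are seconds (which I believe is the default when no unit suffix is given), and
the classify helper is a simplification of my own, not how heartbeat itself
decides anything.

```python
# Timing directives copied from the ha.cf above.
HA_CF = """\
keepalive 5
warntime 10
deadtime 20
initdead 40
"""


def parse_timings(text):
    """Map each directive to milliseconds, assuming bare values are seconds."""
    timings = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            timings[parts[0]] = int(parts[1]) * 1000
    return timings


def classify(interval_ms, timings):
    """Simplified check: compare one heartbeat interval to the thresholds."""
    if interval_ms > timings["deadtime"]:
        return "exceeds deadtime"
    if interval_ms > timings["warntime"]:
        return "exceeds warntime"
    return "ok"


if __name__ == "__main__":
    timings = parse_timings(HA_CF)
    # Intervals taken from the two "Late heartbeat" warnings in the log.
    for interval in (24921, 24851):
        print(interval, "ms:", classify(interval, timings))
```

Both logged intervals (about 24.9 seconds) are longer than my 20 second
deadtime, which would be consistent with the backup node declaring the primary
dead.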