Greetings,

I have been having a problem with the heartbeat 2.0.8 installation on one of my clusters. Approximately once a day, my primary node logged two late heartbeats, after which my backup node decided the primary was dead, causing split brain. I have since stopped heartbeat on the backup node, and I am still getting the late heartbeat notices on the primary node. The log from when this event occurs is included below. I have a feeling the "dispatch function" warnings have something to do with it. What is strange is that I have other machines with nearly identical configurations that are not having this issue. Could it be a problem with the network cards or the cable?
Both machines are running heartbeat 2.0.8 on FreeBSD 6.2-RELEASE. They are connected via crossover cable.

/var/log/messages during the late heartbeat occurrences:

Jul 3 09:46:48 sparky1 heartbeat: [1454]: WARN: Late heartbeat: Node sparky1.domainit.com: interval 24921 ms
Jul 3 09:46:48 sparky1 heartbeat: [1454]: WARN: Gmain_timeout_dispatch: Dispatch function for send local status took too long to execute: 19921 ms (> 2510 ms) (GSource: 0x5e2818)
Jul 3 09:46:48 sparky1 heartbeat: [1454]: WARN: Gmain_timeout_dispatch: Dispatch function for send local status was delayed 14921 ms (> 2510 ms) before being called (GSource: 0x5e2818)
Jul 3 09:46:48 sparky1 heartbeat: [1454]: WARN: Gmain_timeout_dispatch: Dispatch function for check for signals was delayed 18976 ms (> 2510 ms) before being called (GSource: 0x5e3018)
Jul 3 09:46:49 sparky1 heartbeat: [1454]: WARN: G_CH_dispatch_int: Dispatch function for read child took too long to execute: 54 ms (> 50 ms) (GSource: 0x5e2618)
Jul 3 09:46:49 sparky1 heartbeat: [1454]: WARN: Gmain_timeout_dispatch: Dispatch function for client audit was delayed 15125 ms (> 5000 ms) before being called (GSource: 0x5e2e18)
Jul 3 09:47:20 sparky1 heartbeat: [1454]: WARN: Late heartbeat: Node sparky1.domainit.com: interval 24851 ms
Jul 3 09:47:20 sparky1 heartbeat: [1454]: WARN: Gmain_timeout_dispatch: Dispatch function for send local status took too long to execute: 21843 ms (> 2510 ms) (GSource: 0x5e2818)
Jul 3 09:47:20 sparky1 heartbeat: [1454]: WARN: Gmain_timeout_dispatch: Dispatch function for send local status was delayed 16843 ms (> 2510 ms) before being called (GSource: 0x5e2818)
Jul 3 09:47:20 sparky1 heartbeat: [1454]: WARN: Gmain_timeout_dispatch: Dispatch function for check for signals was delayed 19859 ms (> 2510 ms) before being called (GSource: 0x5e3018)
Jul 3 09:47:20 sparky1 heartbeat: [1454]: WARN: Gmain_timeout_dispatch: Dispatch function for update msgfree count was delayed 21804 ms (> 20000 ms) before being called (GSource: 0x5e3218)
Jul 3 09:47:20 sparky1 heartbeat: [1454]: WARN: Gmain_timeout_dispatch: Dispatch function for client audit was delayed 13796 ms (> 5000 ms) before being called (GSource: 0x5e2e18)

my ha.cf:

bcast em0
logfacility local7
keepalive 5
warntime 10
deadtime 20
initdead 40
auto_failback off
node sparky1.domainit.com
node sparky2.domainit.com
respawn hacluster /usr/local/lib/heartbeat/ipfail
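To put the logged intervals next to the configured timeouts, here is a minimal Python sketch. It assumes the plain integer values in ha.cf are seconds, and that comparing the logged "Late heartbeat" interval against deadtime roughly reflects how the peer decides a node is dead; I have not checked that second point against the heartbeat source. The helper names (parse_timings, late_heartbeats, classify) are made up for illustration and are not part of heartbeat.

```python
import re

# Timing directives copied from the ha.cf above (assumed to be in seconds).
HA_CF = """\
keepalive 5
warntime 10
deadtime 20
initdead 40
"""

# The two "Late heartbeat" warnings copied from the log above.
LOG = """\
Jul 3 09:46:48 sparky1 heartbeat: [1454]: WARN: Late heartbeat: Node sparky1.domainit.com: interval 24921 ms
Jul 3 09:47:20 sparky1 heartbeat: [1454]: WARN: Late heartbeat: Node sparky1.domainit.com: interval 24851 ms
"""

LATE_RE = re.compile(r"Late heartbeat: Node (\S+): interval (\d+) ms")


def parse_timings(text):
    """Return {directive: milliseconds} for 'name <integer>' lines."""
    timings = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            timings[parts[0]] = int(parts[1]) * 1000
    return timings


def late_heartbeats(log_text):
    """Return a list of (node, interval_ms) for each late-heartbeat warning."""
    return [(m.group(1), int(m.group(2))) for m in LATE_RE.finditer(log_text)]


def classify(interval_ms, timings):
    """Compare one interval against the warntime and deadtime thresholds."""
    if interval_ms > timings["deadtime"]:
        return "exceeds deadtime"
    if interval_ms > timings["warntime"]:
        return "exceeds warntime"
    return "ok"


timings = parse_timings(HA_CF)
for node, interval_ms in late_heartbeats(LOG):
    print(node, interval_ms, "ms:", classify(interval_ms, timings))
```

Under those assumptions, both logged intervals (24921 ms and 24851 ms) are longer than the 20 second deadtime, which would be consistent with the backup node declaring the primary dead.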
