Welisson wrote:
> Hi all,
>
> I have the same heartbeat problem as described in the e-mail below.
> I tested the cable and increased the deadtime value, but neither helped.
> I would like to know whether this could be a kernel problem, because the
> main server runs kernel 2.6.18, the Debian etch default, and the secondary,
> on Connective 10, runs a self-compiled 2.6.12.2.
>
> What could the kernel-related cause be?
As Dejan said already:

> This indicates one of three possible problems: flaky
> communications, high load, or a kernel scheduler problem.

So yes, it _could_ be a kernel issue. Have you ruled out the other two possible causes? If not, you should probably start there: they are typically easier to identify and fix, and if either applies, it MUST be fixed anyway if you want a stable cluster. If comms are clean and load is not the problem, then revisit the kernel question.

>
> Regards
>
> Welisson
>
>
> On Mon 29 Oct 2007 10:53, Dejan Muhamedagic wrote:
>> Hi,
>>
>> On Sun, Oct 28, 2007 at 11:19:32PM -0300, [EMAIL PROTECTED] wrote:
>>> Hi all.
>>>
>>> I have 2 servers set up as firewalls, configured as follows.
>>>
>>> Server Master
>>> P4 3.0HT
>>> 2GB Ram
>>> 4 HD (2 used for the system and 2 for the squid cache, firewall, shaper and BGP-4)
>>> Motherboard Intel
>>>
>>> Server Slave
>>> P4 2.0
>>> 1GB Ram
>>> 2 HD
>>> Motherboard Intel, without squid but used for firewall, shaper and BGP-4
>>>
>>> Here is what happens: I have heartbeat installed on both servers, and
>>> for the past few days heartbeat has been going down and coming back,
>>> as shown in the log below, recorded on the main server:
>>>
>>> Oct 22 21:10:53 gateway heartbeat[19084]: WARN: Late heartbeat: Node gateway2.domain.com.br: interval 12530 ms
>>> Oct 22 22:20:37 gateway heartbeat[19084]: WARN: node gateway2.domain.com.br: is dead
>>> Oct 22 22:20:37 gateway heartbeat[19084]: WARN: No STONITH device configured.
>>> Oct 22 22:20:37 gateway heartbeat[19084]: WARN: Shared disks are not protected.
>>> Oct 22 22:20:37 gateway heartbeat[19084]: info: Resources being acquired from gateway2.domain.com.br.
>>> Oct 22 22:20:37 gateway heartbeat[19084]: info: Link gateway2.domain.com.br:/dev/ttyS0 dead.
>>> Oct 22 22:20:38 gateway heartbeat: info: Running /etc/ha.d/rc.d/status status
>>> Oct 22 22:20:38 gateway heartbeat: info: /usr/lib/heartbeat/mach_down: nice_failback: foreign resources acquired
>>> Oct 22 22:20:42 gateway heartbeat[19084]: WARN: Cluster node gateway2.domain.com.br returning after partition.
>>> Oct 22 22:20:42 gateway heartbeat[19084]: WARN: Deadtime value may be too small.
>>> Oct 22 22:20:42 gateway heartbeat[19084]: info: See documentation for information on tuning deadtime.
>>> Oct 22 22:20:42 gateway heartbeat[19084]: info: Link gateway2.domain.com.br:/dev/ttyS0 up.
>>> Oct 22 22:20:42 gateway heartbeat[19084]: WARN: Late heartbeat: Node gateway2.domain.com.br: interval 35790 ms
>> This indicates one of three possible problems: flaky
>> communications, high load, or a kernel scheduler problem.
>>
>> Thanks,
>>
>> Dejan
>>
>>> Oct 22 22:20:42 gateway heartbeat[19084]: info: Status update for node gateway2.domain.com.br: status active
>>> Oct 22 22:20:42 gateway heartbeat[19084]: info: mach_down takeover complete.
>>> Oct 22 22:20:42 gateway heartbeat: info: mach_down takeover complete for node gateway2.domain.com.br.
>>> Oct 22 22:20:42 gateway heartbeat[14883]: info: Local Resource acquisition completed.
>>> Oct 22 22:20:42 gateway heartbeat: info: Running /etc/ha.d/rc.d/status status
>>> Oct 22 22:20:44 gateway heartbeat[19084]: info: Heartbeat shutdown in progress. (19084)
>>> Oct 22 22:20:44 gateway heartbeat[16667]: info: Giving up all HA resources.
>>> Oct 22 22:20:44 gateway heartbeat: info: Releasing resource group: gateway.domain.com.br 200.xxx.xxx.xxx/30/eth0 200.xxx.xxx.x6/30/eth1 200.xxx.xxx.x7/29/eth2 firewall shaper
>>> Oct 22 22:20:44 gateway heartbeat: info: Running /etc/init.d/shaper stop
>>> Oct 22 22:20:46 gateway heartbeat: info: Running /etc/init.d/firewall stop
>>> Oct 22 22:20:46 gateway heartbeat: info: Running /etc/ha.d/resource.d/IPaddr 200.xxx.xxx.x7/29/eth2 stop
>>> Oct 22 22:20:47 gateway heartbeat: info: Running /etc/ha.d/resource.d/IPaddr 200.xxx.xxx.x6/30/eth1 stop
>>> Oct 22 22:20:47 gateway heartbeat: info: /sbin/route -n del -host 200.xxx.xxx.x6
>>> Oct 22 22:20:47 gateway heartbeat: info: /sbin/ifconfig eth1:0 down
>>> Oct 22 22:20:47 gateway heartbeat: info: IP Address 200.xxx.xxx.x6 released
>>> Oct 22 22:20:47 gateway heartbeat: info: Running /etc/ha.d/resource.d/IPaddr 200.xxx.xxx.xxx/30/eth0 stop
>>> Oct 22 22:20:47 gateway heartbeat[16667]: info: All HA resources relinquished.
>>> Oct 22 22:20:47 gateway heartbeat[19084]: WARN: 1 lost packet(s) for [gateway2.domain.com.br] [239455:239457]
>>> Oct 22 22:20:47 gateway heartbeat[19084]: info: No pkts missing from gateway2.domain.com.br!
>>> Oct 22 22:20:48 gateway heartbeat[19084]: info: killing HBFIFO process 19086 with signal 15
>>> Oct 22 22:20:48 gateway heartbeat[19084]: info: killing HBWRITE process 19087 with signal 15
>>> Oct 22 22:20:48 gateway heartbeat[19084]: info: killing HBREAD process 19088 with signal 15
>>> Oct 22 22:20:48 gateway heartbeat[19084]: info: Core process 19088 exited. 3 remaining
>>> Oct 22 22:20:48 gateway heartbeat[19084]: info: Core process 19086 exited. 2 remaining
>>> Oct 22 22:20:48 gateway heartbeat[19084]: info: Core process 19087 exited. 1 remaining
>>> Oct 22 22:20:48 gateway heartbeat[19084]: info: Heartbeat shutdown complete.
>>> Oct 22 22:20:48 gateway heartbeat[19084]: info: Heartbeat restart triggered.
>>> Oct 22 22:20:48 gateway heartbeat[19084]: info: Restarting heartbeat.
>>> Oct 22 22:20:48 gateway heartbeat[19084]: info: Performing heartbeat restart exec.
>>> Oct 22 22:21:19 gateway heartbeat[19084]: info: **************************
>>> Oct 22 22:21:19 gateway heartbeat[19084]: info: Configuration validated. Starting heartbeat 1.2.5
>>> Oct 22 22:21:19 gateway heartbeat[19947]: info: heartbeat: version 1.2.5
>>> Oct 22 22:21:19 gateway heartbeat[19947]: info: Heartbeat generation: 23
>>> Oct 22 22:21:20 gateway heartbeat[19947]: info: Starting serial heartbeat on tty /dev/ttyS0 (19200 baud)
>>> Oct 22 22:21:20 gateway heartbeat[19947]: info: pid 19947 locked in memory.
>>> Oct 22 22:21:20 gateway heartbeat[19947]: info: Local status now set to: 'up'
>>> Oct 22 22:21:21 gateway heartbeat[19949]: info: pid 19949 locked in memory.
>>> Oct 22 22:21:21 gateway heartbeat[19950]: info: pid 19950 locked in memory.
>>> Oct 22 22:21:21 gateway heartbeat[19951]: info: pid 19951 locked in memory.
>>> Oct 22 22:21:21 gateway heartbeat[19947]: WARN: string2msg_ll: node [gateway2.domain.com.br] failed authentication
>>> Oct 22 22:21:22 gateway heartbeat[19947]: info: Link gateway2.domain.com.br:/dev/ttyS0 up.
>>> Oct 22 22:21:22 gateway heartbeat[19947]: info: Status update for node gateway2.domain.com.br: status active
>>> Oct 22 22:21:22 gateway heartbeat[19947]: info: Local status now set to: 'active'
>>> Oct 22 22:21:22 gateway heartbeat: info: Running /etc/ha.d/rc.d/status status
>>> Oct 22 22:21:22 gateway heartbeat[19947]: info: remote resource transition completed.
>>> Oct 22 22:21:22 gateway heartbeat[19947]: info: remote resource transition completed.
>>> Oct 22 22:21:22 gateway heartbeat[19947]: info: Local Resource acquisition completed. (none)
>>> Oct 22 22:21:23 gateway heartbeat[19947]: info: gateway2.domain.com.br wants to go standby [foreign]
>>> Oct 22 22:21:35 gateway heartbeat[19947]: info: standby: acquire [foreign] resources from gateway2.domain.com.br
>>> Oct 22 22:21:35 gateway heartbeat[19956]: info: acquire local HA resources (standby).
>>> Oct 22 22:21:35 gateway heartbeat: info: Acquiring resource group: gateway.domain.com.br 200.xxx.xxx.xxx/30/eth0 200.xxx.xxx.x6/30/eth1 200.xxx.xxx.x7/29/eth2 firewall shaper
>>> Oct 22 22:21:35 gateway heartbeat: info: Running /etc/ha.d/resource.d/IPaddr 200.xxx.xxx.xxx/30/eth0 start
>>> Oct 22 22:21:35 gateway heartbeat: info: /sbin/ifconfig eth0:0 200.xxx.xxx.xxx netmask 255.255.255.252 broadcast 200.208.220.131
>>> Oct 22 22:21:35 gateway heartbeat: info: Sending Gratuitous Arp for 200.xxx.xxx.xxx on eth0:0 [eth0]
>>> Oct 22 22:21:35 gateway heartbeat: /usr/lib/heartbeat/send_arp -i 1010 -r 5 -p /var/lib/heartbeat/rsctmp/send_arp/send_arp-200.xxx.xxx.xxx eth0 200.xxx.xxx.xxx auto 200.xxx.xxx.xxx ffffffffffff
>>> Oct 22 22:21:35 gateway heartbeat: info: Running /etc/ha.d/resource.d/IPaddr 200.xxx.xxx.x6/30/eth1 start
>>> Oct 22 22:21:35 gateway heartbeat: info: /sbin/ifconfig eth1:0 200.xxx.xxx.x6 netmask 255.255.255.252 broadcast 200.208.223.67
>>> Oct 22 22:21:35 gateway heartbeat: info: Sending Gratuitous Arp for 200.xxx.xxx.x6 on eth1:0 [eth1]
>>> Oct 22 22:21:35 gateway heartbeat: /usr/lib/heartbeat/send_arp -i 1010 -r 5 -p /var/lib/heartbeat/rsctmp/send_arp/send_arp-200.xxx.xxx.x6 eth1 200.xxx.xxx.x6 auto 200.xxx.xxx.x6 ffffffffffff
>>> Oct 22 22:21:36 gateway heartbeat: info: Running /etc/ha.d/resource.d/IPaddr 200.xxx.xxx.x7/29/eth2 start
>>> Oct 22 22:21:36 gateway heartbeat: info: /sbin/ifconfig eth2:0 200.xxx.xxx.x7 netmask 255.255.255.248 broadcast 200.208.220.151
>>> Oct 22 22:21:36 gateway heartbeat: info: Sending Gratuitous Arp for 200.xxx.xxx.x7 on eth2:0 [eth2]
>>> Oct 22 22:21:36 gateway heartbeat: /usr/lib/heartbeat/send_arp -i 1010 -r 5 -p /var/lib/heartbeat/rsctmp/send_arp/send_arp-200.xxx.xxx.x7 eth2 200.xxx.xxx.x7 auto 200.xxx.xxx.x7 ffffffffffff
>>> Oct 22 22:21:36 gateway heartbeat: info: Running /etc/init.d/firewall start
>>> Oct 22 22:21:36 gateway heartbeat: info: Running /etc/init.d/shaper start
>>> Oct 22 22:21:41 gateway heartbeat[19956]: info: local HA resource acquisition completed (standby).
>>> Oct 22 22:21:41 gateway heartbeat[19947]: info: Standby resource acquisition done [foreign].
>>> Oct 22 22:21:41 gateway heartbeat[19947]: info: Initial resource acquisition complete (auto_failback)
>>> Oct 22 22:21:41 gateway heartbeat[19947]: info: remote resource transition completed.
>>>
>>> ----------------------------------------------------------------
>>> Conectcor - velocidade com qualidade
>>> www.conectcor.com.br
>>>
>>> _______________________________________________
>>> Linux-HA mailing list
>>> [email protected]
>>> http://lists.linux-ha.org/mailman/listinfo/linux-ha
>>> See also: http://linux-ha.org/ReportingProblems
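
To help rule out load before looking at the kernel, it can be useful to pull every "Late heartbeat" warning out of the log and line the timestamps up against whatever else was running on the box at that time. The sketch below is only an illustration, not part of heartbeat: the function names are made up here, and the 30000 ms deadtime is an assumed value, since the thread does not say what deadtime is actually configured. The two sample lines are copied from the quoted log.

```python
import re

# Pattern for heartbeat's "Late heartbeat" warning as it appears in the quoted
# log. It is not anchored at line start, so quoted lines (">>> ...") also match.
LATE_RE = re.compile(
    r"(?P<stamp>\w{3}\s+\d+\s+[\d:]+)\s+\S+\s+heartbeat\[\d+\]:\s+"
    r"WARN: Late heartbeat: Node (?P<node>\S+): interval (?P<ms>\d+) ms"
)

# Two lines taken from the log quoted in this thread.
SAMPLE = [
    "Oct 22 21:10:53 gateway heartbeat[19084]: WARN: Late heartbeat: "
    "Node gateway2.domain.com.br: interval 12530 ms",
    "Oct 22 22:20:42 gateway heartbeat[19084]: WARN: Late heartbeat: "
    "Node gateway2.domain.com.br: interval 35790 ms",
]


def late_heartbeats(lines):
    """Yield (timestamp, node, interval_ms) for each late-heartbeat warning."""
    for line in lines:
        match = LATE_RE.search(line)
        if match:
            yield match.group("stamp"), match.group("node"), int(match.group("ms"))


def summarize(lines, deadtime_ms):
    """Count late heartbeats and how many reached the given deadtime."""
    events = list(late_heartbeats(lines))
    return {
        "late": len(events),
        "over_deadtime": sum(1 for e in events if e[2] >= deadtime_ms),
        "worst_ms": max((e[2] for e in events), default=0),
    }


if __name__ == "__main__":
    # ASSUMPTION: 30000 ms is a placeholder; use the deadtime from your ha.cf.
    assumed_deadtime_ms = 30000
    for stamp, node, ms in late_heartbeats(SAMPLE):
        print(f"{stamp}  {node}  {ms} ms")
    print(summarize(SAMPLE, assumed_deadtime_ms))
```

If the late intervals cluster around particular times of day, that points toward load (for example a scheduled job) rather than the link or the kernel; if they are spread evenly, the serial link and then the kernel are the next things to check.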
