Welisson wrote:
> Hi all,
> 
> I have the same heartbeat problem described in the e-mail below.
> I ran some tests and increased the deadtime value, but nothing solved it.
> I would like to know whether this could be a kernel-related problem,
> because the main server runs kernel 2.6.18, the Debian etch default, and
> the secondary, on Connective 10, runs a custom-compiled 2.6.12.2.
> 
> What could the cause be on the kernel side?

As Dejan said already:

> This indicates one of three possible problems: flaky
> communications, high load, or a kernel scheduler problem.

So - Yes, it _could_ be a kernel issue. Have you ruled out the other two
possible causes? If not, you should probably start there (as they are
typically easier to identify / fix, and, if relevant, they MUST be fixed
if you want a stable cluster anyway). If comms are clean and load is not
the problem, then revisit the kernel issue.
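
If it helps with ruling out the first two causes, here is a rough Python
sketch that pulls the "Late heartbeat" warnings out of the log so their
timestamps can be compared against load or link problems at the same
moments. It assumes raw syslog lines in the format quoted below (one entry
per line, without the mail client's wrapping). The function names and the
30000 ms threshold are made up for illustration; the real deadtime is
whatever ha.cf sets, which is not shown in this thread.

```python
import re

# Matches "Late heartbeat" warnings in the format shown in the quoted log.
LATE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+\S+\s+heartbeat\[\d+\]:\s+"
    r"WARN: Late heartbeat: Node\s+(?P<node>\S+?):\s+interval\s+(?P<ms>\d+)\s+ms"
)


def late_heartbeats(lines):
    """Return (timestamp, node, interval_ms) for each late-heartbeat warning."""
    found = []
    for line in lines:
        match = LATE_RE.search(line)
        if match:
            found.append(
                (match.group("stamp"), match.group("node"), int(match.group("ms")))
            )
    return found


def exceeding(events, deadtime_ms):
    """Keep only the events whose interval reached the given deadtime."""
    return [event for event in events if event[2] >= deadtime_ms]


# The two warnings from the quoted log, with the line wrapping undone.
SAMPLE = [
    "Oct 22 21:10:53 gateway heartbeat[19084]: WARN: Late heartbeat: "
    "Node gateway2.domain.com.br: interval 12530 ms",
    "Oct 22 22:20:42 gateway heartbeat[19084]: WARN: Late heartbeat: "
    "Node gateway2.domain.com.br: interval 35790 ms",
]

ASSUMED_DEADTIME_MS = 30000  # hypothetical value, not taken from the thread

for stamp, node, interval_ms in exceeding(late_heartbeats(SAMPLE), ASSUMED_DEADTIME_MS):
    print(f"{stamp}: {node} was silent for {interval_ms} ms")
```

Feeding it the real log file instead of SAMPLE gives a list of times to
check against load figures and the state of the serial link.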


> 
> 
> Regards
> 
> Welisson
> 
> 
> On Mon 29 Oct 2007 10:53, Dejan Muhamedagic wrote:
>> Hi,
>>
>> On Sun, Oct 28, 2007 at 11:19:32PM -0300, [EMAIL PROTECTED] wrote:
>>> Hi all.
>>>
>>>
>>> I have 2 servers set up as firewalls, with the following
>>> configuration.
>>>
>>> Server Master
>>> P4 3.0HT
>>> 2GB Ram
>>> 4 HD (2 for the system and 2 for the squid cache; runs firewall, shaper and BGP-4)
>>> Motherboard Intel
>>>
>>>
>>> Server Slave
>>> P4 2.0
>>> 1GB Ram
>>> 2 HD
>>> Motherboard Intel
>>> No squid; used for firewall, shaper and BGP-4
>>>
>>> What happens is this: I have heartbeat installed on both servers,
>>> and for the past few days heartbeat has been going down and coming
>>> back up, as shown in the log below, recorded on the main server:
>>>
>>>
>>> Oct 22 21:10:53 gateway heartbeat[19084]: WARN: Late heartbeat: Node
>>> gateway2.domain.com.br: interval 12530 ms
>>> Oct 22 22:20:37 gateway heartbeat[19084]: WARN: node
>>> gateway2.domain.com.br: is dead
>>> Oct 22 22:20:37 gateway heartbeat[19084]: WARN: No STONITH device
>>> configured.
>>> Oct 22 22:20:37 gateway heartbeat[19084]: WARN: Shared disks are not
>>> protected.
>>> Oct 22 22:20:37 gateway heartbeat[19084]: info: Resources being
>>> acquired from gateway2.domain.com.br.
>>> Oct 22 22:20:37 gateway heartbeat[19084]: info: Link
>>> gateway2.domain.com.br:/dev/ttyS0 dead.
>>> Oct 22 22:20:38 gateway heartbeat: info: Running /etc/ha.d/rc.d/status
>>> status
>>> Oct 22 22:20:38 gateway heartbeat: info: /usr/lib/heartbeat/mach_down:
>>> nice_failback: foreign resources acquired
>>> Oct 22 22:20:42 gateway heartbeat[19084]: WARN: Cluster node
>>> gateway2.domain.com.br returning after partition.
>>> Oct 22 22:20:42 gateway heartbeat[19084]: WARN: Deadtime value may be
>>> too small.
>>> Oct 22 22:20:42 gateway heartbeat[19084]: info: See documentation for
>>> information on tuning deadtime.
>>> Oct 22 22:20:42 gateway heartbeat[19084]: info: Link
>>> gateway2.domain.com.br:/dev/ttyS0 up.
>>> Oct 22 22:20:42 gateway heartbeat[19084]: WARN: Late heartbeat: Node
>>> gateway2.domain.com.br: interval 35790 ms
>> This indicates one of three possible problems: flaky
>> communications, high load, or a kernel scheduler problem.
>>
>> Thanks,
>>
>> Dejan
>>
>>> Oct 22 22:20:42 gateway heartbeat[19084]: info: Status update for node
>>> gateway2.domain.com.br: status active
>>> Oct 22 22:20:42 gateway heartbeat[19084]: info: mach_down takeover
>>> complete.
>>> Oct 22 22:20:42 gateway heartbeat: info: mach_down takeover
>>> complete for node gateway2.domain.com.br.
>>> Oct 22 22:20:42 gateway heartbeat[14883]: info: Local Resource
>>> acquisition completed.
>>> Oct 22 22:20:42 gateway heartbeat: info: Running /etc/ha.d/rc.d/status
>>> status
>>> Oct 22 22:20:44 gateway heartbeat[19084]: info: Heartbeat shutdown in
>>> progress. (19084)
>>> Oct 22 22:20:44 gateway heartbeat[16667]: info: Giving up all HA
>>> resources.
>>> Oct 22 22:20:44 gateway heartbeat: info: Releasing resource
>>> group: gateway.domain.com.br 200.xxx.xxx.xxx/30/eth0
>>> 200.xxx.xxx.x6/30/eth1 200.xxx.xxx.x7/29/eth2 firewall shaper
>>> Oct 22 22:20:44 gateway heartbeat: info: Running /etc/init.d/shaper stop
>>> Oct 22 22:20:46 gateway heartbeat: info: Running /etc/init.d/firewall
>>> stop
>>> Oct 22 22:20:46 gateway heartbeat: info: Running
>>> /etc/ha.d/resource.d/IPaddr 200.xxx.xxx.x7/29/eth2 stop
>>> Oct 22 22:20:47 gateway heartbeat: info: Running
>>> /etc/ha.d/resource.d/IPaddr 200.xxx.xxx.x6/30/eth1 stop
>>> Oct 22 22:20:47 gateway heartbeat: info: /sbin/route -n del -host
>>> 200.xxx.xxx.x6
>>> Oct 22 22:20:47 gateway heartbeat: info: /sbin/ifconfig eth1:0 down
>>> Oct 22 22:20:47 gateway heartbeat: info: IP Address 200.xxx.xxx.x6
>>> released
>>> Oct 22 22:20:47 gateway heartbeat: info: Running
>>> /etc/ha.d/resource.d/IPaddr 200.xxx.xxx.xxx/30/eth0 stop
>>> Oct 22 22:20:47 gateway heartbeat[16667]: info: All HA resources
>>> relinquished.
>>> Oct 22 22:20:47 gateway heartbeat[19084]: WARN: 1 lost packet(s) for
>>> [gateway2.domain.com.br] [239455:239457]
>>> Oct 22 22:20:47 gateway heartbeat[19084]: info: No pkts missing from
>>> gateway2.domain.com.br!
>>> Oct 22 22:20:48 gateway heartbeat[19084]: info: killing HBFIFO process
>>> 19086 with signal 15
>>> Oct 22 22:20:48 gateway heartbeat[19084]: info: killing HBWRITE process
>>> 19087 with signal 15
>>> Oct 22 22:20:48 gateway heartbeat[19084]: info: killing HBREAD process
>>> 19088 with signal 15
>>> Oct 22 22:20:48 gateway heartbeat[19084]: info: Core process 19088
>>> exited. 3 remaining
>>> Oct 22 22:20:48 gateway heartbeat[19084]: info: Core process 19086
>>> exited. 2 remaining
>>> Oct 22 22:20:48 gateway heartbeat[19084]: info: Core process 19087
>>> exited. 1 remaining
>>> Oct 22 22:20:48 gateway heartbeat[19084]: info: Heartbeat shutdown
>>> complete.
>>> Oct 22 22:20:48 gateway heartbeat[19084]: info: Heartbeat
>>> restart triggered.
>>> Oct 22 22:20:48 gateway heartbeat[19084]: info:
>>> Restarting heartbeat.
>>> Oct 22 22:20:48 gateway heartbeat[19084]: info:
>>> Performing heartbeat restart exec.
>>> Oct 22 22:21:19 gateway heartbeat[19084]: info:
>>> **************************
>>> Oct 22 22:21:19 gateway heartbeat[19084]: info: Configuration
>>> validated. Starting heartbeat 1.2.5
>>> Oct 22 22:21:19 gateway heartbeat[19947]: info: heartbeat: version 1.2.5
>>> Oct 22 22:21:19 gateway heartbeat[19947]: info: Heartbeat generation: 23
>>> Oct 22 22:21:20 gateway heartbeat[19947]: info: Starting serial
>>> heartbeat on tty /dev/ttyS0 (19200 baud)
>>> Oct 22 22:21:20 gateway heartbeat[19947]: info: pid 19947 locked in
>>> memory.
>>> Oct 22 22:21:20 gateway heartbeat[19947]: info: Local status now
>>> set to: 'up'
>>> Oct 22 22:21:21 gateway heartbeat[19949]: info: pid 19949 locked in
>>> memory.
>>> Oct 22 22:21:21 gateway heartbeat[19950]: info: pid 19950 locked
>>> in memory.
>>> Oct 22 22:21:21 gateway heartbeat[19951]: info: pid 19951
>>> locked in memory.
>>> Oct 22 22:21:21 gateway heartbeat[19947]: WARN:
>>> string2msg_ll: node [gateway2.domain.com.br] failed authentication
>>> Oct 22 22:21:22 gateway heartbeat[19947]: info: Link
>>> gateway2.domain.com.br:/dev/ttyS0 up.
>>> Oct 22 22:21:22 gateway heartbeat[19947]: info: Status update for node
>>> gateway2.domain.com.br: status active
>>> Oct 22 22:21:22 gateway heartbeat[19947]: info: Local status now set
>>> to: 'active'
>>> Oct 22 22:21:22 gateway heartbeat: info: Running /etc/ha.d/rc.d/status
>>> status
>>> Oct 22 22:21:22 gateway heartbeat[19947]: info: remote resource
>>> transition completed.
>>> Oct 22 22:21:22 gateway heartbeat[19947]: info: remote resource
>>> transition completed.
>>> Oct 22 22:21:22 gateway heartbeat[19947]: info: Local Resource
>>> acquisition completed. (none)
>>> Oct 22 22:21:23 gateway heartbeat[19947]: info: gateway2.domain.com.br
>>> wants to go standby [foreign]
>>> Oct 22 22:21:35 gateway heartbeat[19947]: info: standby: acquire
>>> [foreign] resources from gateway2.domain.com.br
>>> Oct 22 22:21:35 gateway heartbeat[19956]: info: acquire local HA
>>> resources (standby).
>>> Oct 22 22:21:35 gateway heartbeat: info: Acquiring resource group:
>>> gateway.domain.com.br 200.xxx.xxx.xxx/30/eth0 200.xxx.xxx.x6/30/eth1
>>> 200.xxx.xxx.x7/29/eth2 firewall shaper
>>> Oct 22 22:21:35 gateway heartbeat: info: Running
>>> /etc/ha.d/resource.d/IPaddr 200.xxx.xxx.xxx/30/eth0 start
>>> Oct 22 22:21:35 gateway heartbeat: info: /sbin/ifconfig eth0:0
>>> 200.xxx.xxx.xxx netmask 255.255.255.252 broadcast 200.208.220.131
>>> Oct 22 22:21:35 gateway heartbeat: info: Sending Gratuitous Arp for
>>> 200.xxx.xxx.xxx on eth0:0 [eth0]
>>> Oct 22 22:21:35 gateway heartbeat: /usr/lib/heartbeat/send_arp -i 1010
>>> -r 5 -p /var/lib/heartbeat/rsctmp/send_arp/send_arp-200.xxx.xxx.xxx
>>> eth0 200.xxx.xxx.xxx auto 200.xxx.xxx.xxx ffffffffffff
>>> Oct 22 22:21:35 gateway heartbeat: info: Running
>>> /etc/ha.d/resource.d/IPaddr 200.xxx.xxx.x6/30/eth1 start
>>> Oct 22 22:21:35 gateway heartbeat: info: /sbin/ifconfig eth1:0
>>> 200.xxx.xxx.x6 netmask 255.255.255.252 broadcast 200.208.223.67
>>> Oct 22 22:21:35 gateway heartbeat: info: Sending Gratuitous Arp for
>>> 200.xxx.xxx.x6 on eth1:0 [eth1]
>>> Oct 22 22:21:35 gateway heartbeat: /usr/lib/heartbeat/send_arp -i 1010
>>> -r 5 -p /var/lib/heartbeat/rsctmp/send_arp/send_arp-200.xxx.xxx.x6 eth1
>>> 200.xxx.xxx.x6 auto 200.xxx.xxx.x6 ffffffffffff
>>> Oct 22 22:21:36 gateway heartbeat: info: Running
>>> /etc/ha.d/resource.d/IPaddr 200.xxx.xxx.x7/29/eth2 start
>>> Oct 22 22:21:36 gateway heartbeat: info: /sbin/ifconfig eth2:0
>>> 200.xxx.xxx.x7 netmask 255.255.255.248 broadcast 200.208.220.151
>>> Oct 22 22:21:36 gateway heartbeat: info: Sending Gratuitous Arp for
>>> 200.xxx.xxx.x7 on eth2:0 [eth2]
>>> Oct 22 22:21:36 gateway heartbeat: /usr/lib/heartbeat/send_arp -i 1010
>>> -r 5 -p /var/lib/heartbeat/rsctmp/send_arp/send_arp-200.xxx.xxx.x7 eth2
>>> 200.xxx.xxx.x7 auto 200.xxx.xxx.x7 ffffffffffff
>>> Oct 22 22:21:36 gateway heartbeat: info: Running /etc/init.d/firewall
>>> start
>>> Oct 22 22:21:36 gateway heartbeat: info: Running /etc/init.d/shaper
>>> start
>>> Oct 22 22:21:41 gateway heartbeat[19956]: info: local HA resource
>>> acquisition completed (standby).
>>> Oct 22 22:21:41 gateway heartbeat[19947]: info: Standby resource
>>> acquisition done [foreign].
>>> Oct 22 22:21:41 gateway heartbeat[19947]: info: Initial resource
>>> acquisition complete (auto_failback)
>>> Oct 22 22:21:41 gateway heartbeat[19947]: info: remote resource
>>> transition completed.
>>>
>>> ----------------------------------------------------------------
>>> Conectcor - velocidade com qualidade
>>> www.conectcor.com.br
>>>
>>>
>>>
>>> _______________________________________________
>>> Linux-HA mailing list
>>> [email protected]
>>> http://lists.linux-ha.org/mailman/listinfo/linux-ha
>>> See also: http://linux-ha.org/ReportingProblems