On Tue, Oct 23, 2007 at 05:50:14PM -0200, [EMAIL PROTECTED] wrote:
| Good afternoon, everyone
| 
| This is my first post here on the list.
| 
| Here is the situation: I have two gateway servers with Heartbeat
| installed. On certain days it goes down and comes back up, causing a
| temporary internet outage until everything is working again.

Judging by the log, the serial line seems to be the only communication
channel between the two machines. That is usually not a good idea. Ideally
the machines should be able to talk over more than one interface. The
other node should be declared failed only when it cannot be reached
through any of them.
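
The rule above can be sketched in a few lines of Python. This is a
simplified model, not Heartbeat's actual code. It assumes we record, for
each link, the time the last heartbeat from the peer arrived on it. The
function names and the second link `eth1` are hypothetical. In ha.cf, an
extra link would normally be another media line (such as `bcast` or
`ucast`) next to the existing `serial` one.

```python
from typing import List, Mapping


# Hypothetical model: link name -> time (seconds) of the last heartbeat
# received from the peer on that link.
def dead_links(last_seen: Mapping[str, float], now: float,
               deadtime: float) -> List[str]:
    """Links on which the peer has been silent for longer than deadtime."""
    return sorted(link for link, t in last_seen.items() if now - t > deadtime)


def peer_is_dead(last_seen: Mapping[str, float], now: float,
                 deadtime: float) -> bool:
    """Declare the peer dead only if it is silent on *every* link.

    One healthy link is enough to keep it alive. With no links at all,
    all() of an empty sequence is True, so the peer counts as dead.
    """
    return all(now - t > deadtime for t in last_seen.values())


# Serial silent for 31 s, but eth1 heard 3 s ago: the peer is still alive.
links = {"/dev/ttyS0": 100.0, "eth1": 128.0}
print(peer_is_dead(links, now=131.0, deadtime=30.0))  # False
print(dead_links(links, now=131.0, deadtime=30.0))    # ['/dev/ttyS0']

# With only the serial link, the same 31 s gap triggers a failover.
print(peer_is_dead({"/dev/ttyS0": 100.0}, now=131.0, deadtime=30.0))  # True
```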

Other questions raised by the log:

* WARN: Deadtime value may be too small. - Right after it comes the advice
  to check the documentation on tuning deadtime. The value is probably set
  too low, or else your heartbeat interval (keepalive) is too large
  compared to deadtime.

* heartbeat: version 1.2.5 - IIRC this version is old, even for the 1.x
  series. It is worth checking for updates or fixes. I remember seeing
  something on the linux-ha-dev list a while ago about problems with the
  serial link.

* Are you using the serial line raw? (I saw no references to ppp.) It is
  worth checking that the shaper and the firewall are not interfering with
  anything related to the serial channel.

* Late heartbeat: - several heartbeats are taking between 12.5 and 35.8 s
  to arrive. Check your serial cable and the configuration of both serial
  ports (are they set to the same speed?).

* In the excerpt below, the resource migration was forced manually:

        | Oct 22 22:21:23 gateway heartbeat[19947]: info: gateway2.domain.com.br
        | wants to go standby [foreign]
        | Oct 22 22:21:35 gateway heartbeat[19947]: info: standby: acquire
        | [foreign] resources from gateway2.domain.com.br
        | Oct 22 22:21:35 gateway heartbeat[19956]: info: acquire local HA
        | resources (standby).
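
Two of the log-related points above can be checked with a small Python
script: late heartbeats versus deadtime, and whether both serial ports run
at the same speed. This is only a sketch. It assumes one message per line,
as in the original log file rather than the wrapped copy in this mail. The
deadtime values used below are hypothetical, since the actual ha.cf was
not posted.

```python
import re

# Heartbeat 1.2.x messages as they appear in the log quoted in this thread.
LATE_RE = re.compile(r"Late heartbeat: Node ([^:\s]+): interval (\d+) ms")
BAUD_RE = re.compile(r"Starting serial heartbeat on tty (\S+) \((\d+) baud\)")


def late_intervals(lines):
    """Map each node to the list of 'Late heartbeat' intervals (ms)."""
    found = {}
    for line in lines:
        m = LATE_RE.search(line)
        if m:
            found.setdefault(m.group(1), []).append(int(m.group(2)))
    return found


def over_deadtime(lines, deadtime_s):
    """(node, worst_ms) pairs whose worst observed gap reached deadtime."""
    worst = {node: max(ms) for node, ms in late_intervals(lines).items()}
    return sorted((n, ms) for n, ms in worst.items() if ms >= deadtime_s * 1000)


def serial_speeds(lines):
    """Map each tty to the baud rate announced at Heartbeat startup."""
    return {m.group(1): int(m.group(2))
            for m in map(BAUD_RE.search, lines) if m}


log = [
    "Oct 22 21:10:53 gateway heartbeat[19084]: WARN: Late heartbeat: "
    "Node gateway2.domain.com.br: interval 12530 ms",
    "Oct 22 22:20:42 gateway heartbeat[19084]: WARN: Late heartbeat: "
    "Node gateway2.domain.com.br: interval 35790 ms",
    "Oct 22 22:21:20 gateway heartbeat[19947]: info: Starting serial "
    "heartbeat on tty /dev/ttyS0 (19200 baud)",
]
print(late_intervals(log))  # {'gateway2.domain.com.br': [12530, 35790]}
print(over_deadtime(log, deadtime_s=30))  # [('gateway2.domain.com.br', 35790)]
print(serial_speeds(log))  # {'/dev/ttyS0': 19200}
```

Run `serial_speeds` on the logs of both nodes and compare the results.
Raising deadtime above the worst gap may stop the false failovers.
However, gaps of more than 35 s on a 19200 baud link still point back to
the cable and port checks suggested above.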


Luis


| The primary runs Debian 4 Etch and the secondary runs Conectiva 10;
| the two machines are linked to each other over a serial connection.
| This is the relevant part of my log:
| 
| Oct 22 21:10:53 gateway heartbeat[19084]: WARN: Late heartbeat: Node
| gateway2.domain.com.br: interval 12530 ms
| Oct 22 22:20:37 gateway heartbeat[19084]: WARN: node
| gateway2.domain.com.br: is dead
| Oct 22 22:20:37 gateway heartbeat[19084]: WARN: No STONITH device
| configured.
| Oct 22 22:20:37 gateway heartbeat[19084]: WARN: Shared disks are not
| protected.
| Oct 22 22:20:37 gateway heartbeat[19084]: info: Resources being
| acquired from gateway2.domain.com.br.
| Oct 22 22:20:37 gateway heartbeat[19084]: info: Link
| gateway2.domain.com.br:/dev/ttyS0 dead.
| Oct 22 22:20:38 gateway heartbeat: info: Running /etc/ha.d/rc.d/status
| status
| Oct 22 22:20:38 gateway heartbeat: info: /usr/lib/heartbeat/mach_down:
| nice_failback: foreign resources acquired
| Oct 22 22:20:42 gateway heartbeat[19084]: WARN: Cluster node
| gateway2.domain.com.br returning after partition.
| Oct 22 22:20:42 gateway heartbeat[19084]: WARN: Deadtime value may be
| too small.
| Oct 22 22:20:42 gateway heartbeat[19084]: info: See documentation for
| information on tuning deadtime.
| Oct 22 22:20:42 gateway heartbeat[19084]: info: Link
| gateway2.domain.com.br:/dev/ttyS0 up.
| Oct 22 22:20:42 gateway heartbeat[19084]: WARN: Late heartbeat: Node
| gateway2.domain.com.br: interval 35790 ms
| Oct 22 22:20:42 gateway heartbeat[19084]: info: Status update for node
| gateway2.domain.com.br: status active
| Oct 22 22:20:42 gateway heartbeat[19084]: info: mach_down takeover
| complete.
| Oct 22 22:20:42 gateway heartbeat: info: mach_down takeover complete
| for node gateway2.domain.com.br.
| Oct 22 22:20:42 gateway heartbeat[14883]: info: Local Resource
| acquisition completed.
| Oct 22 22:20:42 gateway heartbeat: info: Running /etc/ha.d/rc.d/status
| status
| Oct 22 22:20:44 gateway heartbeat[19084]: info: Heartbeat shutdown in
| progress. (19084)
| Oct 22 22:20:44 gateway heartbeat[16667]: info: Giving up all HA
| resources.
| Oct 22 22:20:44 gateway heartbeat: info: Releasing resource group:
| gateway.domain.com.br 200.xxx.xxx.xxx/30/eth0 200.xxx.xxx.x6/30/eth1
| 200.xxx.xxx.x7/29/eth2 firewall shaper
| Oct 22 22:20:44 gateway heartbeat: info: Running /etc/init.d/shaper  stop
| Oct 22 22:20:46 gateway heartbeat: info: Running /etc/init.d/firewall
|  stop
| Oct 22 22:20:46 gateway heartbeat: info: Running
| /etc/ha.d/resource.d/IPaddr 200.xxx.xxx.x7/29/eth2 stop
| Oct 22 22:20:47 gateway heartbeat: info: Running
| /etc/ha.d/resource.d/IPaddr 200.xxx.xxx.x6/30/eth1 stop
| Oct 22 22:20:47 gateway heartbeat: info: /sbin/route -n del -host
| 200.xxx.xxx.x6
| Oct 22 22:20:47 gateway heartbeat: info: /sbin/ifconfig eth1:0 down
| Oct 22 22:20:47 gateway heartbeat: info: IP Address 200.xxx.xxx.x6
| released
| Oct 22 22:20:47 gateway heartbeat: info: Running
| /etc/ha.d/resource.d/IPaddr 200.xxx.xxx.xxx/30/eth0 stop
| Oct 22 22:20:47 gateway heartbeat[16667]: info: All HA resources
| relinquished.
| Oct 22 22:20:47 gateway heartbeat[19084]: WARN: 1 lost packet(s) for
| [gateway2.domain.com.br] [239455:239457]
| Oct 22 22:20:47 gateway heartbeat[19084]: info: No pkts missing from
| gateway2.domain.com.br!
| Oct 22 22:20:48 gateway heartbeat[19084]: info: killing HBFIFO process
| 19086 with signal 15
| Oct 22 22:20:48 gateway heartbeat[19084]: info: killing HBWRITE
| process 19087 with signal 15
| Oct 22 22:20:48 gateway heartbeat[19084]: info: killing HBREAD process
| 19088 with signal 15
| Oct 22 22:20:48 gateway heartbeat[19084]: info: Core process 19088
| exited. 3 remaining
| Oct 22 22:20:48 gateway heartbeat[19084]: info: Core process 19086
| exited. 2 remaining
| Oct 22 22:20:48 gateway heartbeat[19084]: info: Core process 19087
| exited. 1 remaining
| Oct 22 22:20:48 gateway heartbeat[19084]: info: Heartbeat shutdown
| complete.
| Oct 22 22:20:48 gateway heartbeat[19084]: info: Heartbeat restart
| triggered.
| Oct 22 22:20:48 gateway heartbeat[19084]: info: Restarting heartbeat.
| Oct 22 22:20:48 gateway heartbeat[19084]: info: Performing heartbeat
| restart exec.
| Oct 22 22:21:19 gateway heartbeat[19084]: info: **************************
| Oct 22 22:21:19 gateway heartbeat[19084]: info: Configuration
| validated. Starting heartbeat 1.2.5
| Oct 22 22:21:19 gateway heartbeat[19947]: info: heartbeat: version 1.2.5
| Oct 22 22:21:19 gateway heartbeat[19947]: info: Heartbeat generation: 23
| Oct 22 22:21:20 gateway heartbeat[19947]: info: Starting serial
| heartbeat on tty /dev/ttyS0 (19200 baud)
| Oct 22 22:21:20 gateway heartbeat[19947]: info: pid 19947 locked in
| memory.
| Oct 22 22:21:20 gateway heartbeat[19947]: info: Local status now set
| to: 'up'
| Oct 22 22:21:21 gateway heartbeat[19949]: info: pid 19949 locked in
| memory.
| Oct 22 22:21:21 gateway heartbeat[19950]: info: pid 19950 locked in
| memory.
| Oct 22 22:21:21 gateway heartbeat[19951]: info: pid 19951 locked in
| memory.
| Oct 22 22:21:21 gateway heartbeat[19947]: WARN: string2msg_ll: node
| [gateway2.domain.com.br] failed authentication
| Oct 22 22:21:22 gateway heartbeat[19947]: info: Link
| gateway2.domain.com.br:/dev/ttyS0 up.
| Oct 22 22:21:22 gateway heartbeat[19947]: info: Status update for node
| gateway2.domain.com.br: status active
| Oct 22 22:21:22 gateway heartbeat[19947]: info: Local status now set
| to: 'active'
| Oct 22 22:21:22 gateway heartbeat: info: Running /etc/ha.d/rc.d/status
| status
| Oct 22 22:21:22 gateway heartbeat[19947]: info: remote resource
| transition completed.
| Oct 22 22:21:22 gateway heartbeat[19947]: info: remote resource
| transition completed.
| Oct 22 22:21:22 gateway heartbeat[19947]: info: Local Resource
| acquisition completed. (none)
| Oct 22 22:21:23 gateway heartbeat[19947]: info: gateway2.domain.com.br
| wants to go standby [foreign]
| Oct 22 22:21:35 gateway heartbeat[19947]: info: standby: acquire
| [foreign] resources from gateway2.domain.com.br
| Oct 22 22:21:35 gateway heartbeat[19956]: info: acquire local HA
| resources (standby).
| Oct 22 22:21:35 gateway heartbeat: info: Acquiring resource group:
| gateway.domain.com.br 200.xxx.xxx.xxx/30/eth0 200.xxx.xxx.x6/30/eth1
| 200.xxx.xxx.x7/29/eth2 firewall shaper
| Oct 22 22:21:35 gateway heartbeat: info: Running
| /etc/ha.d/resource.d/IPaddr 200.xxx.xxx.xxx/30/eth0 start
| Oct 22 22:21:35 gateway heartbeat: info: /sbin/ifconfig eth0:0
| 200.xxx.xxx.xxx netmask 255.255.255.252      broadcast 200.208.220.131
| Oct 22 22:21:35 gateway heartbeat: info: Sending Gratuitous Arp for
| 200.xxx.xxx.xxx on eth0:0 [eth0]
| Oct 22 22:21:35 gateway heartbeat: /usr/lib/heartbeat/send_arp -i 1010
| -r 5 -p /var/lib/heartbeat/rsctmp/send_arp/send_arp-200.xxx.xxx.xxx
| eth0 200.xxx.xxx.xxx auto 200.xxx.xxx.xxx ffffffffffff
| Oct 22 22:21:35 gateway heartbeat: info: Running
| /etc/ha.d/resource.d/IPaddr 200.xxx.xxx.x6/30/eth1 start
| Oct 22 22:21:35 gateway heartbeat: info: /sbin/ifconfig eth1:0
| 200.xxx.xxx.x6 netmask 255.255.255.252 broadcast 200.208.223.67
| Oct 22 22:21:35 gateway heartbeat: info: Sending Gratuitous Arp for
| 200.xxx.xxx.x6 on eth1:0 [eth1]
| Oct 22 22:21:35 gateway heartbeat: /usr/lib/heartbeat/send_arp -i 1010
| -r 5 -p /var/lib/heartbeat/rsctmp/send_arp/send_arp-200.xxx.xxx.x6
| eth1 200.xxx.xxx.x6 auto 200.xxx.xxx.x6 ffffffffffff
| Oct 22 22:21:36 gateway heartbeat: info: Running
| /etc/ha.d/resource.d/IPaddr 200.xxx.xxx.x7/29/eth2 start
| Oct 22 22:21:36 gateway heartbeat: info: /sbin/ifconfig eth2:0
| 200.xxx.xxx.x7 netmask 255.255.255.248      broadcast 200.208.220.151
| Oct 22 22:21:36 gateway heartbeat: info: Sending Gratuitous Arp for
| 200.xxx.xxx.x7 on eth2:0 [eth2]
| Oct 22 22:21:36 gateway heartbeat: /usr/lib/heartbeat/send_arp -i 1010
| -r 5 -p /var/lib/heartbeat/rsctmp/send_arp/send_arp-200.xxx.xxx.x7
| eth2 200.xxx.xxx.x7 auto 200.xxx.xxx.x7 ffffffffffff
| Oct 22 22:21:36 gateway heartbeat: info: Running /etc/init.d/firewall
|  start
| Oct 22 22:21:36 gateway heartbeat: info: Running /etc/init.d/shaper  start
| Oct 22 22:21:41 gateway heartbeat[19956]: info: local HA resource
| acquisition completed (standby).
| Oct 22 22:21:41 gateway heartbeat[19947]: info: Standby resource
| acquisition done [foreign].
| Oct 22 22:21:41 gateway heartbeat[19947]: info: Initial resource
| acquisition complete (auto_failback)
| Oct 22 22:21:41 gateway heartbeat[19947]: info: remote resource
| transition completed.
| 

-- 
[ Luis Claudio R. Goncalves                    Bass - Gospel - RT ]
[ Fingerprint: 4FDD B8C4 3C59 34BD 8BE9  2696 7203 D980 A448 C8F8 ]

_______________________________________________
Linux-HA mailing list
[email protected]
http://listas.linuxchix.org.br/mailman/listinfo/linux-ha
