Re: [Linux-HA] Heartbeat on node 2 goes down

Jeronimo Zucco Thu, 15 Feb 2007 11:01:23 -0800

    Yes, that's exactly it: I am replicating a zeo database with drbd. I am 
running squid as a reverse proxy (which will also be made redundant) in 
front of several zope/plone machines that access this zeo database. We can 
compare notes on the setup off-list if you are interested; I would like 
to hear how your experience with this kind of setup has been 
(performance, common problems, etc). :-)


    As for the problem, thank you for your help and for Luis's. Here is 
what I did to fix it: I went to /etc/ha.d, ran rm -rf, and ran 
"make install" again (I compiled heartbeat 2.0.8 from source). Then I 
configured ha.cf, haresources and authkeys again, and everything went 
back to working. Some things are better restarted from scratch than 
debugged. From what I could tell, heartbeat is quite sensitive to errors 
in its configuration files and does not tell you where the error is; it 
simply stops working. But it is solved now.
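
    Since heartbeat gave no hint about which file was wrong, a small pre-flight check can at least rule out the obvious cases before starting the daemon. The sketch below is only an illustration, not something from the heartbeat distribution: the function name check_ha_dir is hypothetical, and it assumes the three files named above live in /etc/ha.d and that authkeys is expected to be mode 600. It does not validate the syntax of ha.cf or haresources.

```python
import os
import stat

# The three files mentioned above; the directory is an assumption.
REQUIRED = ("ha.cf", "haresources", "authkeys")


def check_ha_dir(path="/etc/ha.d"):
    """Return a list of human-readable problems found in `path`.

    Hypothetical helper: it only checks presence, size and the
    authkeys permissions, not the contents of the files.
    """
    problems = []
    for name in REQUIRED:
        full = os.path.join(path, name)
        if not os.path.isfile(full):
            problems.append(f"{name}: missing")
        elif os.path.getsize(full) == 0:
            problems.append(f"{name}: empty")

    authkeys = os.path.join(path, "authkeys")
    if os.path.isfile(authkeys):
        mode = stat.S_IMODE(os.stat(authkeys).st_mode)
        # Any permission bit for group or others is flagged.
        if mode & 0o077:
            problems.append(f"authkeys: mode {mode:o}, expected 600")
    return problems


if __name__ == "__main__":
    for problem in check_ha_dir():
        print(problem)
```

    Running it on each node before starting heartbeat prints one line per problem and nothing when the basic layout looks sane.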

    Thanks again for the help.

Ronaldo Santos wrote:
> Jeronimo, I noticed from your HARESOURCES file that you are setting up 
> a load-balancing environment using Zeo/Zope.
> Here at the research group I am part of, we set up a similar 
> environment.
>
> Right, as for your error, I think it is related to the UDP 
> transport that Heartbeat uses; try changing the port.
>
> Ok
>
> 2007/2/15, Jeronimo Zucco <[EMAIL PROTECTED]>:
>
>         Hello everyone.
>
>         I am having a problem with heartbeat. Node 2 simply
>     goes down after a few seconds, with no apparent explanation. Here
>     is the log from node 2:
>
>     Feb 15 10:48:06 odin2 heartbeat: [3883]: info: Enabling logging daemon
>     Feb 15 10:48:06 odin2 heartbeat: [3883]: info: logfile and debug file are those specified in logd config file (default /etc/logd.cf)
>     Feb 15 10:48:06 odin2 heartbeat: [3883]: info: Syntax: apiauth client [uid=uidlist] [gid=gidlist]
>     Feb 15 10:48:06 odin2 heartbeat: [3883]: info: Where uidlist is a comma-separated list of uids,
>     Feb 15 10:48:06 odin2 heartbeat: [3883]: info: and gidlist is a comma-separated list of gids
>     Feb 15 10:48:06 odin2 heartbeat: [3883]: info: One or the other must be specified.
>     Feb 15 10:48:06 odin2 heartbeat: [3883]: info: Syntax: apiauth client [uid=uidlist] [gid=gidlist]
>     Feb 15 10:48:06 odin2 heartbeat: [3883]: info: Where uidlist is a comma-separated list of uids,
>     Feb 15 10:48:06 odin2 heartbeat: [3883]: info: and gidlist is a comma-separated list of gids
>     Feb 15 10:48:06 odin2 heartbeat: [3883]: info: One or the other must be specified.
>     Feb 15 10:48:06 odin2 heartbeat: [3883]: info: AUTH: i=1: key = 0x80eed98, auth=0xb7b6cccc, authname=crc
>     Feb 15 10:48:06 odin2 heartbeat: [3883]: WARN: Core dumps could be lost if multiple dumps occur
>     Feb 15 10:48:06 odin2 heartbeat: [3883]: WARN: Consider setting /proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum supportability
>     Feb 15 10:48:06 odin2 heartbeat: [3883]: WARN: logd is enabled but logfile/debugfile/logfacility is still configured in ha.cf
>     Feb 15 10:48:06 odin2 heartbeat: [3883]: info: **************************
>     Feb 15 10:48:06 odin2 heartbeat: [3883]: info: Configuration validated. Starting heartbeat 2.0.8
>     Feb 15 10:48:06 odin2 heartbeat: [3884]: info: heartbeat: version 2.0.8
>     Feb 15 10:48:06 odin2 heartbeat: [3884]: info: Heartbeat generation: 17
>     Feb 15 10:48:06 odin2 heartbeat: [3884]: info: G_main_add_TriggerHandler: Added signal manual handler
>     Feb 15 10:48:06 odin2 heartbeat: [3884]: info: G_main_add_TriggerHandler: Added signal manual handler
>     Feb 15 10:48:06 odin2 heartbeat: [3884]: info: Removing /var/run/heartbeat/rsctmp failed, recreating.
>     Feb 15 10:48:06 odin2 heartbeat: [3884]: info: glib: ucast: write socket priority set to IPTOS_LOWDELAY on eth1
>     Feb 15 10:48:06 odin2 heartbeat: [3884]: info: glib: ucast: bound send socket to device: eth1
>     Feb 15 10:48:06 odin2 heartbeat: [3884]: info: glib: ucast: bound receive socket to device: eth1
>     Feb 15 10:48:06 odin2 heartbeat: [3884]: info: glib: ucast: started on port 694 interface eth1 to 10.100.100.1
>     Feb 15 10:48:06 odin2 heartbeat: [3884]: info: G_main_add_SignalHandler: Added signal handler for signal 17
>     Feb 15 10:48:06 odin2 heartbeat: [3884]: info: Local status now set to: 'up'
>     Feb 15 10:48:07 odin2 heartbeat: [3884]: info: Link odin1:eth1 up.
>     Feb 15 10:48:07 odin2 heartbeat: [3884]: info: Status update for node odin1: status active
>     Feb 15 10:48:07 odin2 harc[3896]: [3899]: info: Running /etc/ha.d/rc.d/status status
>     Feb 15 10:48:07 odin2 heartbeat: [3884]: info: Exiting status process 3896 returned rc 0.
>     Feb 15 10:48:08 odin2 heartbeat: [3884]: info: Comm_now_up(): updating status to active
>     Feb 15 10:48:08 odin2 heartbeat: [3884]: info: Local status now set to: 'active'
>
>
>     It stays like that, but heartbeat status shows that everything is stopped.
>     Node 1, which is the primary, detects that node 2 went down:
>
>     Feb 15 10:48:07 odin1 ipfail: [6343]: info: Link Status update: Link odin2/eth1 now has status up
>     Feb 15 10:48:07 odin1 heartbeat: [6330]: info: Heartbeat restart on node odin2
>     Feb 15 10:48:07 odin1 heartbeat: [6330]: info: Link odin2:eth1 up.
>     Feb 15 10:48:07 odin1 heartbeat: [6330]: info: Status update for node odin2: status init
>     Feb 15 10:48:07 odin1 ipfail: [6343]: info: Status update: Node odin2 now has status init
>     Feb 15 10:48:07 odin1 heartbeat: [6330]: info: Status update for node odin2: status up
>     Feb 15 10:48:07 odin1 ipfail: [6343]: info: Status update: Node odin2 now has status up
>     Feb 15 10:48:07 odin1 harc[6745]: [6747]: info: Running /etc/ha.d/rc.d/status status
>     Feb 15 10:48:07 odin1 harc[6749]: [6751]: info: Running /etc/ha.d/rc.d/status status
>     Feb 15 10:48:08 odin1 heartbeat: [6330]: info: all clients are now paused
>     Feb 15 10:48:08 odin1 heartbeat: [6330]: info: all clients are now resumed
>     Feb 15 10:48:08 odin1 heartbeat: [6330]: info: Status update for node odin2: status active
>     Feb 15 10:48:08 odin1 ipfail: [6343]: info: Status update: Node odin2 now has status active
>     Feb 15 10:48:08 odin1 harc[6753]: [6755]: info: Running /etc/ha.d/rc.d/status status
>     Feb 15 10:48:41 odin1 ipfail: [6343]: info: Status update: Node odin2 now has status dead
>     Feb 15 10:48:41 odin1 heartbeat: [6330]: WARN: node odin2: is dead
>     Feb 15 10:48:41 odin1 heartbeat: [6330]: info: Dead node odin2 gave up resources.
>     Feb 15 10:48:41 odin1 heartbeat: [6330]: info: Link odin2:eth1 dead.
>     Feb 15 10:48:41 odin1 ipfail: [6343]: info: NS: We are dead. :<
>     Feb 15 10:48:42 odin1 ipfail: [6343]: info: Link Status update: Link odin2/eth1 now has status dead
>     Feb 15 10:48:42 odin1 ipfail: [6343]: info: We are dead. :<
>     Feb 15 10:48:42 odin1 ipfail: [6343]: info: Asking other side for ping node count.
>
>
>     My haresources on both machines contains:
>
>     odin1 IPaddr::X.X.X.X/26/eth0 drbddisk::r0 Filesystem::/dev/drbd0::/zeo::ext3 zeo-instancias
>
>
>     If anyone can give me a hint on how to solve this problem, I would appreciate it.
>
>     --
>     Jeronimo Zucco
>     LPIC-1 Linux Professional Institute Certified
>     Núcleo de Processamento de Dados
>     Universidade de Caxias do Sul
>
>     http://jczucco.blogspot.com
>
>     _______________________________________________
>     Linux-HA mailing list
>     [email protected]
>     http://listas.linuxchix.org.br/mailman/listinfo/linux-ha
>
>
>
>
> -- 
> Ronaldo Amaral Santos
> Software Development Technology student, 6th term, evening classes
> Núcleo de Pesquisa em Sistemas de Informação – NSI
> Cefet-Campos
> -------------------------
> Linux User #437600


-- 
Jeronimo Zucco
LPIC-1 Linux Professional Institute Certified
Núcleo de Processamento de Dados
Universidade de Caxias do Sul

http://jczucco.blogspot.com

