Jeronimo,

On Thu, Feb 15, 2007 at 10:52:02AM -0200, Jeronimo Zucco wrote:
|     Hello everyone.
| 
|     I am having a problem with heartbeat. Node 2 simply goes down 
| after a few seconds, with no apparent explanation. The log from 
| node 2 follows:

The log below indicates that you have a configuration error (possibly
two, since heartbeat prints the syntax help twice) in the "apiauth"
lines of ha.cf. Fix those and check whether the problem still shows up.
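
As a rough way to spot the malformed lines before restarting heartbeat,
here is a minimal Python sketch that checks "apiauth" lines against the
syntax heartbeat prints in the log (apiauth client [uid=uidlist]
[gid=gidlist], with at least one of the two present). It is not
heartbeat's own parser: the function names are made up for this example,
and details such as rejecting spaces inside a list or a repeated uid=
are my assumptions.

```python
import re

# One "uid=a,b" or "gid=a,b" token: a comma-separated list with no
# spaces and no empty items (an assumption, see the note above).
_TOKEN = re.compile(r"(uid|gid)=([^,=\s]+(?:,[^,=\s]+)*)")

SYNTAX = "apiauth client [uid=uidlist] [gid=gidlist]"


def check_apiauth(line):
    """Return None if the line looks fine, otherwise an error string.

    Lines that are not apiauth directives are ignored (None).
    """
    tokens = line.split("#", 1)[0].split()  # drop trailing comments
    if not tokens or tokens[0] != "apiauth":
        return None
    if len(tokens) < 3:
        # Missing client name, or neither uid= nor gid= given.
        return "expected: " + SYNTAX
    seen = set()
    for tok in tokens[2:]:
        match = _TOKEN.fullmatch(tok)
        if match is None:
            return "bad token %r, expected: %s" % (tok, SYNTAX)
        if match.group(1) in seen:
            return "%s= given more than once" % match.group(1)
        seen.add(match.group(1))
    return None


def find_apiauth_errors(lines):
    """Return a list of (line_number, message) for suspicious lines."""
    errors = []
    for number, line in enumerate(lines, start=1):
        message = check_apiauth(line)
        if message is not None:
            errors.append((number, message))
    return errors


if __name__ == "__main__":
    # "hacluster" is only an example user name, not taken from your setup.
    sample = [
        "apiauth ipfail uid=hacluster",
        "apiauth ipfail",
    ]
    for number, message in find_apiauth_errors(sample):
        print("line %d: %s" % (number, message))
```

To check a real file, pass its lines to find_apiauth_errors, for example
the lines of /etc/ha.d/ha.cf (that path is the usual location, but it is
an assumption about your install).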

| Feb 15 10:48:06 odin2 heartbeat: [3883]: info: Enabling logging daemon
| Feb 15 10:48:06 odin2 heartbeat: [3883]: info: logfile and debug file 
| are those specified in logd config file (default /etc/logd.cf)
| Feb 15 10:48:06 odin2 heartbeat: [3883]: info: Syntax: apiauth client 
| [uid=uidlist] [gid=gidlist]
| Feb 15 10:48:06 odin2 heartbeat: [3883]: info: Where uidlist is a 
| comma-separated list of uids,
| Feb 15 10:48:06 odin2 heartbeat: [3883]: info: and gidlist is a 
| comma-separated list of gids
| Feb 15 10:48:06 odin2 heartbeat: [3883]: info: One or the other must be 
| specified.
| Feb 15 10:48:06 odin2 heartbeat: [3883]: info: Syntax: apiauth client 
| [uid=uidlist] [gid=gidlist]
| Feb 15 10:48:06 odin2 heartbeat: [3883]: info: Where uidlist is a 
| comma-separated list of uids,
| Feb 15 10:48:06 odin2 heartbeat: [3883]: info: and gidlist is a 
| comma-separated list of gids
| Feb 15 10:48:06 odin2 heartbeat: [3883]: info: One or the other must be 
| specified.
| Feb 15 10:48:06 odin2 heartbeat: [3883]: info: AUTH: i=1: key = 
| 0x80eed98, auth=0xb7b6cccc, authname=crc
| Feb 15 10:48:06 odin2 heartbeat: [3883]: WARN: Core dumps could be lost 
| if multiple dumps occur
| Feb 15 10:48:06 odin2 heartbeat: [3883]: WARN: Consider setting 
| /proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum 
| supportability
| Feb 15 10:48:06 odin2 heartbeat: [3883]: WARN: logd is enabled but 
| logfile/debugfile/logfacility is still configured in ha.cf
| Feb 15 10:48:06 odin2 heartbeat: [3883]: info: **************************
| Feb 15 10:48:06 odin2 heartbeat: [3883]: info: Configuration validated. 
| Starting heartbeat 2.0.8
| Feb 15 10:48:06 odin2 heartbeat: [3884]: info: heartbeat: version 2.0.8
| Feb 15 10:48:06 odin2 heartbeat: [3884]: info: Heartbeat generation: 17
| Feb 15 10:48:06 odin2 heartbeat: [3884]: info: 
| G_main_add_TriggerHandler: Added signal manual handler
| Feb 15 10:48:06 odin2 heartbeat: [3884]: info: 
| G_main_add_TriggerHandler: Added signal manual handler
| Feb 15 10:48:06 odin2 heartbeat: [3884]: info: Removing 
| /var/run/heartbeat/rsctmp failed, recreating.
| Feb 15 10:48:06 odin2 heartbeat: [3884]: info: glib: ucast: write socket 
| priority set to IPTOS_LOWDELAY on eth1
| Feb 15 10:48:06 odin2 heartbeat: [3884]: info: glib: ucast: bound send 
| socket to device: eth1
| Feb 15 10:48:06 odin2 heartbeat: [3884]: info: glib: ucast: bound 
| receive socket to device: eth1
| Feb 15 10:48:06 odin2 heartbeat: [3884]: info: glib: ucast: started on 
| port 694 interface eth1 to 10.100.100.1
| Feb 15 10:48:06 odin2 heartbeat: [3884]: info: G_main_add_SignalHandler: 
| Added signal handler for signal 17
| Feb 15 10:48:06 odin2 heartbeat: [3884]: info: Local status now set to: 'up'
| Feb 15 10:48:07 odin2 heartbeat: [3884]: info: Link odin1:eth1 up.
| Feb 15 10:48:07 odin2 heartbeat: [3884]: info: Status update for node 
| odin1: status active
| Feb 15 10:48:07 odin2 harc[3896]: [3899]: info: Running 
| /etc/ha.d/rc.d/status status
| Feb 15 10:48:07 odin2 heartbeat: [3884]: info: Exiting status process 
| 3896 returned rc 0.
| Feb 15 10:48:08 odin2 heartbeat: [3884]: info: Comm_now_up(): updating 
| status to active
| Feb 15 10:48:08 odin2 heartbeat: [3884]: info: Local status now set to: 
| 'active'
| 
| 
| It stays like that, but heartbeat status shows me that everything is stopped.
| On node 1, which is the primary, it detects that node 2 went down:
| 
| Feb 15 10:48:07 odin1 ipfail: [6343]: info: Link Status update: Link 
| odin2/eth1 now has status up
| Feb 15 10:48:07 odin1 heartbeat: [6330]: info: Heartbeat restart on node 
| odin2
| Feb 15 10:48:07 odin1 heartbeat: [6330]: info: Link odin2:eth1 up.
| Feb 15 10:48:07 odin1 heartbeat: [6330]: info: Status update for node 
| odin2: status init
| Feb 15 10:48:07 odin1 ipfail: [6343]: info: Status update: Node odin2 
| now has status init
| Feb 15 10:48:07 odin1 heartbeat: [6330]: info: Status update for node 
| odin2: status up
| Feb 15 10:48:07 odin1 ipfail: [6343]: info: Status update: Node odin2 
| now has status up
| Feb 15 10:48:07 odin1 harc[6745]: [6747]: info: Running 
| /etc/ha.d/rc.d/status status
| Feb 15 10:48:07 odin1 harc[6749]: [6751]: info: Running 
| /etc/ha.d/rc.d/status status
| Feb 15 10:48:08 odin1 heartbeat: [6330]: info: all clients are now paused
| Feb 15 10:48:08 odin1 heartbeat: [6330]: info: all clients are now resumed
| Feb 15 10:48:08 odin1 heartbeat: [6330]: info: Status update for node 
| odin2: status active
| Feb 15 10:48:08 odin1 ipfail: [6343]: info: Status update: Node odin2 
| now has status active
| Feb 15 10:48:08 odin1 harc[6753]: [6755]: info: Running 
| /etc/ha.d/rc.d/status status
| Feb 15 10:48:41 odin1 ipfail: [6343]: info: Status update: Node odin2 
| now has status dead
| Feb 15 10:48:41 odin1 heartbeat: [6330]: WARN: node odin2: is dead
| Feb 15 10:48:41 odin1 heartbeat: [6330]: info: Dead node odin2 gave up 
| resources.
| Feb 15 10:48:41 odin1 heartbeat: [6330]: info: Link odin2:eth1 dead.
| Feb 15 10:48:41 odin1 ipfail: [6343]: info: NS: We are dead. :<
| Feb 15 10:48:42 odin1 ipfail: [6343]: info: Link Status update: Link 
| odin2/eth1 now has status dead
| Feb 15 10:48:42 odin1 ipfail: [6343]: info: We are dead. :<
| Feb 15 10:48:42 odin1 ipfail: [6343]: info: Asking other side for ping 
| node count.
| 
| 
| My haresources on both machines contains:
| 
| odin1 IPaddr::X.X.X.X/26/eth0 drbddisk::r0 
| Filesystem::/dev/drbd0::/zeo::ext3 zeo-instancias
| 
| 
| If anyone can give me a hint on how to solve this problem, I would appreciate it.
| 
| -- 
| Jeronimo Zucco
| LPIC-1 Linux Professional Institute Certified
| Núcleo de Processamento de Dados
| Universidade de Caxias do Sul
| 
| http://jczucco.blogspot.com
| 
| _______________________________________________
| Linux-HA mailing list
| [email protected]
| http://listas.linuxchix.org.br/mailman/listinfo/linux-ha
| 
---end quoted text---

-- 
[ Luis Claudio R. Goncalves                    lclaudio at unix dot sh ]
[ Fingerprint:   4FDD B8C4 3C59 34BD 8BE9  2696 7203 D980 A448 C8F8    ]
[ Linux-HA Developer - LateNite Programmer - Gospel User - Bass Player ]
[ Fault Tolerance - Real-Time - Distributed Systems - IECLB - Is 40:31 ]
