Re: [Linux-HA] Keepalived taking down the Primary

Flavio Menezes Reis Mon, 15 May 2006 20:07:02 -0700

Well, I found out why keepalived returns 1. The script in
/etc/init.d/keepalived starts the daemon with this line:


start-stop-daemon --start --quiet --pidfile /var/run/$NAME.pid --exec $DAEMON

It is start-stop-daemon that returns 1 when keepalived is already running,
so I added "--oknodo" to that line, which suppresses the return code of 1
when keepalived is already running:

start-stop-daemon --oknodo --start --quiet --pidfile /var/run/$NAME.pid --exec $DAEMON
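
For readers without start-stop-daemon at hand, the effect of "--oknodo" on a start action can be imitated in plain shell: if the daemon is already running, report success instead of an error. The following is only a minimal sketch under stated assumptions. The pidfile path, the function names and the "sleep" stand-in for the daemon are all hypothetical and are not part of the keepalived init script.

```shell
#!/bin/sh
# Hypothetical stand-in for an init script; not the real keepalived script.
# PIDFILE and the "sleep" process are illustrative assumptions only.
PIDFILE="${PIDFILE:-/tmp/demo-daemon.pid}"

is_running() {
    # True when the pidfile exists and names a live process.
    [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null
}

start_service() {
    if is_running; then
        # Already running: nothing to do, so report success,
        # which is the behaviour --oknodo gives start-stop-daemon.
        return 0
    fi
    # Launch the stand-in daemon in the background and record its PID.
    sleep 30 >/dev/null 2>&1 &
    echo $! > "$PIDFILE"
}

stop_service() {
    if is_running; then
        kill "$(cat "$PIDFILE")"
    fi
    rm -f "$PIDFILE"
    return 0
}
```

Calling start_service twice in a row returns 0 both times, which is what heartbeat needs to see from every script listed in haresources.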

With this "patch" everything is back to normal, at least for now.
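
One way to check whether an init script tolerates being started twice is to run its start action twice and look at the second exit status. The helper below is a rough sketch; the name check_double_start is made up for illustration, and it really does start the service, so it is best tried on a test node.

```shell
#!/bin/sh
# check_double_start CMD...: run "CMD start" twice in a row and report
# the exit status of the second run. Hypothetical helper for illustration.
check_double_start() {
    "$@" start >/dev/null 2>&1
    "$@" start >/dev/null 2>&1
    rc=$?
    echo "second start returned $rc"
    return "$rc"
}
```

For example, "check_double_start /etc/init.d/keepalived" (as root) should report 0 once the script is safe to start twice, and 1 if it still fails on the second start.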

See you later

2006/5/15, Flavio Menezes Reis <[EMAIL PROTECTED]>:
> Indeed, the problem is that keepalived returns 1 when it is started
> without having been stopped first; that is, on the second consecutive
> start it begins returning 1... But why is heartbeat starting the
> services again if they are already running?
>
> Thanks
>
> 2006/5/15, Flavio Menezes Reis <[EMAIL PROTECTED]>:
> > To muddy things further... here is the log of what happens when I shut
> > down the secondary. The problem seems to be that heartbeat runs start
> > again on everything listed in haresources when the secondary is shut down:
> > heartbeat[6047]: 2006/05/15_11:30:45 info: Received shutdown notice
> > from 'sempron2200_2'.
> > heartbeat[6047]: 2006/05/15_11:30:45 info: Resources being acquired
> > from sempron2200_2.
> > heartbeat[7102]: 2006/05/15_11:30:45 info: acquire local HA resources 
> > (standby).
> > ResourceManager[7122]:  2006/05/15_11:30:45 info: Acquiring resource
> > group: athlon2400 IPaddr::192.168.1.50/24/eth0 drbddisk
> > Filesystem::/dev/drbd0::/mnt/data postgresql-8.1 cluster0 keepalived
> > IPaddr[7147]:   2006/05/15_11:30:45 INFO: IPaddr Running OK
> > heartbeat[7103]: 2006/05/15_11:30:45 info: Local Resource acquisition 
> > completed.
> > IPaddr[7166]:   2006/05/15_11:30:45 INFO: IPaddr Running OK
> > ResourceManager[7122]:  2006/05/15_11:30:45 info: Running
> > /etc/ha.d/resource.d/drbddisk  start
> > Filesystem[7485]:       2006/05/15_11:30:45 INFO: /mnt/data is mounted 
> > (running)
> > Filesystem[7421]:       2006/05/15_11:30:45 INFO: Filesystem Running OK
> > ResourceManager[7122]:  2006/05/15_11:30:45 info: Running
> > /etc/init.d/postgresql-8.1  start
> > ResourceManager[7122]:  2006/05/15_11:30:46 info: Running
> > /etc/ha.d/resource.d/cluster0  start
> > ResourceManager[7122]:  2006/05/15_11:30:47 info: Running
> > /etc/init.d/keepalived  start
> > ResourceManager[7122]:  2006/05/15_11:30:47 ERROR: Return code 1 from
> > /etc/init.d/keepalived
> > ResourceManager[7122]:  2006/05/15_11:30:47 CRIT: Giving up resources
> > due to failure of keepalived
> > ResourceManager[7122]:  2006/05/15_11:30:47 info: Releasing resource
> > group: athlon2400 IPaddr::192.168.1.50/24/eth0 drbddisk
> > Filesystem::/dev/drbd0::/mnt/data postgresql-8.1 cluster0 keepalived
> > ResourceManager[7122]:  2006/05/15_11:30:47 info: Running
> > /etc/init.d/keepalived  stop
> > ResourceManager[7122]:  2006/05/15_11:30:47 info: Running
> > /etc/ha.d/resource.d/cluster0  stop
> > ResourceManager[7122]:  2006/05/15_11:30:49 info: Running
> > /etc/init.d/postgresql-8.1  stop
> > ResourceManager[7122]:  2006/05/15_11:30:51 info: Running
> > /etc/ha.d/resource.d/Filesystem /dev/drbd0 /mnt/data stop
> > Filesystem[7755]:       2006/05/15_11:30:52 INFO: Filesystem Success
> > ResourceManager[7122]:  2006/05/15_11:30:52 info: Running
> > /etc/ha.d/resource.d/drbddisk  stop
> > ResourceManager[7122]:  2006/05/15_11:30:53 info: Running
> > /etc/ha.d/resource.d/IPaddr 192.168.1.50/24/eth0 stop
> > IPaddr[7988]:   2006/05/15_11:30:54 INFO: /sbin/route -n del -host 
> > 192.168.1.50
> > IPaddr[7988]:   2006/05/15_11:30:54 INFO: /sbin/ifconfig eth0:0
> > 192.168.1.50 down
> > IPaddr[7988]:   2006/05/15_11:30:54 INFO: IP Address 192.168.1.50 released
> > IPaddr[7883]:   2006/05/15_11:30:54 INFO: IPaddr Success
> > heartbeat[7102]: 2006/05/15_11:30:54 info: local HA resource
> > acquisition completed (standby).
> > heartbeat[6047]: 2006/05/15_11:30:54 info: Standby resource
> > acquisition done [foreign].
> > harc[8041]:     2006/05/15_11:30:54 info: Running /etc/ha.d/rc.d/status 
> > status
> > mach_down[8050]:        2006/05/15_11:30:54 info:
> > /usr/lib/heartbeat/mach_down: nice_failback: foreign resources
> > acquired
> > mach_down[8050]:        2006/05/15_11:30:54 info: mach_down takeover
> > complete for node sempron2200_2.
> > heartbeat[6047]: 2006/05/15_11:30:54 info: mach_down takeover complete.
> >
> >
> > 2006/5/15, Flavio Menezes Reis <[EMAIL PROTECTED]>:
> > > Hello!
> > >
> > > Almost everything was ready here on the cluster, with auto failback on
> > > and all. The problem is that I had not tried shutting down the
> > > secondary and, incredibly, when I do, keepalived comes back with a
> > > return code of 1 (RC=1), which makes heartbeat run stop on every
> > > script in the haresources file, and that ends up taking down the
> > > server... Why does keepalived return this RC=1?
> > >
> > > Thanks, everyone. In a little while I will send a full write-up of how
> > > I built this cluster, to trade experiences with you.
> > >
> > > Regards
> > >
> > > --
> > > Flávio Menezes dos Reis
> > > Bachelor's student in Information Systems - Ulbra - Torres - RS
> > > [EMAIL PROTECTED]
> > >
> >
> >
> > --
> > Flávio Menezes dos Reis
> > Bachelor's student in Information Systems - Ulbra - Torres - RS
> > [EMAIL PROTECTED]
> >
>
>
> --
> Flávio Menezes dos Reis
> Bachelor's student in Information Systems - Ulbra - Torres - RS
> [EMAIL PROTECTED]
>


-- 
Flávio Menezes dos Reis
Bachelor's student in Information Systems - Ulbra - Torres - RS
[EMAIL PROTECTED]
_______________________________________________
Linux-HA mailing list
[email protected]
http://listas.linuxchix.org.br/mailman/listinfo/linux-ha
