Hi Alan

Thank you for the input, but that is not it: I have double-checked the
config. This log is from the slave which had its network connection
disconnected. It says that nw is dead, which seems correct to me, since
the NIC is disconnected? (nw is the router and 2 DNS servers on the
local network)
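
For reference, the kind of check meant here (no name used both as a
cluster node and as a ping target) can be sketched in Python. This is
only a sketch: it assumes ha.cf uses the node, ping and ping_group
directives in their usual one-per-line form, and the sample config
text, host names and addresses below are made up for illustration and
are not the actual config. The function name is hypothetical as well.

```python
def find_node_ping_overlap(ha_cf_text):
    """Return names that appear both as cluster nodes and as ping targets."""
    nodes, ping_targets = set(), set()
    for raw in ha_cf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        directive, *args = line.split()
        if directive == "node":
            nodes.update(args)
        elif directive in ("ping", "ping_group"):
            # For ping_group the first argument is the group name and the
            # rest are its members; all of them count as ping targets here.
            ping_targets.update(args)
    return sorted(nodes & ping_targets)


# Hypothetical config text, not taken from the real cluster.
SAMPLE = """\
node ha1 ha2
ping_group nw 192.0.2.1 192.0.2.53 192.0.2.54
respawn hacluster /usr/lib/heartbeat/ipfail
"""
BROKEN = SAMPLE + "node nw\n"  # same config with nw also listed as a node

print(find_node_ping_overlap(SAMPLE))
print(find_node_ping_overlap(BROKEN))
```

An empty list for a config means no name is shared between the node
and ping directives; the BROKEN sample shows what a clash on nw would
look like.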

It seems the log got mangled in the mail, so maybe you overlooked the
[RECONNECT] line. I'll try to include it again.

Morten

[DISCONNECT]
Nov  7 10:11:44 localhost heartbeat[4421]: WARN: node nw: is dead
Nov  7 10:11:44 localhost heartbeat[4421]: info: Link nw:nw dead.
Nov  7 10:11:44 localhost ipfail[4431]: info: Status update: Node nw now
has status dead
Nov  7 10:11:44 localhost heartbeat[4631]: debug: notify_world: setting
SIGCHLD Handler to SIG_DFL
Nov  7 10:11:44 localhost ipfail[4431]: info: NS: We are dead. :<
Nov  7 10:11:44 localhost ipfail[4431]: info: Link Status update: Link
nw/nw now has status dead
Nov  7 10:11:44 localhost ipfail[4431]: info: We are dead. :<
Nov  7 10:11:44 localhost ipfail[4431]: info: Asking other side for ping
node count.
Nov  7 10:11:44 localhost ipfail[4431]: debug: Message [num_ping] sent.
Nov  7 10:11:44 localhost heartbeat: info: Running /etc/ha.d/rc.d/status
status

[RECONNECT]
Nov  7 10:12:07 localhost heartbeat[4421]: info: Link nw:nw up.
Nov  7 10:12:07 localhost heartbeat[4421]: WARN: Late heartbeat: Node
nw: interval 35020 ms
Nov  7 10:12:07 localhost heartbeat[4421]: info: Status update for node
nw: status ping
Nov  7 10:12:07 localhost ipfail[4431]: info: Link Status update: Link
nw/nw now has status up
Nov  7 10:12:07 localhost ipfail[4431]: info: Status update: Node nw now
has status ping
Nov  7 10:12:07 localhost ipfail[4431]: info: A ping node just came up.
Nov  7 10:12:07 localhost ipfail[4431]: debug: Found ping node nw!
Nov  7 10:12:07 localhost ipfail[4431]: info: Asking other side for ping
node count.
Nov  7 10:12:07 localhost ipfail[4431]: debug: Message [num_ping] sent.

> -----Original Message-----
> From: Alan Robertson [mailto:[EMAIL PROTECTED] 
> Sent: 8. november 2007 16:14
> To: Morten Laursen
> Cc: [email protected]
> Subject: Re: Missing gratious ARP
> 
> Morten Laursen wrote:
> >> On 1.2.3, that "returning after partition" should cause both sides
> >> to shut down all resources and restart them.  Restarting them should
> >> issue more gratuitous ARPs.
> >>
> >> Do both servers get the "returning after partition" message?
> > 
> > No, the slave does not get the message, and it does not restart.
> > Here is the entire log from the slave:
> > 
> > [DISCONNECT]
> > Nov  7 10:11:44 localhost heartbeat[4421]: WARN: node nw: is dead
> > Nov  7 10:11:44 localhost heartbeat[4421]: info: Link nw:nw dead.
> > Nov  7 10:11:44 localhost ipfail[4431]: info: Status update: Node nw
> > now has status dead
> > Nov  7 10:11:44 localhost heartbeat[4631]: debug: notify_world:
> > setting SIGCHLD Handler to SIG_DFL
> > Nov  7 10:11:44 localhost ipfail[4431]: info: NS: We are dead. :<
> > Nov  7 10:11:44 localhost ipfail[4431]: info: Link Status update:
> > Link nw/nw now has status dead
> > Nov  7 10:11:44 localhost ipfail[4431]: info: We are dead. :<
> > Nov  7 10:11:44 localhost ipfail[4431]: info: Asking other side for
> > ping node count.
> > Nov  7 10:11:44 localhost ipfail[4431]: debug: Message [num_ping] sent.
> > Nov  7 10:11:44 localhost heartbeat: info: Running
> > /etc/ha.d/rc.d/status status
> >
> > [RECONNECT]
> > Nov  7 10:12:07 localhost heartbeat[4421]: info: Link nw:nw up.
> > Nov  7 10:12:07 localhost heartbeat[4421]: WARN: Late heartbeat:
> > Node nw: interval 35020 ms
> > Nov  7 10:12:07 localhost heartbeat[4421]: info: Status update for
> > node nw: status ping
> > Nov  7 10:12:07 localhost ipfail[4431]: info: Link Status update:
> > Link nw/nw now has status up
> > Nov  7 10:12:07 localhost ipfail[4431]: info: Status update: Node nw
> > now has status ping
> > Nov  7 10:12:07 localhost ipfail[4431]: info: A ping node just came up.
> > Nov  7 10:12:07 localhost ipfail[4431]: debug: Found ping node nw!
> > Nov  7 10:12:07 localhost ipfail[4431]: info: Asking other side for
> > ping node count.
> > Nov  7 10:12:07 localhost ipfail[4431]: debug: Message [num_ping] sent.
> > 
> 
> These messages indicate to me that you have a screwed up 
> configuration.
> 
> I would guess that you have 'nw' as both a node in your 
> cluster AND as a ping node.
> 
> Never ping anything inside your cluster.
> 
> 
> -- 
>      Alan Robertson <[EMAIL PROTECTED]>
> 
> "Openness is the foundation and preservative of friendship... 
>  Let me claim from you at all times your undisguised 
> opinions." - William Wilberforce
> 
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
