[Linux-HA] CRM hung, node wedged, but heartbeats still being sent

Tavanyar, Simon Wed, 31 Mar 2010 11:50:40 -0700

I'm running Heartbeat 2.1.4 (please don't shoot).

A disk error managed to grind everything to a halt on my primary node.
No software that accesses the disk was able to run.

Nothing is being written to ha.log. Resources are no longer responding
to monitors, but the CRM is hung too, so it won't notice.

HOWEVER, the node is still happily sending heartbeats, so the other node
never takes over.

We have a dead node in every way except one: it keeps telling the other
heartbeat, "I'm alive"!

1) Has anyone else seen this type of failure?

2) What ensures that a heartbeat will not be sent if the CRM is
hung/wedged?
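
To make question 2 concrete, here is a minimal sketch of the kind of gate
I have in mind. It is not Heartbeat code: disk_write_ok,
should_send_heartbeat, the probe directory and the 2-second timeout are
all hypothetical names and values chosen for illustration. The idea is
only that a node whose disk writes no longer complete should stop
claiming to be alive.

```python
import os
import tempfile
import threading


def disk_write_ok(directory, timeout_s=2.0):
    """Return True if a small write+fsync in `directory` completes in time.

    The probe runs in a daemon thread so that a write stuck on a failed
    disk cannot block the caller; we simply stop waiting after timeout_s.
    """
    result = {"ok": False}

    def probe():
        try:
            fd, path = tempfile.mkstemp(dir=directory)
            try:
                os.write(fd, b"probe")
                os.fsync(fd)  # force the data to the device
            finally:
                os.close(fd)
                os.unlink(path)
            result["ok"] = True
        except OSError:
            # Missing directory, I/O error, read-only filesystem, etc.
            result["ok"] = False

    worker = threading.Thread(target=probe, daemon=True)
    worker.start()
    worker.join(timeout_s)
    # A worker that is still alive means the write is hung.
    return (not worker.is_alive()) and result["ok"]


def should_send_heartbeat(probe_dir, timeout_s=2.0):
    """Hypothetical gate: only advertise liveness if the disk still works."""
    return disk_write_ok(probe_dir, timeout_s)
```

A sender loop would call should_send_heartbeat() before each packet and
stay silent when it returns False, so that the peer's dead-node timer can
expire and trigger a takeover.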

 

 

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
