On 26 Jun 2012, at 22:18, Andreas Kurz wrote:

> use STONITH to prevent resources running on both nodes ... you
> configured redundant cluster communication paths?

The nodes in question are Linode VMs, so not much opportunity for that.

> With heartbeat you can use the "cl_status" command with its various
> options to check Heartbeats view of the cluster .... and heartbeats log
> messages from the split-brain event should also give you some hints.

cl_status just confirms that each node thinks the other is dead.

ok, I see two things happening in the logs:

At one point proxy2 reported a slow heartbeat (20sec, deadtime was set to 15) but seemed to reconnect.

Later on, both nodes reported each other as dead within the same second:

Jun 25 10:14:16 proxy1 heartbeat: [2678]: WARN: node proxy2.example.com: is dead
Jun 25 10:14:16 proxy1 heartbeat: [2678]: info: Link proxy2.example.com:eth0 dead.
Jun 25 10:14:16 proxy1 crmd: [3205]: notice: crmd_ha_status_callback: Status update: Node proxy2.example.com now has status [dead]

As I understand it, STONITH is intended to prevent a node rejoining in case it causes more trouble. In this case the individual nodes were fine, it appeared to be the network that was at fault. Why wouldn't these nodes automatically reconnect, given that there is no STONITH to prevent them? How should I tell them to reconnect manually?

I can also see that it failed to send alerts from the email resources at the same time because DNS lookups were failing: all points to a wider network issue. I wonder if Linode has micro-outages on their network since we've also been seeing some problems with mmm reporting 'network unreachable' on some other instances at the same time.

Marcus
--
Marcus Bointon
Synchromedia Limited: Creators of http://www.smartmessages.net/
UK info@hand CRM solutions
[email protected] | http://www.synchromedia.co.uk/

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
