Hi Folks,

We have a two-node Pacemaker cluster running DRBD-backed Xen virtual machines. One of our VMs will run on one node but not on the other, and "crm status" reports only that starting the resource failed for unknown reasons. The log is only slightly less useless:

(server2 and server3 are the nodes, server1 is the resource)
<server3, running server1, crashes>
<node entries from server2 trying to fail over the resource>

Jul 27 06:27:06 server2 pengine: [1365]: info: get_failcount: server1 has failed INFINITY times on server2
Jul 27 06:27:06 server2 pengine: [1365]: WARN: common_apply_stickiness: Forcing server1 away from server2 after 1000000 failures (max=1000000)
Jul 27 06:27:06 server2 pengine: [1365]: info: native_color: Resource server1 cannot run anywhere
Jul 27 06:27:06 server2 pengine: [1365]: notice: LogActions: Leave resource server1#011(Stopped)

Attempts to migrate the server fail with the same errors. Failover USED to work just fine. It still works for other VMs. Any idea how to track down what's failing?

Thanks very much,

Miles Fidelman


--
In theory, there is no difference between theory and practice.
In practice, there is.   .... Yogi Berra
