Hi Folks,
We have a dual-node Pacemaker cluster running DRBD-backed Xen virtual
machines. One of our VMs will run on one node but not on the other, and
"crm status" reports that starting the resource failed for unknown
reasons. The log is only slightly less useless:
(server2 and server3 are the nodes, server1 is the resource)
<server3, running server1, crashes>
<node entries from server2 trying to failover the resource>
Jul 27 06:27:06 server2 pengine: [1365]: info: get_failcount: server1
has failed INFINITY times on server2
Jul 27 06:27:06 server2 pengine: [1365]: WARN: common_apply_stickiness:
Forcing server1 away from server2 after 1000000 failures (max=1000000)
Jul 27 06:27:06 server2 pengine: [1365]: info: native_color: Resource
server1 cannot run anywhere
Jul 27 06:27:06 server2 pengine: [1365]: notice: LogActions: Leave
resource server1#011(Stopped)
Attempts to migrate the VM fail with the same errors. Failover USED
to work just fine, and it still works for our other VMs. Any ideas on
how to track down what's failing?
Thanks very much,
Miles Fidelman
--
In theory, there is no difference between theory and practice.
In practice, there is. .... Yogi Berra
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems