Hi,

I encountered a problem when performing a live migration of some OpenVZ CTs. Although the migration didn't trigger any messages in 'crm_mon' and initially went through without any trouble, the resource was then restarted on the target node 'unnecessarily'. From the logs it looks as if, after the actual migration, pacemaker detected the resource to be running on both nodes. Why did it detect that? Could it be that it checked too early on the source node? Might that be a problem with the RA ManageVE returning too early?
(For details see below)

Roman

The setup:
* Nodes are running Debian Squeeze with the current pve kernel
* Our CTs are running on an NFS share mounted on both nodes
* pacemaker 1.1.7

Action: Migration of resource 'netpd' from vice1 to vice0

Log of the source node (vice1)
------------------------------
Mar 20 16:30:57 vice1 ManageVE[107511]: INFO: Setting up checkpoint... suspend... dump... kill... Container is unmounted Checkpointing completed succesfully
Mar 20 16:30:57 vice1 lrmd: [1523]: info: operation migrate_to[66] on netpd for client 1526: pid 107511 exited with return code 0
Mar 20 16:30:57 vice1 crmd: [1526]: info: process_lrm_event: LRM operation netpd_migrate_to_0 (call=66, rc=0, cib-update=172, confirmed=true) ok
[...]
Mar 20 16:30:57 vice1 pengine: [1525]: ERROR: native_create_actions: Resource netpd (ocf::ManageVE) is active on 2 nodes attempting recovery
Mar 20 16:30:57 vice1 pengine: [1525]: WARN: native_create_actions: See http://clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information.
[...]
Mar 20 16:30:57 vice1 pengine: [1525]: notice: LogActions: Restart netpd#011(Started vice0)

Log of the target node (vice0)
------------------------------
Mar 20 16:30:57 vice0 lrmd: [1543]: info: rsc:netpd stop[23] (pid 2191)
[...]
Mar 20 16:30:57 vice0 ManageVE[2191]: INFO: VE 3025 already stopped.
[...]
Mar 20 16:30:57 vice0 lrmd: [1543]: info: operation stop[23] on netpd for client 1546: pid 2191 exited with return code 0
[...]
Mar 20 16:30:57 vice0 crmd: [1546]: info: process_lrm_event: LRM operation netpd_stop_0 (call=23, rc=0, cib-update=28, confirmed=true) ok
[...]
Mar 20 16:30:57 vice0 lrmd: [1543]: info: rsc:netpd start[27] (pid 2275)
[...]
Mar 20 16:30:57 vice0 kernel: CT: 3025: started
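
For anyone who wants to check their own logs for the same symptom, here is a minimal Python sketch that picks the pengine "is active on N nodes" errors out of syslog lines. It is not part of pacemaker or of the ManageVE RA; the function name `find_too_active` and the regular expression are illustrative assumptions based only on the line format shown in the log excerpts above, so other pacemaker versions may need a different pattern.

```python
import re

# Hypothetical pattern, modelled on the pengine error line quoted above:
# "<timestamp> <host> pengine: [...] Resource <name> (<agent>) is active on <N> nodes ..."
TOO_ACTIVE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+[\d:]{8})\s+(?P<host>\S+)\s+pengine:.*"
    r"Resource (?P<rsc>\S+) \((?P<agent>[^)]+)\) is active on (?P<count>\d+) nodes"
)


def find_too_active(lines):
    """Return (timestamp, host, resource, node_count) for each matching line."""
    hits = []
    for line in lines:
        match = TOO_ACTIVE.search(line)
        if match:
            hits.append(
                (
                    match.group("ts"),
                    match.group("host"),
                    match.group("rsc"),
                    int(match.group("count")),
                )
            )
    return hits


if __name__ == "__main__":
    import sys

    # Usage: python find_too_active.py < /var/log/syslog
    for ts, host, rsc, count in find_too_active(sys.stdin):
        print(f"{ts} {host}: {rsc} reported active on {count} nodes")
```

Running it over the logs of both nodes shows which node's policy engine raised the error and for which resource. In the excerpt above all events carry the same one-second timestamp, so it cannot by itself show whether the check ran before the source node had finished stopping the container.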
