On Fri, Mar 22, 2013 at 7:13 PM, Roman Haefeli <[email protected]> wrote:
> Hi,
>
> I encountered a problem when performing a live migration of some OpenVZ
> CTs. Although the migration didn't trigger any messages in 'crm_mon' and
> was initially performed without any troubles, the resource was restarted
> on the target node 'unnecessarily'. From the logs it looks as if after
> the actual migration pacemaker detected the resource to be running on
> both nodes. Why did it detect that?

Probably the PE (policy engine) couldn't match up the partially completed
migration. I would bet a pacemaker upgrade prevents this from happening again.

> Could it be that it checked too
> early on the source node? Might that be a problem with the RA ManageVE
> returning too early?
>
> (For details see below)
>
> Roman
>
>
> The setup:
> * Nodes are running Debian Squeeze with the current pve kernel
> * Our CTs are running on an NFS share mounted on both nodes
> * pacemaker 1.1.7
>
> Action:
> Migration of resource 'netpd' from vice1 to vice0
>
> Log of the source node (vice1)
> ------------------------------
> Mar 20 16:30:57 vice1 ManageVE[107511]: INFO: Setting up checkpoint...
> suspend... dump... kill... Container is unmounted Checkpointing completed
> succesfully
> Mar 20 16:30:57 vice1 lrmd: [1523]: info: operation migrate_to[66] on netpd
> for client 1526: pid 107511 exited with return code 0
> Mar 20 16:30:57 vice1 crmd: [1526]: info: process_lrm_event: LRM operation
> netpd_migrate_to_0 (call=66, rc=0, cib-update=172, confirmed=true) ok
> [...]
> Mar 20 16:30:57 vice1 pengine: [1525]: ERROR: native_create_actions: Resource
> netpd (ocf::ManageVE) is active on 2 nodes attempting recovery
> Mar 20 16:30:57 vice1 pengine: [1525]: WARN: native_create_actions: See
> http://clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information.
> [...]
> Mar 20 16:30:57 vice1 pengine: [1525]: notice: LogActions: Restart
> netpd#011(Started vice0)
>
>
> Log of the target node (vice0)
> -------------------------------
> Mar 20 16:30:57 vice0 lrmd: [1543]: info: rsc:netpd stop[23] (pid 2191)
> [...]
> Mar 20 16:30:57 vice0 ManageVE[2191]: INFO: VE 3025 already stopped.
> [...]
> Mar 20 16:30:57 vice0 lrmd: [1543]: info: operation stop[23] on netpd for
> client 1546: pid 2191 exited with return code 0
> [...]
> Mar 20 16:30:57 vice0 crmd: [1546]: info: process_lrm_event: LRM operation
> netpd_stop_0 (call=23, rc=0, cib-update=28, confirmed=true) ok
> [...]
> Mar 20 16:30:57 vice0 lrmd: [1543]: info: rsc:netpd start[27] (pid 2275)
> [...]
> Mar 20 16:30:57 vice0 kernel: CT: 3025: started

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
