Hi,

I ran into a problem during a live migration of some OpenVZ CTs.
Although the migration produced no messages in 'crm_mon' and initially
completed without trouble, the resource was then restarted
'unnecessarily' on the target node. From the logs it looks as if, right
after the actual migration, pacemaker detected the resource as running
on both nodes. Why did it detect that? Could it have checked the source
node too early? Could the problem be the ManageVE RA returning too
early?

(For details see below)

Roman




The setup:
* Nodes are running Debian Squeeze with the current pve kernel
* Our CTs are running on an NFS share mounted on both nodes
* pacemaker 1.1.7

Action:
Migration of resource 'netpd' from vice1 to vice0

Log of the source node (vice1)
------------------------------
Mar 20 16:30:57 vice1 ManageVE[107511]: INFO: Setting up checkpoint... 
suspend... dump... kill... Container is unmounted Checkpointing completed 
succesfully
Mar 20 16:30:57 vice1 lrmd: [1523]: info: operation migrate_to[66] on netpd for 
client 1526: pid 107511 exited with return code 0
Mar 20 16:30:57 vice1 crmd: [1526]: info: process_lrm_event: LRM operation 
netpd_migrate_to_0 (call=66, rc=0, cib-update=172, confirmed=true) ok
[...]
Mar 20 16:30:57 vice1 pengine: [1525]: ERROR: native_create_actions: Resource 
netpd (ocf::ManageVE) is active on 2 nodes attempting recovery
Mar 20 16:30:57 vice1 pengine: [1525]: WARN: native_create_actions: See 
http://clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information.
[...]
Mar 20 16:30:57 vice1 pengine: [1525]: notice: LogActions: Restart 
netpd#011(Started vice0)
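
To find the same event in a longer syslog, a small filter along these lines might help. This is only a rough sketch: the regular expression is an assumption derived from the single pengine line quoted above (other pacemaker versions may word the message differently), and the function name `find_too_active` is made up for illustration. It expects unwrapped syslog lines, not the mail-wrapped form shown here.

```python
import re

# Pattern assumed from the pengine ERROR line quoted above; it is not taken
# from the pacemaker sources and may not match other versions.
TOO_ACTIVE = re.compile(
    r"pengine: \[\d+\]: ERROR: native_create_actions: Resource\s+"
    r"(?P<rsc>\S+) \((?P<agent>[^)]+)\) is active on (?P<count>\d+) nodes"
)


def find_too_active(lines):
    """Return (resource, agent, node_count) for every matching log line."""
    hits = []
    for line in lines:
        m = TOO_ACTIVE.search(line)
        if m:
            hits.append((m.group("rsc"), m.group("agent"), int(m.group("count"))))
    return hits


# The two sample lines are the (unwrapped) pengine entries from the log above.
sample = [
    "Mar 20 16:30:57 vice1 pengine: [1525]: ERROR: native_create_actions: "
    "Resource netpd (ocf::ManageVE) is active on 2 nodes attempting recovery",
    "Mar 20 16:30:57 vice1 pengine: [1525]: notice: LogActions: Restart "
    "netpd#011(Started vice0)",
]

if __name__ == "__main__":
    for rsc, agent, count in find_too_active(sample):
        print(f"{rsc} ({agent}) reported active on {count} nodes")
```

Running it over the syslog of both nodes and comparing the timestamps of the hits with the migrate_to result might show whether the check happened before the source node had finished.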


Log of the target node (vice0)
-------------------------------
Mar 20 16:30:57 vice0 lrmd: [1543]: info: rsc:netpd stop[23] (pid 2191)
[...]
Mar 20 16:30:57 vice0 ManageVE[2191]: INFO: VE 3025 already stopped.
[...]
Mar 20 16:30:57 vice0 lrmd: [1543]: info: operation stop[23] on netpd for 
client 1546: pid 2191 exited with return code 0
[...]
Mar 20 16:30:57 vice0 crmd: [1546]: info: process_lrm_event: LRM operation 
netpd_stop_0 (call=23, rc=0, cib-update=28, confirmed=true) ok
[...]
Mar 20 16:30:57 vice0 lrmd: [1543]: info: rsc:netpd start[27] (pid 2275)
[...]
Mar 20 16:30:57 vice0 kernel: CT: 3025: started



