On Fri, Oct 07, 2016 at 08:38:03PM +0100, 'Viktor Bachraty' via ganeti-devel
> In case of Xen migrations, the most common failure case is when the
> instance fails to freeze so the migration fails with domains running on
> both target and source node. This patchs allows migrate --cleanup to
> recover by running AbortMigrate() in case the instance is running on
> both the source and target node.
I look forward to never having to clean up after this again. However there's
a problem with this patch: the leaky abstraction doesn't deal with differnces
between xm and xl. See below.
> + list of strings, node uuids
> + """
> + self.feedback_fn("* checking where the instance actually runs (if this"
> + " hangs, the hypervisor might be in a bad state)")
Whitespace. s/runs (/runs (/
> + # Verify each result and raise an exception if failed
> + for node_uuid, result in instance_list.items():
> + result.Raise("Can't contact node %s" % node_uuid)
I know this is copy & pasted from the original, but would it be more useful for
this error to report self.cfg.GetNodeName(node_uuid)?
> + # Xen renames the instance during migration, unfortunately we don't have
> + # a nicer way of identifying that it's the same instance. This is an
> + # leaking abstraction.
> + tname = 'migrating-' + name
This is partly true, but unfortunately this leaky abstraction is a little
broken. xm and xl handle this a bit differently:
xm behaviour: (in tools/python/xen/xend/XendCheckpoint.py save() & restore())
source: during: migrating-somedomain
target: during: somedomain
xl behaviour: (in tools/libxl/xl_cmdimpl.c migrate_domain() & migrate_receive())
source: copy: somedomain
target: copy: somedomain--incoming
NB this doesn't affect patch 3/7's 'rename migrating-' instances fix in
FinalizeMigrationSource, because there won't be any, and it'll be a no-op.
However, you should probably update the comment to say that it only applies to
xm, and I think there might be an analogous problem we might want to handle if
xl fails to clean up the --migratedaway instance on the source node.