On Wed, 26 Aug 2020 11:55:49 +0200 Joerg Jaspert <jo...@debian.org> wrote:

> using Ganeti 2.16 and qemu 1:5.0-14~bpo10+1 I tried setting 
> migration_caps for the cluster. But no matter which value i use it 
> breaks migration.
> 
> Migration then "goes"
>  - Setup disks and prepare target node
>  - starting memore transfer
>  - "Migration failed, aborting"
>  - Closing disks
> 
> That is entirely independent on which value I put into the caps.
> 
> Unsetting migration_caps and retrying the migration - it fails again.

I have tested Ganeti-3.0 from master on Buster with Qemu-5.0 from
backports. The hypervisor (HV) code should be the same as in Ganeti-2.16.

The first migration works on freshly started instances (empty
migration_caps). I have also successfully tested the following migration
capabilities (two migrations each): auto-converge, zero-blocks, xbzrle.
The only one known to be broken is postcopy-ram[1].

So I assume Joerg is/was using postcopy-ram? If an instance was
previously migrated with postcopy-ram, the running qemu process
"remembers" this setting. So once migration_caps has been unset (emptied),
the capability must also be switched off in the running instances. On the
instance's node run:

echo "migrate_set_capability postcopy-ram off" | socat STDIO \
  UNIX-CONNECT:/var/run/ganeti/kvm-hypervisor/ctrl/XXXXX.monitor
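If several instances are affected, the one-liner above can be wrapped in a
small helper. This is only a sketch: the function names monitor_socket and
hmp are made up for illustration, and the instance argument is assumed to
be whatever stands in for XXXXX in the socket path. The HMP command
"info migrate_capabilities" should show what the running qemu process
currently has enabled, so it can be checked before and after.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of Ganeti.
# Control directory as used in the command above; override via CTRL_DIR.
CTRL_DIR="${CTRL_DIR:-/var/run/ganeti/kvm-hypervisor/ctrl}"

# Print the monitor socket path for one instance.
# Assumption: $1 is the part written as XXXXX above.
monitor_socket() {
    printf '%s/%s.monitor\n' "$CTRL_DIR" "$1"
}

# Send a single HMP command ($2) to the monitor of instance $1.
hmp() {
    printf '%s\n' "$2" | socat STDIO "UNIX-CONNECT:$(monitor_socket "$1")"
}

# Example use on the instance's node (INSTANCE is a placeholder):
#   hmp INSTANCE "info migrate_capabilities"
#   hmp INSTANCE "migrate_set_capability postcopy-ram off"
```

The functions only build the socket path and pipe one command through
socat, so nothing is sent until hmp is actually called.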

> Looking more in detail it appears that its setting the caps on the 
> source side. But possibly forgets to set them on the target side so qemu 
> hates doing migration?!

That is true for postcopy-ram. It worked before Qemu-2.11, but is now
broken (it should also be broken with qemu-3.1, the default Buster version).

> And when it breaks, it forgets to unset the capabilities, so next 
> migrations break too. One has to manually connect to the monitor and 
> unset them, before migration works again.

Sounds exactly like what I described.

[1] https://github.com/ganeti/ganeti/issues/950#issuecomment-506266808
