(Sorry for the late answer; I was on vacation last week.)

On 6/1/26 15:44, Jiří Denemark wrote:
> On Tue, May 26, 2026 at 17:17:10 +0200, Denis V. Lunev wrote:
>> qemuDomainMakeCPUMigratable() strips features marked added='yes' (in
>> src/cpu_map/x86_*.xml) from the migration cookie when the source CPU
>> was specified as host-model. The intent was libvirt-protocol compat
>> with older destinations; the cost is guest CPU compat, paid silently
>> on every migration.
> Sigh, yeah there's still (at least) one thing missing around vmx
> migration...
>
>> Every Intel x86 CPU model from Westmere through Sapphire Rapids
>> carries 60+ added='yes' features, including
>> vmx-exit-load-perf-global-ctrl and vmx-entry-load-perf-global-ctrl
>> that control the LOAD_IA32_PERF_GLOBAL_CTRL allowed-1 bits of
>> MSR_IA32_VMX_{EXIT,ENTRY}_CTLS. A host-model live migration on any
>> of these models drops those features from the destination's qemu
>> argv.
> Yeah, unless those vmx features are explicitly specified in the domain
> XML, libvirt adds them according to what QEMU enabled based on the -cpu
> command line and then removes them during migration. The assumption is
> that if QEMU automatically added them on the source host, it will also
> add them on the destination host.

I agree with this assumption, but the problem is subtle.
Libvirt specifies these features on the QEMU command line
on the source and omits them on the target.

This is the bug, and QEMU is innocent here.

QEMU works exactly as intended: if the options are given,
it enables the features, and without them it does not
expose those features to the guest.
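
To make the mismatch concrete, here is a rough sketch of what I
mean. It is a simplified model, not libvirt code: the helper names
and the reduced feature list are made up for illustration, and the
real strip lives in qemuDomainMakeCPUMigratable().

```python
# Simplified illustration only; NOT libvirt code.
# ADDED_FEATURES stands in for features marked added='yes'
# in src/cpu_map/x86_*.xml (only two of them are listed here).
ADDED_FEATURES = {
    "vmx-exit-load-perf-global-ctrl",
    "vmx-entry-load-perf-global-ctrl",
}


def make_cpu_migratable(features):
    """Hypothetical stand-in for the strip: drop added='yes' features."""
    return [f for f in features if f not in ADDED_FEATURES]


def qemu_cpu_arg(model, features):
    """Build a -cpu style argument from a model and enabled features."""
    return ",".join([model] + [f"{f}=on" for f in features])


# What libvirt passes on the source host.
source_features = [
    "vmx",
    "vmx-exit-load-perf-global-ctrl",
    "vmx-entry-load-perf-global-ctrl",
]
source_arg = qemu_cpu_arg("Westmere", source_features)

# What the destination gets after the strip.
dest_arg = qemu_cpu_arg("Westmere", make_cpu_migratable(source_features))

print("source:", source_arg)
print("dest:  ", dest_arg)
```

The two -cpu strings differ, so QEMU on the destination is simply
never asked for the features the source was asked for.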

My point here is very simple. We have caught a user-visible
problem caused by this. The VMX features are noticeably
inconsistent, and we were lucky enough to trigger this not
in some complex nested scenario but with a very simple CPU
hotplug, which actually checks these features.


>> Modern qemu gates the nested VMX capability MSRs on the explicit -cpu
>> list, so the guest's MSR view shifts.
> What exactly "modern qemu" means? Do you have an exact version? Anyway,
> this sounds like a QEMU bug to me. If a combination of -cpu command line
> and a machine type enabled the features on the source, the same
> combination should enable them on the source as well. The machine type
> does not change during migration and treating vmx features shouldn't
> change either.
The problem was reported against QEMU 10.0. I have not
checked upstream specifically, but I expect no difference.
As written above, QEMU is dumb and should be dumb: the CPU
options specified on the command line should be the same on
the source and on the target.


>> Drop the strip. If a destination libvirt does not know a feature in
>> the cookie, its parser rejects the migration with a precise
>> unknown-feature error: operators can upgrade or narrow the source
>> CPU definition. Either is visible; the status quo is not.
> The problem is old libvirt was not tracking vmx features at all and thus
> any domain started on new libvirt would fail to migrate to an older
> libvirt. If a user explicitly required a vmx feature in domain XML, old
> libvirt would correctly refuse incoming migration because of an unknown
> CPU feature. But it's new libvirt suddenly recognizing features QEMU
> always enabled and adding them to domain XML. We always deal with such
> situation by dropping the automatically added XML elements for migration
> compatibility.
Are you suggesting that we should pass all options in the
migration cookie? That would be sane. But the problem cuts
both ways: features should neither disappear nor appear.
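
The invariant I have in mind can be sketched as a plain set
comparison. Again, this is only an illustration under my own
assumptions: cpu_feature_drift() is a hypothetical helper, not an
existing libvirt or QEMU function.

```python
# Hypothetical check; NOT an existing libvirt or QEMU API.
def cpu_feature_drift(source, dest):
    """Return the features that vanished or appeared across migration."""
    src, dst = set(source), set(dest)
    return {
        "vanished": sorted(src - dst),  # requested on source, lost on dest
        "appeared": sorted(dst - src),  # requested on dest only
    }


drift = cpu_feature_drift(
    ["vmx", "vmx-exit-load-perf-global-ctrl"],
    ["vmx"],
)
print(drift)
```

A migration would be safe in this sense only if both lists are
empty.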


>> This effectively reverts 14d3517410 ("qemu: domain: Drop added
>> features from migratable CPU") together with its follow-up
>> aae8a5774b ("qemu: Drop vmx-* from migratable CPU model only when
>> origCPU is set"), and removes the now-unused origCPU plumbing in
>> qemuDomainMakeCPUMigratable() and its callers.
> Unfortunately this is not the correct solution. We don't have a policy
> for backward migration compatibility with old libvirt so we should keep
> the compatibility as long as the domain XML does not contain anything
> unknown to the old libvirt.
>
> That said, we're missing a code that would transfer all the removed vmx
> flags in a migration cookie to make sure new libvirt can see the exact
> CPU definition and thus can explicitly request all the vmx features the
> source QEMU added.
>
> Jirka
>
Let us discuss what we can do. The problem is real and subtle.
This is definitely a libvirt-side problem.

Thank you in advance,
    Den
