Arr, this is a bad case :-/

First of all: yes, Sven, it is a regression. Having a workaround is somewhat
of a relief, but it does not make it any less of a regression.

Second, I have to beg your pardon for not seeing this earlier. Because the
bug tasks were closed, most of the review passes that we run over recent
and/or dormant bugs filtered it out. And now, sadly, it feels "too late"
to make it much better.

The underlying problem in this case is that the obvious resolution seems to
only make things worse.
Let me explain...

If we removed the new features from the package itself, then every guest
started with 6.0.0-0ubuntu8.5 or later would conflict with the new
update. That means we are weighing two sets of guests against each other:
those started in the first 11 months versus those started (or restarted)
in the last 14 months. Even purely on numbers, the latter group quite
likely wins.

Furthermore, unless I'm mistaken, all later releases (Groovy, Hirsute,
Impish, Jammy, ...) match upstream, which has these features added to
those types. By reverting this we would therefore also make Focal's types
incompatible with future releases and block migrations (and similar
operations) to them.

Between the introduction of this change to the type and now, there were
also two qemu security updates [1][2], which usually imply restarting or
migrating the guests. So most of the remaining guests that were started
before 03/2021 have quite likely been recycled since then.

I do not think that versioned types would help here (though I haven't
spent days experimenting with it), at least not for guests that were
already running before the update, and those are the only ones still
affected.

So what could we do, other than shrugging and feeling bad?
As a lesson learned, I think we could harden the regression tests a bit.
We could:
1. ensure that no CPU model is lost (only new entries, no removed ones) in
   $ virsh cpu-models $(uname -m)
2. ensure that /usr/share/libvirt/cpu_map/ only gets additions, no removals
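Checks 1 and 2 could look roughly like the following. This is only a sketch. It assumes we take one snapshot before and one after installing the candidate package. The function names and the snapshot layout are made up for illustration, and the only external tools used are `virsh cpu-models` and a plain directory listing of the path above:

```python
import os
import subprocess

# Path taken from check 2 above.
CPU_MAP_DIR = "/usr/share/libvirt/cpu_map"


def snapshot(arch=None):
    """Collect libvirt's CPU model names and the cpu_map file names on this host."""
    arch = arch or os.uname().machine  # same value as $(uname -m)
    out = subprocess.run(
        ["virsh", "cpu-models", arch],
        check=True, capture_output=True, text=True,
    ).stdout
    return {
        "cpu_models": sorted(line.strip() for line in out.splitlines() if line.strip()),
        "cpu_map_files": sorted(os.listdir(CPU_MAP_DIR)),
    }


def removed_entries(before, after):
    """Per category, list the entries present in `before` but missing from `after`."""
    return {
        key: sorted(set(before[key]) - set(after.get(key, [])))
        for key in before
    }


def check_no_removals(before, after):
    """Return regression messages; an empty list means we only got additions."""
    return [
        f"{key}: lost {name}"
        for key, names in removed_entries(before, after).items()
        for name in names
    ]
```

Store the result of `snapshot()` before the upgrade (for example with `json.dump`), take a second snapshot afterwards, and fail the run if `check_no_removals()` returns anything.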

Those will be helpful, but they would not have caught the problem here (we
only got Rome added, with no change to the EPYC-named types). The problem
is that there are now "known features" that were previously just noise in
the cpuid data and are now detected properly.

3. (Src) check whether tests/cputestdata or tests/domaincapsdata got
extended for existing types. Each such change is another candidate for a
regression of this kind. It should at least be detected, so that we can
decide consciously whether we want/need it or whether we reject the
request that caused it.
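Check 3 could be approximated by diffing the old and new versions of each test data file. This is a sketch under a few assumptions. It assumes the expected-output XML lists a CPU definition as an element with exactly one `<model>` child and zero or more `<feature name=...>` siblings, as in the `<cpu>` example in the test case below. The helper names are hypothetical:

```python
import xml.etree.ElementTree as ET


def cpu_features(xml_text):
    """Map each named CPU model in an XML document to the feature names listed with it."""
    result = {}
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        models = elem.findall("model")
        # Skip elements that are not a single-model CPU definition,
        # e.g. a domcapabilities list of usable models.
        if len(models) != 1 or not (models[0].text or "").strip():
            continue
        name = models[0].text.strip()
        feats = {f.get("name") for f in elem.findall("feature") if f.get("name")}
        result.setdefault(name, set()).update(feats)
    return result


def grown_models(old_xml, new_xml):
    """Return models present in both versions whose feature set gained entries."""
    old, new = cpu_features(old_xml), cpu_features(new_xml)
    return {
        name: sorted(new[name] - old[name])
        for name in old.keys() & new.keys()
        if new[name] - old[name]
    }
```

Brand-new models such as EPYC-Rome are deliberately not reported. Only existing types whose definition grew are flagged, because those are the candidates for this kind of regression.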

I've added these to our list of known "should be tested better in
regression checks" items, so that they can be implemented and catch such
cases earlier next time.

[1]: https://launchpad.net/ubuntu/+source/qemu/1:4.2-3ubuntu6.17
[2]: https://launchpad.net/ubuntu/+source/qemu/1:4.2-3ubuntu6.21
Ref: SD-670

https://bugs.launchpad.net/bugs/1887490

Title:
  [FFe/SRU] Add/Backport EPYC-v3 and EPYC-Rome CPU model

Status in libvirt package in Ubuntu:
  Fix Released
Status in linux package in Ubuntu:
  Fix Released
Status in qemu package in Ubuntu:
  Fix Released
Status in libvirt source package in Focal:
  Fix Released
Status in linux source package in Focal:
  Fix Released
Status in qemu source package in Focal:
  Fix Released

Bug description:
  [Impact]

   * CPU definitions are added to libvirt as the corresponding CPUs become
     known and are added to qemu for execution.
     As a result, some of them are, over time, missing in older
     releases.

   * To really benefit from the new features of these chips, they have
     to be known. Therefore new type additions done upstream should be
     backported if they apply generally and do not depend on
     SRU-critical changes.

   * This backports three upstream fixes that just add definitions
     (no control flow changes).

  [Test Case]

   * Check that /usr/share/libvirt/cpu_map/index.xml has an EPYC-Rome
     entry and that the file included there exists.

   * Define a guest like:
     <cpu mode='custom' match='exact' check='partial'>
       <model fallback='forbid'>EPYC-Rome</model>
     </cpu>
     You can only "really" start this on a system with the
     matching HW. But even on other systems the result will change from:
       error: internal error: Unknown CPU model EPYC-Rome
     to failing to start because some features are missing.

   * libvirt probes whether a named CPU model can be used on the system.
     After the fix, the list should include EPYC-Rome (the output below
     is from before the fix):
     $ virsh domcapabilities | grep EPYC
        <model usable='no'>EPYC-IBPB</model>
        <model usable='no'>EPYC</model>
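The first and last checks of this test case could be scripted roughly as follows. This sketch makes three assumptions. First, index.xml references model files through `<include filename=...>` elements. Second, the file for a model follows the `x86_<model>.xml` naming. Third, `virsh domcapabilities` lists models under `<cpu><mode name='custom'>`. The helper names are hypothetical:

```python
import os
import xml.etree.ElementTree as ET

CPU_MAP_DIR = "/usr/share/libvirt/cpu_map"


def included_files(index_xml):
    """Return the filename attribute of every <include> element in index.xml."""
    root = ET.fromstring(index_xml)
    return [inc.get("filename") for inc in root.iter("include") if inc.get("filename")]


def model_file_ok(index_xml, model, cpu_map_dir=CPU_MAP_DIR):
    """True if index.xml includes a file for `model` and that file exists."""
    wanted = f"x86_{model}.xml"  # assumed naming, e.g. x86_EPYC-Rome.xml
    return wanted in included_files(index_xml) and os.path.isfile(
        os.path.join(cpu_map_dir, wanted)
    )


def domcaps_models(domcaps_xml):
    """Map model name -> 'usable' attribute from `virsh domcapabilities` output."""
    root = ET.fromstring(domcaps_xml)
    return {
        m.text.strip(): m.get("usable")
        for m in root.iterfind("./cpu/mode[@name='custom']/model")
        if m.text
    }
```

On non-matching HW, `domcaps_models(...)` should still contain `EPYC-Rome` after the fix, just with `usable='no'`.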

  [Regression Potential]

   * Usually these type additions are safe unless they add control flow
     changes (e.g. to handle as yet unknown types of registers), and
     that isn't the case here.
     A regression, if any, is to be expected on systems that are close to
     the newly added type(s): after the update they will be detected as
     such if, e.g., host-model is used. When running a mixed cluster of
     updated and non-updated systems, migrations will then only work if
     the target is updated as well.

  [Other Info]

   * This is the first build since glibc 2.32 arrived in groovy, hence we
     need to be careful about the fix done for bug 1892826.
     It has to be checked whether the linking is fine after the rebuild.
     The workload still works in groovy despite 2.32 being present (I'd
     have expected it not to), so we will keep the revert as-is for now.
     To be sure, that adds two tests to be done:
     - check that the linking points to libtirpc instead of glibc
       $ eu-readelf -a /usr/lib/libvirt/libvirt_lxc | grep xdr_uint64 | grep GLOBAL
       It was pointing to glibc; does it still, and if so, does it work
       (see below)?
     - run the autopkgtest cases, as the LXC tests would trigger an issue
       if there is one
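The linking check could be automated with something like the sketch below. It assumes that eu-readelf prints the imported symbols as `name@VERSION` and that libtirpc's version tags start with `TIRPC_` while glibc's start with `GLIBC_`. Both the tag prefixes and the helper names are assumptions, not verified output:

```python
import re
import subprocess

LXC_BINARY = "/usr/lib/libvirt/libvirt_lxc"  # path from the test step above


def xdr_bindings(readelf_output, symbol="xdr_uint64"):
    """Return the version tags of GLOBAL symbols starting with `symbol`."""
    pattern = re.compile(re.escape(symbol) + r"\w*@+(\w[\w.]*)")
    tags = []
    for line in readelf_output.splitlines():
        if "GLOBAL" not in line:  # mirrors the `grep GLOBAL` above
            continue
        m = pattern.search(line)
        if m:
            tags.append(m.group(1))
    return tags


def links_to_libtirpc(readelf_output):
    """True if at least one binding was found and all carry a TIRPC_* tag."""
    tags = xdr_bindings(readelf_output)
    return bool(tags) and all(t.startswith("TIRPC_") for t in tags)


def readelf(path=LXC_BINARY):
    """Run eu-readelf -a on the binary and return its text output."""
    return subprocess.run(
        ["eu-readelf", "-a", path], check=True, capture_output=True, text=True
    ).stdout
```

A `False` result from `links_to_libtirpc(readelf())` does not prove breakage on its own, because it was already pointing to glibc and worked. It only flags that the autopkgtest LXC results need a closer look.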

  ----

  ## Qemu SRU ##

  [Impact]

   * CPU definitions are added to qemu as the corresponding CPUs become
     known. As a result, some are, over time, missing in older releases.

   * To really benefit from the new features of these chips, they have
     to be known. Therefore new type additions done upstream should be
     backported if they apply generally and do not depend on
     SRU-critical changes.

   * This backports two upstream fixes that just add definitions (no
     control flow changes).

  [Test Case]

   * Probe qemu for the known CPU types (works on all HW)
     $ qemu-system-x86_64 -cpu help | grep EPYC
     Focal without fix:
     x86 EPYC                  (alias configured by machine type)
     x86 EPYC-IBPB             (alias of EPYC-v2)
     x86 EPYC-v1               AMD EPYC Processor
     x86 EPYC-v2               AMD EPYC Processor (with IBPB)
     Focal with fix also adds:
     x86 EPYC-Rome             (alias configured by machine type)
     x86 EPYC-Rome-v1          AMD EPYC-Rome Processor
     x86 EPYC-v3               AMD EPYC Processor

   * Given such HW is available, start a KVM guest using those new types.
     Since we don't have libvirt support (yet), do so directly on the qemu
     command line, like this (reaching the bootloader is enough):
     $ qemu-system-x86_64 -cpu EPYC-Rome -machine pc-q35-focal,accel=kvm -nographic
     $ qemu-system-x86_64 -cpu EPYC-v3 -machine pc-q35-focal,accel=kvm -nographic
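The probe from the first step could be checked automatically by parsing the listing. This sketch assumes each model line starts with `x86 ` followed by the model name, as in the sample output above. The expected set is copied from "Focal with fix also adds", and the helper names are hypothetical:

```python
import subprocess

# From the "Focal with fix also adds" listing above.
EXPECTED_NEW = {"EPYC-Rome", "EPYC-Rome-v1", "EPYC-v3"}


def qemu_cpu_models(listing):
    """Extract model names from lines like 'x86 EPYC-v3   AMD EPYC Processor'."""
    names = set()
    for line in listing.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "x86":
            names.add(parts[1])
    return names


def missing_models(listing, expected=EXPECTED_NEW):
    """Return the expected model names that the listing does not contain."""
    return set(expected) - qemu_cpu_models(listing)


def probe():
    """Run the same probe as the test case, without the grep."""
    return subprocess.run(
        ["qemu-system-x86_64", "-cpu", "help"],
        check=True, capture_output=True, text=True,
    ).stdout
```

With the fix, `missing_models(probe())` should come back empty. Without it, it should list all three new names.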

  [Regression Potential]

   * This adds new CPU types to the list of known CPUs, defining their
     names and features. Generally the changes are contained to those new
     types and are only active when selected, and usually they are only
     selectable on such new machines. Therefore not much should change for
     other users.
     One thing, though: if a user selected an unversioned type (which in
     this case can only be "EPYC"), by default it will pick the latest
     version that applies. In this case the behavior will change and pick
     EPYC-v3 after the fix. But this is the whole purpose of versioned
     (stay as they are) and unversioned (move with updates) CPU types, so
     that should be ok.
     The EPYC-Rome type didn't exist in Focal before, so it can't "change"
     for users.
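The versioned/unversioned rule described above can be illustrated with a toy resolver. This is not QEMU's code. It ignores the per-machine-type alias configuration, and it ignores named aliases such as EPYC-IBPB. It only models the policy as stated here: a versioned name stays fixed, and an unversioned name moves to the highest known version:

```python
import re


def resolve(model, known):
    """Toy resolution of a CPU name to a concrete versioned type.

    `known` maps a base name to the versions available in one build,
    e.g. {"EPYC": [1, 2]} before the fix and {"EPYC": [1, 2, 3]} after.
    """
    m = re.fullmatch(r"(.+)-v(\d+)", model)
    if m:  # versioned name: stays as is
        base, version = m.group(1), int(m.group(2))
        if version not in known.get(base, []):
            raise ValueError(f"unknown CPU model {model}")
        return model
    if model not in known:  # e.g. EPYC-Rome before the fix
        raise ValueError(f"unknown CPU model {model}")
    return f"{model}-v{max(known[model])}"  # unversioned: moves with updates
```

Under this toy model, `resolve("EPYC", ...)` yields `EPYC-v2` before the fix and `EPYC-v3` after it. `resolve("EPYC-v2", ...)` is the same in both builds.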

  [Other Info]

   * Depends on the new kernel 5.4.0-49 or later (currently in
     focal-proposed).

  ---

  Qemu in focal already supports most flags of this model (all except
  amd-stibp).

  Please backport the following patches:

  https://github.com/qemu/qemu/commit/a16e8dbc043720abcb37fc7dca313e720b4e0f0c

  https://github.com/qemu/qemu/commit/143c30d4d346831a09e59e9af45afdca0331e819
