commit xen for openSUSE:Factory

Source-Sync Tue, 18 Mar 2025 09:47:32 -0700

Script 'mail_helper' called by obssrc
Hello community,

here is the log from the commit of package xen for openSUSE:Factory checked in at 2025-03-18 17:37:27
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Comparing /work/SRC/openSUSE:Factory/xen (Old)
 and      /work/SRC/openSUSE:Factory/.xen.new.19136 (New)
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


Package is "xen"

Tue Mar 18 17:37:27 2025 rev:362 rq:1253904 version:4.20.0_10

Changes:
--------
--- /work/SRC/openSUSE:Factory/xen/xen.changes  2025-03-06 14:48:04.371789375 +0100
+++ /work/SRC/openSUSE:Factory/.xen.new.19136/xen.changes       2025-03-18 17:37:38.832360142 +0100
@@ -1,0 +2,18 @@
+Thu Mar 13 12:50:00 CET 2025 - [email protected]
+
+- bsc#1219354 - xen channels and domU console
+  67c86fc1-xl-fix-channel-configuration-setting.patch
+- bsc#1227301 - Kernel boot crashes on Thinkpad P14s Gen 3 AMD
+  67c818d4-x86-log-unhandled-mem-accesses-for-PVH-dom0.patch
+  67c818d5-x86-fixup-p2m-page-faults-for-PVH-dom0.patch
+  67c818d6-x86-PVH-dom0-correct-iomem_caps-bound.patch
+  67c818d7-x86-IOMMU-account-for-IOMEM-caps-when-populating.patch
+  67c818d8-x86-Dom0-relax-Interrupt-Address-Range.patch
+- bsc#1237692 - When attempting to start guest vm's libxl fills disk with errors
+  67d2a3fe-libxl-avoid-infinite-loop-in-libxl__remove_directory.patch
+- Upstream bug fixes (bsc#1027519)
+  67cb03e0-x86-vlapic-ESR-write-handling.patch
+  67d17edd-x86-expose-MSR_FAM10H_MMIO_CONF_BASE-on-AMD.patch
+  67d17ede-VT-x-PI-usage-of-msi_desc-msg-field.patch
+
+-------------------------------------------------------------------

New:
----
  67c818d4-x86-log-unhandled-mem-accesses-for-PVH-dom0.patch
  67c818d5-x86-fixup-p2m-page-faults-for-PVH-dom0.patch
  67c818d6-x86-PVH-dom0-correct-iomem_caps-bound.patch
  67c818d7-x86-IOMMU-account-for-IOMEM-caps-when-populating.patch
  67c818d8-x86-Dom0-relax-Interrupt-Address-Range.patch
  67c86fc1-xl-fix-channel-configuration-setting.patch
  67cb03e0-x86-vlapic-ESR-write-handling.patch
  67d17edd-x86-expose-MSR_FAM10H_MMIO_CONF_BASE-on-AMD.patch
  67d17ede-VT-x-PI-usage-of-msi_desc-msg-field.patch
  67d2a3fe-libxl-avoid-infinite-loop-in-libxl__remove_directory.patch

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Other differences:
------------------
++++++ xen.spec ++++++
--- /var/tmp/diff_new_pack.8ekEU9/_old  2025-03-18 17:37:40.772441464 +0100
+++ /var/tmp/diff_new_pack.8ekEU9/_new  2025-03-18 17:37:40.776441632 +0100
@@ -125,7 +125,7 @@
 BuildRequires:  python-rpm-macros
 Provides:       installhint(reboot-needed)
 
-Version:        4.20.0_08
+Version:        4.20.0_10
 Release:        0
 Summary:        Xen Virtualization: Hypervisor (aka VMM aka Microkernel)
 License:        GPL-2.0-only
@@ -160,6 +160,16 @@
 # For xen-libs
 Source99:       baselibs.conf
 # Upstream patches
+Patch1:         67c818d4-x86-log-unhandled-mem-accesses-for-PVH-dom0.patch
+Patch2:         67c818d5-x86-fixup-p2m-page-faults-for-PVH-dom0.patch
+Patch3:         67c818d6-x86-PVH-dom0-correct-iomem_caps-bound.patch
+Patch4:         67c818d7-x86-IOMMU-account-for-IOMEM-caps-when-populating.patch
+Patch5:         67c818d8-x86-Dom0-relax-Interrupt-Address-Range.patch
+Patch6:         67c86fc1-xl-fix-channel-configuration-setting.patch
+Patch7:         67cb03e0-x86-vlapic-ESR-write-handling.patch
+Patch8:         67d17edd-x86-expose-MSR_FAM10H_MMIO_CONF_BASE-on-AMD.patch
+Patch9:         67d17ede-VT-x-PI-usage-of-msi_desc-msg-field.patch
+Patch10:        67d2a3fe-libxl-avoid-infinite-loop-in-libxl__remove_directory.patch
 # EMBARGOED security fixes
 # libxc
 Patch301:       libxc-bitmap-long.patch

++++++ 67c818d4-x86-log-unhandled-mem-accesses-for-PVH-dom0.patch ++++++
# Commit 43d8a80a0cccfe3715bb3178b5c15fb983979651
# Date 2025-03-05 10:26:46 +0100
# Author Roger Pau Monne <[email protected]>
# Committer Roger Pau Monne <[email protected]>
x86/emul: dump unhandled memory accesses for PVH dom0

A PV dom0 can map any host memory as long as it's allowed by the IO
capability range in d->iomem_caps.  On the other hand, a PVH dom0 has no
way to populate MMIO regions onto its p2m, so it's limited to what Xen
initially populates on the p2m based on the host memory map and the enabled
device BARs.

Introduce a new debug build only printk that reports attempts by dom0 to
access addresses not populated on the p2m, and not handled by any emulator.
This is for information purposes only, but might allow getting an idea of
what MMIO ranges might be missing on the p2m.

Signed-off-by: Roger Pau Monné <[email protected]>
Acked-by: Jan Beulich <[email protected]>

--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -337,6 +337,9 @@ static int hvmemul_do_io(
         /* If there is no suitable backing DM, just ignore accesses */
         if ( !s )
         {
+            if ( is_mmio && is_hardware_domain(currd) )
+                gdprintk(XENLOG_DEBUG, "unhandled memory %s %#lx size %u\n",
+                         dir ? "read from" : "write to", addr, size);
             rc = hvm_process_io_intercept(&null_handler, &p);
             vio->req.state = STATE_IOREQ_NONE;
         }

++++++ 67c818d5-x86-fixup-p2m-page-faults-for-PVH-dom0.patch ++++++

References: bsc#1227301

# Commit 104591f5dd675d7bfb04885dace0e4e5a097fc1e
# Date 2025-03-05 10:26:46 +0100
# Author Roger Pau Monne <[email protected]>
# Committer Roger Pau Monne <[email protected]>
x86/dom0: attempt to fixup p2m page-faults for PVH dom0

When building a PVH dom0 Xen attempts to map all (relevant) MMIO regions
into the p2m for dom0 access.  However the information Xen has about the
host memory map is limited.  Xen doesn't have access to any resources
described in ACPI dynamic tables, and hence the p2m mappings provided might
not be complete.

PV doesn't suffer from this issue because a PV dom0 is capable of mapping
into its page-tables any address not explicitly banned in d->iomem_caps.

Introduce a new command line option that allows Xen to attempt to fixup
the p2m page-faults, by creating p2m identity maps in response to p2m
page-faults.

This is intended as a workaround for small ACPI regions Xen doesn't know about.
Note that missing large MMIO regions mapped in this way will lead to
slowness due to the VM exit processing, plus the mappings will always use
small pages.

The ultimate aim is to attempt to bring better parity with a classic PV
dom0.

Note such fixups rely on the CPU doing the access to the unpopulated
address.  If the access is attempted from a device instead there's no
possible way to fixup, as IOMMU page-faults are asynchronous.

Signed-off-by: Roger Pau Monné <[email protected]>
Reviewed-by: Jan Beulich <[email protected]>
Acked-by: Oleksii Kurochko <[email protected]>

--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -4,6 +4,12 @@ Notable changes to Xen will be documente
 
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
 
+## [4.20.1](https://xenbits.xenproject.org/gitweb/?p=xen.git;a=shortlog;h=RELEASE-4.20.1)
+
+### Added
+ - On x86:
+   - Option to attempt to fixup p2m page-faults on PVH dom0.
+
 ## [4.20.0](https://xenbits.xenproject.org/gitweb/?p=xen.git;a=shortlog;h=RELEASE-4.20.0) - 2025-03-05
 
 ### Changed
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -822,7 +822,8 @@ Specify the bit width of the DMA heap.
 
 ### dom0
     = List of [ pv | pvh, shadow=<bool>, verbose=<bool>,
-                cpuid-faulting=<bool>, msr-relaxed=<bool> ] (x86)
+                cpuid-faulting=<bool>, msr-relaxed=<bool>,
+                pf-fixup=<bool> ] (x86)
 
     = List of [ sve=<integer> ] (Arm64)
 
@@ -883,6 +884,19 @@ Controls for how dom0 is constructed on
 
     If using this option is necessary to fix an issue, please report a bug.
 
+*   The `pf-fixup` boolean is only applicable when using a PVH dom0 and
+    defaults to false.
+
+    When running dom0 in PVH mode the dom0 kernel has no way to map MMIO
+    regions into its physical memory map, such mode relies on Xen dom0 builder
+    populating the physical memory map with all MMIO regions that dom0 should
+    access.  However Xen doesn't have a complete picture of the host memory
+    map, due to not being able to process ACPI dynamic tables.
+
+    The `pf-fixup` option allows Xen to attempt to add missing MMIO regions
+    to the dom0 physical memory map in response to page-faults generated by
+    dom0 trying to access unpopulated entries in the memory map.
+
 Enables features on dom0 on Arm systems.
 
 *   The `sve` integer parameter enables Arm SVE usage for Dom0 and sets the
--- a/xen/arch/x86/dom0_build.c
+++ b/xen/arch/x86/dom0_build.c
@@ -16,6 +16,7 @@
 #include <asm/dom0_build.h>
 #include <asm/guest.h>
 #include <asm/hpet.h>
+#include <asm/hvm/emulate.h>
 #include <asm/io-ports.h>
 #include <asm/io_apic.h>
 #include <asm/p2m.h>
@@ -286,6 +287,10 @@ int __init parse_arch_dom0_param(const c
         opt_dom0_cpuid_faulting = val;
     else if ( (val = parse_boolean("msr-relaxed", s, e)) >= 0 )
         opt_dom0_msr_relaxed = val;
+#ifdef CONFIG_HVM
+    else if ( (val = parse_boolean("pf-fixup", s, e)) >= 0 )
+        opt_dom0_pf_fixup = val;
+#endif
     else
         return -EINVAL;
 
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -10,12 +10,15 @@
  */
 
 #include <xen/init.h>
+#include <xen/iocap.h>
 #include <xen/ioreq.h>
 #include <xen/lib.h>
 #include <xen/sched.h>
 #include <xen/paging.h>
 #include <xen/trace.h>
 #include <xen/vm_event.h>
+
+#include <asm/altp2m.h>
 #include <asm/event.h>
 #include <asm/i387.h>
 #include <asm/xstate.h>
@@ -161,6 +164,36 @@ void hvmemul_cancel(struct vcpu *v)
     hvmemul_cache_disable(v);
 }
 
+bool __ro_after_init opt_dom0_pf_fixup;
+static int hwdom_fixup_p2m(paddr_t addr)
+{
+    unsigned long gfn = paddr_to_pfn(addr);
+    struct domain *currd = current->domain;
+    p2m_type_t type;
+    mfn_t mfn;
+    int rc;
+
+    ASSERT(is_hardware_domain(currd));
+    ASSERT(!altp2m_active(currd));
+
+    /*
+     * Fixups are only applied for MMIO holes, and rely on the hardware domain
+     * having identity mappings for non RAM regions (gfn == mfn).
+     */
+    if ( !iomem_access_permitted(currd, gfn, gfn) ||
+         !is_memory_hole(_mfn(gfn), _mfn(gfn)) )
+        return -EPERM;
+
+    mfn = get_gfn(currd, gfn, &type);
+    if ( !mfn_eq(mfn, INVALID_MFN) || !p2m_is_hole(type) )
+        rc = mfn_eq(mfn, _mfn(gfn)) ? -EEXIST : -ENOTEMPTY;
+    else
+        rc = set_mmio_p2m_entry(currd, _gfn(gfn), _mfn(gfn), 0);
+    put_gfn(currd, gfn);
+
+    return rc;
+}
+
 static int hvmemul_do_io(
     bool is_mmio, paddr_t addr, unsigned long *reps, unsigned int size,
     uint8_t dir, bool df, bool data_is_addr, uintptr_t data)
@@ -338,8 +371,45 @@ static int hvmemul_do_io(
         if ( !s )
         {
             if ( is_mmio && is_hardware_domain(currd) )
-                gdprintk(XENLOG_DEBUG, "unhandled memory %s %#lx size %u\n",
-                         dir ? "read from" : "write to", addr, size);
+            {
+                /*
+                 * PVH dom0 is likely missing MMIO mappings on the p2m, due to
+                 * the incomplete information Xen has about the memory layout.
+                 *
+                 * Either print a message to note dom0 attempted to access an
+                 * unpopulated GPA, or try to fixup the p2m by creating an
+                 * identity mapping for the faulting GPA.
+                 */
+                if ( opt_dom0_pf_fixup )
+                {
+                    int inner_rc = hwdom_fixup_p2m(addr);
+
+                    if ( !inner_rc || inner_rc == -EEXIST )
+                    {
+                        if ( !inner_rc )
+                            gdprintk(XENLOG_DEBUG,
+                                     "fixup p2m mapping for page %lx added\n",
+                                     paddr_to_pfn(addr));
+                        else
+                            gprintk(XENLOG_INFO,
+                                    "fixup p2m mapping for page %lx already present\n",
+                                    paddr_to_pfn(addr));
+
+                        rc = X86EMUL_RETRY;
+                        vio->req.state = STATE_IOREQ_NONE;
+                        break;
+                    }
+
+                    gprintk(XENLOG_WARNING,
+                            "unable to fixup memory %s %#lx size %u: %d\n",
+                            dir ? "read from" : "write to", addr, size,
+                            inner_rc);
+                }
+                else
+                    gdprintk(XENLOG_DEBUG,
+                             "unhandled memory %s %#lx size %u\n",
+                             dir ? "read from" : "write to", addr, size);
+            }
             rc = hvm_process_io_intercept(&null_handler, &p);
             vio->req.state = STATE_IOREQ_NONE;
         }
--- a/xen/arch/x86/include/asm/hvm/emulate.h
+++ b/xen/arch/x86/include/asm/hvm/emulate.h
@@ -148,6 +148,9 @@ static inline void hvmemul_write_cache(c
 void hvm_dump_emulation_state(const char *loglvl, const char *prefix,
                               struct hvm_emulate_ctxt *hvmemul_ctxt, int rc);
 
+/* For PVH dom0: signal whether to attempt fixup of p2m page-faults. */
+extern bool opt_dom0_pf_fixup;
+
 #endif /* __ASM_X86_HVM_EMULATE_H__ */
 
 /*

++++++ 67c818d6-x86-PVH-dom0-correct-iomem_caps-bound.patch ++++++
# Commit a00e08799cc7657d2a1aca158f4ad43d4c9103e7
# Date 2025-03-05 10:26:46 +0100
# Author Roger Pau Monne <[email protected]>
# Committer Roger Pau Monne <[email protected]>
x86/dom0: correctly set the maximum ->iomem_caps bound for PVH

The logic in dom0_setup_permissions() sets the maximum bound in
->iomem_caps unconditionally using paddr_bits, which is not correct for HVM
based domains.  Instead use domain_max_paddr_bits() to get the correct
maximum paddr bits for each possible domain type.

Switch to using PFN_DOWN() instead of PAGE_SHIFT, as that's shorter.
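
The equivalence of the two spellings, and the effect of a smaller bit count
on the bound, can be checked with a minimal standalone sketch.  This is not
Xen code: PAGE_SHIFT and PFN_DOWN() are redefined locally, the helper names
are hypothetical, and the bit count is a plain parameter standing in for
whatever paddr_bits or domain_max_paddr_bits() would report.  uint64_t is
used instead of unsigned long so the sketch behaves the same on any host.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12                       /* 4 KiB pages, as on x86 */
#define PFN_DOWN(x) ((x) >> PAGE_SHIFT)     /* byte address -> frame number */

/* Old spelling: open-coded shift by (bits - PAGE_SHIFT). */
static uint64_t last_pfn_shift(unsigned int bits)
{
    assert(bits >= PAGE_SHIFT && bits < 64);
    return (UINT64_C(1) << (bits - PAGE_SHIFT)) - 1;
}

/* New spelling: same value, expressed with PFN_DOWN(). */
static uint64_t last_pfn_pfn_down(unsigned int bits)
{
    assert(bits >= PAGE_SHIFT && bits < 64);
    return PFN_DOWN(UINT64_C(1) << bits) - 1;
}
```

For a given bit count both helpers return the same last permitted frame, so
the functional change in the patch is only which bit count is fed in.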

Fixes: 53de839fb409 ('x86: constrain MFN range Dom0 may access')
Signed-off-by: Roger Pau Monné <[email protected]>
Reviewed-by: Jan Beulich <[email protected]>

--- a/xen/arch/x86/dom0_build.c
+++ b/xen/arch/x86/dom0_build.c
@@ -481,7 +481,8 @@ int __init dom0_setup_permissions(struct
 
     /* The hardware domain is initially permitted full I/O capabilities. */
     rc = ioports_permit_access(d, 0, 0xFFFF);
-    rc |= iomem_permit_access(d, 0UL, (1UL << (paddr_bits - PAGE_SHIFT)) - 1);
+    rc |= iomem_permit_access(d, 0UL,
+                              PFN_DOWN(1UL << domain_max_paddr_bits(d)) - 1);
     rc |= irqs_permit_access(d, 1, nr_irqs_gsi - 1);
 
     /* Modify I/O port access permissions. */

++++++ 67c818d7-x86-IOMMU-account-for-IOMEM-caps-when-populating.patch ++++++
# Commit 62f3fc5296c452285e81adb50976bde2d68d3181
# Date 2025-03-05 10:26:46 +0100
# Author Roger Pau Monne <[email protected]>
# Committer Roger Pau Monne <[email protected]>
x86/iommu: account for IOMEM caps when populating dom0 IOMMU page-tables

The current code in arch_iommu_hwdom_init() kind of open-codes the same
MMIO permission ranges that are added to the hardware domain ->iomem_caps.
Avoid this duplication and use ->iomem_caps in arch_iommu_hwdom_init() to
filter which memory regions should be added to the dom0 IOMMU page-tables.

Note the IO-APIC and MCFG page(s) must be set as not accessible for a PVH
dom0, otherwise the internal Xen emulation for those ranges won't work.
This requires adjustments in dom0_setup_permissions().

The call to pvh_setup_mmcfg() in dom0_construct_pvh() must now strictly be
done ahead of setting up dom0 permissions, so take the opportunity to also
put it inside the existing is_hardware_domain() region.

Also the special casing of E820_UNUSABLE regions no longer needs to be done
in arch_iommu_hwdom_init(), as those regions are already blocked in
->iomem_caps and thus would be removed from the rangeset as part of
->iomem_caps processing in arch_iommu_hwdom_init().  The E820_UNUSABLE
regions below 1Mb are not removed from ->iomem_caps, that's a slight
difference for the IOMMU created page-tables, but the aim is to allow
access to the same memory either from the CPU or the IOMMU page-tables.

Since ->iomem_caps already takes into account the domain max paddr, there's
no need to remove any regions past the last address addressable by the
domain, as applying ->iomem_caps would have already taken care of that.
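
The filtering step amounts to removing from the map everything that lies in
the gaps between permitted ranges, plus the tail after the last one.  A toy
sketch of that idea follows, under loud assumptions: the map is a flat array
of flags over a 16-frame address space, the permitted ranges are passed as a
sorted, non-overlapping array, and all names (map_filter_by_caps and friends)
are hypothetical rather than Xen's rangeset API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_PFNS 16UL   /* toy address space: frames 0..15 */

struct range { unsigned long s, e; };   /* inclusive bounds */

/* Clear frames [s, e] from a toy map held as one flag per frame. */
static void map_remove_range(bool *map, unsigned long s, unsigned long e)
{
    assert(s <= e && e < NR_PFNS);
    for ( unsigned long i = s; i <= e; i++ )
        map[i] = false;
}

/*
 * Keep only frames that fall inside one of the permitted ranges.
 * 'caps' must be sorted by start and must not overlap.
 */
static void map_filter_by_caps(bool *map, const struct range *caps, size_t nr)
{
    unsigned long last = 0;   /* first frame not yet covered by a range */

    for ( size_t i = 0; i < nr; i++ )
    {
        /* Drop the gap between the previous range and this one. */
        if ( last != caps[i].s )
            map_remove_range(map, last, caps[i].s - 1);
        last = caps[i].e + 1;
    }

    /* Drop whatever follows the final permitted range. */
    if ( last < NR_PFNS )
        map_remove_range(map, last, NR_PFNS - 1);
}
```

Tracking only the end of the previous range keeps the walk to a single pass
over the permitted ranges, at the cost of requiring them in ascending order.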

Suggested-by: Jan Beulich <[email protected]>
Signed-off-by: Roger Pau Monné <[email protected]>
Reviewed-by: Jan Beulich <[email protected]>

--- a/xen/arch/x86/dom0_build.c
+++ b/xen/arch/x86/dom0_build.c
@@ -558,7 +558,9 @@ int __init dom0_setup_permissions(struct
     for ( i = 0; i < nr_ioapics; i++ )
     {
         mfn = paddr_to_pfn(mp_ioapics[i].mpc_apicaddr);
-        if ( !rangeset_contains_singleton(mmio_ro_ranges, mfn) )
+        /* If emulating IO-APIC(s) make sure the base address is unmapped. */
+        if ( has_vioapic(d) ||
+             !rangeset_contains_singleton(mmio_ro_ranges, mfn) )
             rc |= iomem_deny_access(d, mfn, mfn);
     }
     /* MSI range. */
@@ -599,6 +601,13 @@ int __init dom0_setup_permissions(struct
             rc |= rangeset_add_singleton(mmio_ro_ranges, mfn);
     }
 
+    if ( has_vpci(d) )
+        /*
+         * TODO: runtime added MMCFG regions are not checked to make sure they
+         * don't overlap with already mapped regions, thus preventing trapping.
+         */
+        rc |= vpci_mmcfg_deny_access(d);
+
     return rc;
 }
 
--- a/xen/arch/x86/hvm/dom0_build.c
+++ b/xen/arch/x86/hvm/dom0_build.c
@@ -1324,6 +1324,13 @@ int __init dom0_construct_pvh(struct boo
     if ( is_hardware_domain(d) )
     {
         /*
+         * MMCFG initialization must be performed before setting domain
+         * permissions, as the MCFG areas must not be part of the domain IOMEM
+         * accessible regions.
+         */
+        pvh_setup_mmcfg(d);
+
+        /*
          * Setup permissions early so that calls to add MMIO regions to the
          * p2m as part of vPCI setup don't fail due to permission checks.
          */
@@ -1336,13 +1343,6 @@ int __init dom0_construct_pvh(struct boo
     }
 
     /*
-     * NB: MMCFG initialization needs to be performed before iommu
-     * initialization so the iommu code can fetch the MMCFG regions used by the
-     * domain.
-     */
-    pvh_setup_mmcfg(d);
-
-    /*
      * Craft dom0 physical memory map and set the paging allocation. This must
      * be done before the iommu initializion, since iommu initialization code
      * will likely add mappings required by devices to the p2m (ie: RMRRs).
--- a/xen/arch/x86/hvm/io.c
+++ b/xen/arch/x86/hvm/io.c
@@ -363,14 +363,14 @@ static const struct hvm_mmcfg *vpci_mmcf
     return NULL;
 }
 
-int __hwdom_init vpci_subtract_mmcfg(const struct domain *d, struct rangeset *r)
+int __hwdom_init vpci_mmcfg_deny_access(struct domain *d)
 {
     const struct hvm_mmcfg *mmcfg;
 
     list_for_each_entry ( mmcfg, &d->arch.hvm.mmcfg_regions, next )
     {
-        int rc = rangeset_remove_range(r, PFN_DOWN(mmcfg->addr),
-                                       PFN_DOWN(mmcfg->addr + mmcfg->size - 1));
+        int rc = iomem_deny_access(d, PFN_DOWN(mmcfg->addr),
+                                   PFN_DOWN(mmcfg->addr + mmcfg->size - 1));
 
         if ( rc )
             return rc;
--- a/xen/arch/x86/include/asm/hvm/io.h
+++ b/xen/arch/x86/include/asm/hvm/io.h
@@ -132,8 +132,8 @@ int register_vpci_mmcfg_handler(struct d
 /* Destroy tracked MMCFG areas. */
 void destroy_vpci_mmcfg(struct domain *d);
 
-/* Remove MMCFG regions from a given rangeset. */
-int vpci_subtract_mmcfg(const struct domain *d, struct rangeset *r);
+/* Remove MMCFG regions from a domain ->iomem_caps. */
+int vpci_mmcfg_deny_access(struct domain *d);
 
 #endif /* __ASM_X86_HVM_IO_H__ */
 
--- a/xen/drivers/passthrough/x86/iommu.c
+++ b/xen/drivers/passthrough/x86/iommu.c
@@ -320,6 +320,26 @@ static int __hwdom_init cf_check map_sub
     return rangeset_remove_range(map, s, e);
 }
 
+struct handle_iomemcap {
+    struct rangeset *r;
+    unsigned long last;
+};
+static int __hwdom_init cf_check map_subtract_iomemcap(unsigned long s,
+                                                       unsigned long e,
+                                                       void *data)
+{
+    struct handle_iomemcap *h = data;
+    int rc = 0;
+
+    if ( h->last != s )
+        rc = rangeset_remove_range(h->r, h->last, s - 1);
+
+    ASSERT(e < ~0UL);
+    h->last = e + 1;
+
+    return rc;
+}
+
 struct map_data {
     struct domain *d;
     unsigned int flush_flags;
@@ -400,6 +420,7 @@ void __hwdom_init arch_iommu_hwdom_init(
     unsigned int i;
     struct rangeset *map;
     struct map_data map_data = { .d = d };
+    struct handle_iomemcap iomem = {};
     int rc;
 
     BUG_ON(!is_hardware_domain(d));
@@ -442,14 +463,6 @@ void __hwdom_init arch_iommu_hwdom_init(
 
         switch ( entry.type )
         {
-        case E820_UNUSABLE:
-            /* Only relevant for inclusive mode, otherwise this is a no-op. */
-            rc = rangeset_remove_range(map, PFN_DOWN(entry.addr),
-                                       PFN_DOWN(entry.addr + entry.size - 1));
-            if ( rc )
-                panic("IOMMU failed to remove unusable memory: %d\n", rc);
-            continue;
-
         case E820_RESERVED:
             if ( !iommu_hwdom_inclusive && !iommu_hwdom_reserved )
                 continue;
@@ -475,22 +488,13 @@ void __hwdom_init arch_iommu_hwdom_init(
     if ( rc )
         panic("IOMMU failed to remove Xen ranges: %d\n", rc);
 
-    /* Remove any overlap with the Interrupt Address Range. */
-    rc = rangeset_remove_range(map, 0xfee00, 0xfeeff);
+    iomem.r = map;
+    rc = rangeset_report_ranges(d->iomem_caps, 0, ~0UL, map_subtract_iomemcap,
+                                &iomem);
+    if ( !rc && iomem.last < ~0UL )
+        rc = rangeset_remove_range(map, iomem.last, ~0UL);
     if ( rc )
-        panic("IOMMU failed to remove Interrupt Address Range: %d\n", rc);
-
-    /* If emulating IO-APIC(s) make sure the base address is unmapped. */
-    if ( has_vioapic(d) )
-    {
-        for ( i = 0; i < d->arch.hvm.nr_vioapics; i++ )
-        {
-            rc = rangeset_remove_singleton(map,
-                PFN_DOWN(domain_vioapic(d, i)->base_address));
-            if ( rc )
-                panic("IOMMU failed to remove IO-APIC: %d\n", rc);
-        }
-    }
+        panic("IOMMU failed to remove forbidden regions: %d\n", rc);
 
     if ( is_pv_domain(d) )
     {
@@ -506,23 +510,6 @@ void __hwdom_init arch_iommu_hwdom_init(
             panic("IOMMU failed to remove read-only regions: %d\n", rc);
     }
 
-    if ( has_vpci(d) )
-    {
-        /*
-         * TODO: runtime added MMCFG regions are not checked to make sure they
-         * don't overlap with already mapped regions, thus preventing trapping.
-         */
-        rc = vpci_subtract_mmcfg(d, map);
-        if ( rc )
-            panic("IOMMU unable to remove MMCFG areas: %d\n", rc);
-    }
-
-    /* Remove any regions past the last address addressable by the domain. */
-    rc = rangeset_remove_range(map, PFN_DOWN(1UL << domain_max_paddr_bits(d)),
-                               ~0UL);
-    if ( rc )
-        panic("IOMMU unable to remove unaddressable ranges: %d\n", rc);
-
     if ( iommu_verbose )
         printk(XENLOG_INFO "%pd: identity mappings for IOMMU:\n", d);
 

++++++ 67c818d8-x86-Dom0-relax-Interrupt-Address-Range.patch ++++++

References: bsc#1227301

# Commit 381caa38850771ae218eb6f6d490dc02e40df964
# Date 2025-03-05 10:26:46 +0100
# Author Roger Pau Monne <[email protected]>
# Committer Roger Pau Monne <[email protected]>
x86/dom0: be less restrictive with the Interrupt Address Range

Xen currently prevents dom0 from creating CPU or IOMMU page-table mappings
into the interrupt address range [0xfee00000, 0xfeefffff].  This range has
two different purposes.  For accesses from the CPU it contains the default
position of the local APIC page at 0xfee00000.  For accesses from devices
it's the MSI address range, so the address field in the MSI entries
(usually) points to an address on that range to trigger an interrupt.

There are reports of Lenovo Thinkpad devices placing what seems to be the
UCSI shared mailbox at address 0xfeec2000 in the interrupt address range.
Attempting to use that device with a Linux PV dom0 leads to an error when
the Linux kernel maps 0xfeec2000:

RIP: e030:xen_mc_flush+0x1e8/0x2b0
 xen_leave_lazy_mmu+0x15/0x60
 vmap_range_noflush+0x408/0x6f0
 __ioremap_caller+0x20d/0x350
 acpi_os_map_iomem+0x1a3/0x1c0
 acpi_ex_system_memory_space_handler+0x229/0x3f0
 acpi_ev_address_space_dispatch+0x17e/0x4c0
 acpi_ex_access_region+0x28a/0x510
 acpi_ex_field_datum_io+0x95/0x5c0
 acpi_ex_extract_from_field+0x36b/0x4e0
 acpi_ex_read_data_from_field+0xcb/0x430
 acpi_ex_resolve_node_to_value+0x2e0/0x530
 acpi_ex_resolve_to_value+0x1e7/0x550
 acpi_ds_evaluate_name_path+0x107/0x170
 acpi_ds_exec_end_op+0x392/0x860
 acpi_ps_parse_loop+0x268/0xa30
 acpi_ps_parse_aml+0x221/0x5e0
 acpi_ps_execute_method+0x171/0x3e0
 acpi_ns_evaluate+0x174/0x5d0
 acpi_evaluate_object+0x167/0x440
 acpi_evaluate_dsm+0xb6/0x130
 ucsi_acpi_dsm+0x53/0x80
 ucsi_acpi_read+0x2e/0x60
 ucsi_register+0x24/0xa0
 ucsi_acpi_probe+0x162/0x1e3
 platform_probe+0x48/0x90
 really_probe+0xde/0x340
 __driver_probe_device+0x78/0x110
 driver_probe_device+0x1f/0x90
 __driver_attach+0xd2/0x1c0
 bus_for_each_dev+0x77/0xc0
 bus_add_driver+0x112/0x1f0
 driver_register+0x72/0xd0
 do_one_initcall+0x48/0x300
 do_init_module+0x60/0x220
 __do_sys_init_module+0x17f/0x1b0
 do_syscall_64+0x82/0x170

Remove the restrictions to create mappings in the interrupt address range
for dom0.  Note that the restriction to map the local APIC page is enforced
separately, and that continues to be present.  Additionally make sure the
emulated local APIC page is also not mapped, in case dom0 is using it.

Note that even if the interrupt address range entries are populated in the
IOMMU page-tables no device access will reach those pages.  Device accesses
to the Interrupt Address Range will always be converted into Interrupt
Messages and are not subject to DMA remapping.

There's also the following restriction noted in Intel VT-d:

> Software must not program paging-structure entries to remap any address to
> the interrupt address range. Untranslated requests and translation requests
> that result in an address in the interrupt range will be blocked with
> condition code LGN.4 or SGN.8. Translated requests with an address in the
> interrupt address range are treated as Unsupported Request (UR).

Similarly for AMD-Vi:

> Accesses to the interrupt address range (Table 3) are defined to go through
> the interrupt remapping portion of the IOMMU and not through address
> translation processing. Therefore, when a transaction is being processed as
> an interrupt remapping operation, the transaction attribute of
> pretranslated or untranslated is ignored.
>
> Software Note: The IOMMU should
> not be configured such that an address translation results in a special
> address such as the interrupt address range.

However those restrictions don't apply to the identity mappings possibly
created for dom0, since the interrupt address range is never subject to DMA
remapping, and hence there's no output address after translation that
belongs to the interrupt address range.

Reported-by: Jürgen Groß <[email protected]>
Link: https://lore.kernel.org/xen-devel/[email protected]/
Signed-off-by: Roger Pau Monné <[email protected]>
Acked-by: Jan Beulich <[email protected]>

--- a/xen/arch/x86/dom0_build.c
+++ b/xen/arch/x86/dom0_build.c
@@ -554,6 +554,13 @@ int __init dom0_setup_permissions(struct
         mfn = paddr_to_pfn(mp_lapic_addr);
         rc |= iomem_deny_access(d, mfn, mfn);
     }
+    /* If using an emulated local APIC make sure its MMIO is unpopulated. */
+    if ( has_vlapic(d) )
+    {
+        /* Xen doesn't allow changing the local APIC MMIO window position. */
+        mfn = paddr_to_pfn(APIC_DEFAULT_PHYS_BASE);
+        rc |= iomem_deny_access(d, mfn, mfn);
+    }
     /* I/O APICs. */
     for ( i = 0; i < nr_ioapics; i++ )
     {
@@ -563,10 +570,6 @@ int __init dom0_setup_permissions(struct
              !rangeset_contains_singleton(mmio_ro_ranges, mfn) )
             rc |= iomem_deny_access(d, mfn, mfn);
     }
-    /* MSI range. */
-    rc |= iomem_deny_access(d, paddr_to_pfn(MSI_ADDR_BASE_LO),
-                            paddr_to_pfn(MSI_ADDR_BASE_LO +
-                                         MSI_ADDR_DEST_ID_MASK));
     /* HyperTransport range. */
     if ( boot_cpu_data.x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON) )
     {

++++++ 67c86fc1-xl-fix-channel-configuration-setting.patch ++++++

References: bsc#1219354

# Commit e1ccced4afe465d6541c5825a0f8d1b8f5fa4253
# Date 2025-03-05 16:37:37 +0100
# Author Juergen Gross <[email protected]>
# Committer Jan Beulich <[email protected]>
tools/xl: fix channel configuration setting

Channels work differently than other device types: their devid should
be -1 initially in order to distinguish them from the primary console
which has the devid of 0.

So when parsing the channel configuration, use
ARRAY_EXTEND_INIT_NODEVID() in order to avoid overwriting the devid
set by libxl_device_channel_init().
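
The difference between the two helpers can be pictured with a small
standalone sketch.  Everything here is a stand-in: struct channel,
channel_init(), extend_nodevid() and extend_with_devid() are hypothetical
names, and the assumption that the generic helper overwrites devid with the
array index is inferred from the description above, not taken from the xl
sources.

```c
#include <assert.h>
#include <stdlib.h>

struct channel { int devid; };

/* Stand-in for the channel init function: devid -1 means "not yet set". */
static void channel_init(struct channel *c)
{
    c->devid = -1;
}

/* Grow the array by one and initialise the new slot; devid is left alone. */
static struct channel *extend_nodevid(struct channel **arr, int *count)
{
    struct channel *tmp;

    assert(*count >= 0);
    tmp = realloc(*arr, (size_t)(*count + 1) * sizeof(**arr));
    if ( !tmp )
        abort();
    *arr = tmp;
    channel_init(&tmp[*count]);
    return &tmp[(*count)++];
}

/* As above, but then overwrite devid with the array index (assumed). */
static struct channel *extend_with_devid(struct channel **arr, int *count)
{
    struct channel *c = extend_nodevid(arr, count);

    c->devid = *count - 1;
    return c;
}
```

Under that assumption the first channel added through the index-assigning
helper ends up with devid 0, the same value as the primary console, whereas
the NODEVID variant preserves the -1 set by the init function.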

Fixes: 3a6679634766 ("libxl: set channel devid when not provided by application")
Signed-off-by: Juergen Gross <[email protected]>
Reviewed-by: Anthony PERARD <[email protected]>

--- a/tools/xl/xl_parse.c
+++ b/tools/xl/xl_parse.c
@@ -2423,8 +2423,9 @@ void parse_config_data(const char *confi
             char *path = NULL;
             int len;
 
-            chn = ARRAY_EXTEND_INIT(d_config->channels, d_config->num_channels,
-                                   libxl_device_channel_init);
+            chn = ARRAY_EXTEND_INIT_NODEVID(d_config->channels,
+                                            d_config->num_channels,
+                                            libxl_device_channel_init);
 
             split_string_into_string_list(buf, ",", &pairs);
             len = libxl_string_list_length(&pairs);
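
The devid clash described in the commit message can be illustrated with a small stand-alone model. This is only a sketch: `model_channel`, `model_extend_init()` and `model_extend_init_nodevid()` are hypothetical stand-ins, and the assumption that the non-NODEVID variant overwrites devid with the new element's index is inferred from the commit message rather than copied from the xl sources.

```c
/* Hypothetical model of a channel device; not the libxl type. */
struct model_channel {
    int devid;
};

/* Mirrors the described behaviour of libxl_device_channel_init():
 * a channel starts with devid -1 ("not yet assigned"). */
static void model_channel_init(struct model_channel *c)
{
    c->devid = -1;
}

/* Assumed behaviour of the generic helper: initialise the new element,
 * then overwrite devid with its array index.  The first channel thus
 * gets devid 0, the same devid as the primary console. */
static struct model_channel *model_extend_init(struct model_channel *arr,
                                               int *count)
{
    struct model_channel *c = &arr[*count];

    model_channel_init(c);
    c->devid = *count;
    (*count)++;
    return c;
}

/* Assumed behaviour of the NODEVID helper: initialise only, leaving the
 * devid chosen by the init function untouched. */
static struct model_channel *model_extend_init_nodevid(struct model_channel *arr,
                                                       int *count)
{
    struct model_channel *c = &arr[*count];

    model_channel_init(c);
    (*count)++;
    return c;
}
```

In this model the caller supplies a large enough array; the real helpers also grow the array, which is left out here.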

++++++ 67cb03e0-x86-vlapic-ESR-write-handling.patch ++++++
# Commit b28b590d4a23894672f1dd7fb98cdf9926ecb282
# Date 2025-03-07 14:34:08 +0000
# Author Andrew Cooper <[email protected]>
# Committer Andrew Cooper <[email protected]>
x86/vlapic: Fix handling of writes to APIC_ESR

Xen currently presents APIC_ESR to guests as a simple read/write register.

This is incorrect.  The SDM states:

  The ESR is a write/read register. Before attempt to read from the ESR,
  software should first write to it. (The value written does not affect the
  values read subsequently; only zero may be written in x2APIC mode.) This
  write clears any previously logged errors and updates the ESR with any
  errors detected since the last write to the ESR.

Introduce a new pending_esr field in hvm_hw_lapic.

Update vlapic_error() to accumulate errors here, and extend vlapic_reg_write()
to discard the written value and transfer pending_esr into APIC_ESR.  Reads
are still as before.

Importantly, this means that a guest no longer destroys the ESR value it's
looking for in the LVTERR handler when following the SDM instructions.

Signed-off-by: Andrew Cooper <[email protected]>
Reviewed-by: Jan Beulich <[email protected]>

--- a/xen/arch/x86/hvm/vlapic.c
+++ b/xen/arch/x86/hvm/vlapic.c
@@ -108,7 +108,7 @@ static void vlapic_error(struct vlapic *
     uint32_t esr;
 
     spin_lock_irqsave(&vlapic->esr_lock, flags);
-    esr = vlapic_get_reg(vlapic, APIC_ESR);
+    esr = vlapic->hw.pending_esr;
     if ( (esr & errmask) != errmask )
     {
         uint32_t lvterr = vlapic_get_reg(vlapic, APIC_LVTERR);
@@ -127,7 +127,7 @@ static void vlapic_error(struct vlapic *
                  errmask |= APIC_ESR_RECVILL;
         }
 
-        vlapic_set_reg(vlapic, APIC_ESR, esr | errmask);
+        vlapic->hw.pending_esr |= errmask;
 
         if ( inj )
             vlapic_set_irq(vlapic, lvterr & APIC_VECTOR_MASK, 0);
@@ -802,6 +802,19 @@ void vlapic_reg_write(struct vcpu *v, un
         vlapic_set_reg(vlapic, APIC_ID, val);
         break;
 
+    case APIC_ESR:
+    {
+        unsigned long flags;
+
+        spin_lock_irqsave(&vlapic->esr_lock, flags);
+        val = vlapic->hw.pending_esr;
+        vlapic->hw.pending_esr = 0;
+        spin_unlock_irqrestore(&vlapic->esr_lock, flags);
+
+        vlapic_set_reg(vlapic, APIC_ESR, val);
+        break;
+    }
+
     case APIC_TASKPRI:
         vlapic_set_reg(vlapic, APIC_TASKPRI, val & 0xff);
         break;
--- a/xen/include/public/arch-x86/hvm/save.h
+++ b/xen/include/public/arch-x86/hvm/save.h
@@ -394,6 +394,7 @@ struct hvm_hw_lapic {
     uint32_t             disabled; /* VLAPIC_xx_DISABLED */
     uint32_t             timer_divisor;
     uint64_t             tdt_msr;
+    uint32_t             pending_esr;
 };
 
 DECLARE_HVM_SAVE_TYPE(LAPIC, 5, struct hvm_hw_lapic);
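
The write-then-read protocol quoted from the SDM can be modelled outside Xen in a few lines. The following is a minimal sketch, not the hypervisor code: `esr_model` and its helper functions are hypothetical names, and locking is omitted because the model is single-threaded.

```c
#include <stdint.h>

/* Simplified stand-in for the vLAPIC error state; not Xen's struct. */
struct esr_model {
    uint32_t esr;         /* value returned by guest reads of the ESR */
    uint32_t pending_esr; /* errors accumulated since the last ESR write */
};

/* An error is detected: accumulate it, but do not expose it yet. */
static void esr_model_error(struct esr_model *m, uint32_t errmask)
{
    m->pending_esr |= errmask;
}

/* Guest writes the ESR: the written value is discarded, the pending
 * errors become visible, and the pending set is cleared. */
static void esr_model_write(struct esr_model *m, uint32_t val)
{
    (void)val;
    m->esr = m->pending_esr;
    m->pending_esr = 0;
}

/* Guest reads the ESR: returns whatever the last write latched. */
static uint32_t esr_model_read(const struct esr_model *m)
{
    return m->esr;
}
```

In this model a guest that writes the ESR and then reads it sees the errors logged before the write, and a second write clears them again, which matches the behaviour the commit message describes.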

++++++ 67d17edd-x86-expose-MSR_FAM10H_MMIO_CONF_BASE-on-AMD.patch ++++++
# Commit b4071d28c5bd9ca4fed76031cbf0e782b74209b9
# Date 2025-03-12 13:32:30 +0100
# Author Roger Pau Monne <[email protected]>
# Committer Roger Pau Monne <[email protected]>
x86/msr: expose MSR_FAM10H_MMIO_CONF_BASE on AMD

The MMIO_CONF_BASE reports the base of the MCFG range on AMD systems.
Linux pre-6.14 is unconditionally attempting to read the MSR without a
safe MSR accessor, and since Xen doesn't allow access to it Linux reports
the following error:

unchecked MSR access error: RDMSR from 0xc0010058 at rIP: 0xffffffff8101d19f (xen_do_read_msr+0x7f/0xa0)
Call Trace:
 xen_read_msr+0x1e/0x30
 amd_get_mmconfig_range+0x2b/0x80
 quirk_amd_mmconfig_area+0x28/0x100
 pnp_fixup_device+0x39/0x50
 __pnp_add_device+0xf/0x150
 pnp_add_device+0x3d/0x100
 pnpacpi_add_device_handler+0x1f9/0x280
 acpi_ns_get_device_callback+0x104/0x1c0
 acpi_ns_walk_namespace+0x1d0/0x260
 acpi_get_devices+0x8a/0xb0
 pnpacpi_init+0x50/0x80
 do_one_initcall+0x46/0x2e0
 kernel_init_freeable+0x1da/0x2f0
 kernel_init+0x16/0x1b0
 ret_from_fork+0x30/0x50
 ret_from_fork_asm+0x1b/0x30

Such access is conditional on the presence of a device with PnP ID
"PNP0c01", which triggers the execution of the quirk_amd_mmconfig_area()
function.  Note that prior to commit 3fac3734c43a MSR accesses when running
as a PV guest would always use the safe variant, and thus silently handle
the #GP.

Fix by allowing access to the MSR on AMD systems for the hardware domain.

Write attempts to the MSR will still result in #GP for all domain types.

Signed-off-by: Roger Pau Monné <[email protected]>
Reviewed-by: Jan Beulich <[email protected]>

--- a/xen/arch/x86/msr.c
+++ b/xen/arch/x86/msr.c
@@ -245,6 +245,14 @@ int guest_rdmsr(struct vcpu *v, uint32_t
         *val = 0;
         break;
 
+    case MSR_FAM10H_MMIO_CONF_BASE:
+        if ( !is_hardware_domain(d) ||
+             !(cp->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) ||
+             rdmsr_safe(msr, *val) )
+            goto gp_fault;
+
+        break;
+
     case MSR_VIRT_SPEC_CTRL:
         if ( !cp->extd.virt_ssbd )
             goto gp_fault;
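
The three conditions guarding the new read path can be restated as a stand-alone predicate. This is a hedged sketch only: `msr_read_ctx` and `model_read_mmio_conf_base()` are hypothetical, and the hardware read is replaced by fields in the context structure so the logic can run outside the hypervisor.

```c
#include <stdbool.h>
#include <stdint.h>

enum { MODEL_OK = 0, MODEL_GP_FAULT = -1 };

/* Hypothetical inputs standing in for the domain, CPU policy and MSR. */
struct msr_read_ctx {
    bool hardware_domain; /* is the reader the hardware domain? */
    bool amd_or_hygon;    /* is the policy vendor AMD or Hygon? */
    bool hw_read_faults;  /* would the underlying safe read fail? */
    uint64_t hw_value;    /* value the hardware would return */
};

/* The read succeeds only if all three checks pass; any failure is
 * reported as a #GP, modelled here as MODEL_GP_FAULT. */
static int model_read_mmio_conf_base(const struct msr_read_ctx *ctx,
                                     uint64_t *val)
{
    if (!ctx->hardware_domain || !ctx->amd_or_hygon || ctx->hw_read_faults)
        return MODEL_GP_FAULT;

    *val = ctx->hw_value;
    return MODEL_OK;
}
```

Writes are not modelled, since the commit message states that they still fault for all domain types.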

++++++ 67d17ede-VT-x-PI-usage-of-msi_desc-msg-field.patch ++++++
# Commit 30f0e55a79206702b4e82e86dad6b35033157858
# Date 2025-03-12 13:32:30 +0100
# Author Roger Pau Monne <[email protected]>
# Committer Roger Pau Monne <[email protected]>
x86/vmx: fix posted interrupts usage of msi_desc->msg field

The current usage of msi_desc->msg in vmx_pi_update_irte() will make the
field contain a translated MSI message, instead of the expected
untranslated one.  This breaks dump_msi(), which uses the data in
msi_desc->msg to print the interrupt details.

Fix this by introducing a dummy local msi_msg, and use it with
iommu_update_ire_from_msi().  vmx_pi_update_irte() relies on the MSI
message not changing, so there's no need to propagate the resulting msi_msg
to the hardware, and the contents can be ignored.

Additionally add a comment to clarify that msi_desc->msg must always
contain the untranslated MSI message.

Fixes: a5e25908d18d ('VT-d: introduce new fields in msi_desc to track binding with guest interrupt')
Signed-off-by: Roger Pau Monné <[email protected]>
Reviewed-by: Jan Beulich <[email protected]>

--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -396,6 +396,7 @@ static int cf_check vmx_pi_update_irte(c
     const struct pi_desc *pi_desc = v ? &v->arch.hvm.vmx.pi_desc : NULL;
     struct irq_desc *desc;
     struct msi_desc *msi_desc;
+    struct msi_msg msg;
     int rc;
 
     desc = pirq_spin_lock_irq_desc(pirq, NULL);
@@ -410,12 +411,13 @@ static int cf_check vmx_pi_update_irte(c
     }
     msi_desc->pi_desc = pi_desc;
     msi_desc->gvec = gvec;
+    msg = msi_desc->msg;
 
     spin_unlock_irq(&desc->lock);
 
     ASSERT_PDEV_LIST_IS_READ_LOCKED(msi_desc->dev->domain);
 
-    return iommu_update_ire_from_msi(msi_desc, &msi_desc->msg);
+    return iommu_update_ire_from_msi(msi_desc, &msg);
 
  unlock_out:
     spin_unlock_irq(&desc->lock);
--- a/xen/arch/x86/include/asm/msi.h
+++ b/xen/arch/x86/include/asm/msi.h
@@ -124,7 +124,7 @@ struct msi_desc {
     int irq;
     int remap_index;         /* index in interrupt remapping table */
 
-    struct msi_msg msg;      /* Last set MSI message */
+    struct msi_msg msg;      /* Last set MSI message (untranslated) */
 };
 
 /*
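
The effect of passing a local copy instead of the stored message can be shown with a small model. This is a sketch under stated assumptions: `model_msg`, `model_desc`, `model_translate()` and `model_update()` are hypothetical, and the "translation" simply scribbles on the message to stand in for whatever the IOMMU code writes back.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the MSI message and its descriptor. */
struct model_msg {
    uint32_t address_lo;
    uint32_t data;
};

struct model_desc {
    struct model_msg msg; /* must stay untranslated */
};

/* Stand-in for a routine that rewrites the message it is handed.
 * The concrete values written are arbitrary. */
static void model_translate(struct model_msg *msg)
{
    msg->address_lo |= 0x10u;
    msg->data = 0;
}

/* Pattern used by the fix: copy the stored message into a local and
 * let the translation modify only that copy. */
static void model_update(struct model_desc *desc)
{
    struct model_msg msg = desc->msg; /* plain struct copy */

    model_translate(&msg);
    /* The translated copy is deliberately dropped here. */
}
```

Had `model_update()` passed `&desc->msg` directly, the descriptor would afterwards hold the rewritten values, which is the situation the commit message says broke dump_msi().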

++++++ 67d2a3fe-libxl-avoid-infinite-loop-in-libxl__remove_directory.patch ++++++

References: bsc#1237692

# Commit 68baeb5c4852e652b9599e049f40477edac4060e
# Date 2025-03-13 10:23:10 +0100
# Author Jan Beulich <[email protected]>
# Committer Jan Beulich <[email protected]>
libxl: avoid infinite loop in libxl__remove_directory()

Infinitely retrying the rmdir() invocation makes little sense. While the
original observation was the log filling the disk (due to repeated
"Directory not empty" errors, in turn occurring for unclear reasons),
the loop wants breaking even if there was no error message being logged
(much like is done in the similar loops in libxl__remove_file() and
libxl__remove_file_or_directory()).

Fixes: c4dcbee67e6d ("libxl: provide libxl__remove_file et al")
Signed-off-by: Jan Beulich <[email protected]>
Reviewed-by: Juergen Gross <[email protected]>
Acked-by: Anthony PERARD <[email protected]>

--- a/tools/libs/light/libxl_utils.c
+++ b/tools/libs/light/libxl_utils.c
@@ -577,6 +577,7 @@ int libxl__remove_directory(libxl__gc *g
         if (errno == EINTR) continue;
         LOGE(ERROR, "failed to remove emptied directory %s", dirpath);
         rc = ERROR_FAIL;
+        break;
     }
 
  out:
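
The loop shape after the fix (retry only on EINTR, otherwise stop) can be sketched with plain POSIX calls. `remove_empty_dir()` is a hypothetical helper, not the libxl function; it omits logging and any other errno handling the real code may perform.

```c
#include <errno.h>
#include <unistd.h>

/* Remove an (already emptied) directory.
 * Returns 0 on success, -1 on any error other than EINTR. */
static int remove_empty_dir(const char *dirpath)
{
    for (;;) {
        if (rmdir(dirpath) == 0)
            return 0;

        /* Interrupted by a signal: retrying is reasonable. */
        if (errno == EINTR)
            continue;

        /* Any other error (for example ENOTEMPTY) will not go away by
         * retrying, so leave the loop instead of spinning. */
        return -1;
    }
}
```

Without the final return (the `break` in the patch), a persistent error such as "Directory not empty" would make the loop run forever, logging on every iteration.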
