Script 'mail_helper' called by obssrc Hello community, here is the log from the commit of package xen for openSUSE:Factory checked in at 2025-03-18 17:37:27 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Comparing /work/SRC/openSUSE:Factory/xen (Old) and /work/SRC/openSUSE:Factory/.xen.new.19136 (New) ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Package is "xen" Tue Mar 18 17:37:27 2025 rev:362 rq:1253904 version:4.20.0_10 Changes: -------- --- /work/SRC/openSUSE:Factory/xen/xen.changes 2025-03-06 14:48:04.371789375 +0100 +++ /work/SRC/openSUSE:Factory/.xen.new.19136/xen.changes 2025-03-18 17:37:38.832360142 +0100 @@ -1,0 +2,18 @@ +Thu Mar 13 12:50:00 CET 2025 - jbeul...@suse.com + +- bsc#1219354 - xen channels and domU console + 67c86fc1-xl-fix-channel-configuration-setting.patch +- bsc#1227301 - Kernel boot crashes on Thinkpad P14s Gen 3 AMD + 67c818d4-x86-log-unhandled-mem-accesses-for-PVH-dom0.patch + 67c818d5-x86-fixup-p2m-page-faults-for-PVH-dom0.patch + 67c818d6-x86-PVH-dom0-correct-iomem_caps-bound.patch + 67c818d7-x86-IOMMU-account-for-IOMEM-caps-when-populating.patch + 67c818d8-x86-Dom0-relax-Interrupt-Address-Range.patch +- bsc#1237692 - When attempting to start guest vm's libxl fills disk with errors + 67d2a3fe-libxl-avoid-infinite-loop-in-libxl__remove_directory.patch +- Upstream bug fixes (bsc#1027519) + 67cb03e0-x86-vlapic-ESR-write-handling.patch + 67d17edd-x86-expose-MSR_FAM10H_MMIO_CONF_BASE-on-AMD.patch + 67d17ede-VT-x-PI-usage-of-msi_desc-msg-field.patch + +------------------------------------------------------------------- New: ---- 67c818d4-x86-log-unhandled-mem-accesses-for-PVH-dom0.patch 67c818d5-x86-fixup-p2m-page-faults-for-PVH-dom0.patch 67c818d6-x86-PVH-dom0-correct-iomem_caps-bound.patch 67c818d7-x86-IOMMU-account-for-IOMEM-caps-when-populating.patch 67c818d8-x86-Dom0-relax-Interrupt-Address-Range.patch 67c86fc1-xl-fix-channel-configuration-setting.patch 67cb03e0-x86-vlapic-ESR-write-handling.patch 67d17edd-x86-expose-MSR_FAM10H_MMIO_CONF_BASE-on-AMD.patch 67d17ede-VT-x-PI-usage-of-msi_desc-msg-field.patch 67d2a3fe-libxl-avoid-infinite-loop-in-libxl__remove_directory.patch
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Other differences: ------------------ ++++++ xen.spec ++++++ --- /var/tmp/diff_new_pack.8ekEU9/_old 2025-03-18 17:37:40.772441464 +0100 +++ /var/tmp/diff_new_pack.8ekEU9/_new 2025-03-18 17:37:40.776441632 +0100 @@ -125,7 +125,7 @@ BuildRequires: python-rpm-macros Provides: installhint(reboot-needed) -Version: 4.20.0_08 +Version: 4.20.0_10 Release: 0 Summary: Xen Virtualization: Hypervisor (aka VMM aka Microkernel) 
License: GPL-2.0-only @@ -160,6 +160,16 @@ # For xen-libs Source99: baselibs.conf # Upstream patches +Patch1: 67c818d4-x86-log-unhandled-mem-accesses-for-PVH-dom0.patch +Patch2: 67c818d5-x86-fixup-p2m-page-faults-for-PVH-dom0.patch +Patch3: 67c818d6-x86-PVH-dom0-correct-iomem_caps-bound.patch +Patch4: 67c818d7-x86-IOMMU-account-for-IOMEM-caps-when-populating.patch +Patch5: 67c818d8-x86-Dom0-relax-Interrupt-Address-Range.patch +Patch6: 67c86fc1-xl-fix-channel-configuration-setting.patch +Patch7: 67cb03e0-x86-vlapic-ESR-write-handling.patch +Patch8: 67d17edd-x86-expose-MSR_FAM10H_MMIO_CONF_BASE-on-AMD.patch +Patch9: 67d17ede-VT-x-PI-usage-of-msi_desc-msg-field.patch +Patch10: 67d2a3fe-libxl-avoid-infinite-loop-in-libxl__remove_directory.patch # EMBARGOED security fixes # libxc Patch301: libxc-bitmap-long.patch ++++++ 67c818d4-x86-log-unhandled-mem-accesses-for-PVH-dom0.patch ++++++ # Commit 43d8a80a0cccfe3715bb3178b5c15fb983979651 # Date 2025-03-05 10:26:46 +0100 # Author Roger Pau Monne <roger....@citrix.com> # Committer Roger Pau Monne <roger....@citrix.com> x86/emul: dump unhandled memory accesses for PVH dom0 A PV dom0 can map any host memory as long as it's allowed by the IO capability range in d->iomem_caps. On the other hand, a PVH dom0 has no way to populate MMIO regions onto its p2m, so it's limited to what Xen initially populates on the p2m based on the host memory map and the enabled device BARs. Introduce a new debug-build-only printk that reports attempts by dom0 to access addresses not populated on the p2m, and not handled by any emulator. This is for information purposes only, but might allow getting an idea of what MMIO ranges might be missing on the p2m. 
Signed-off-by: Roger Pau Monné <roger....@citrix.com> Acked-by: Jan Beulich <jbeul...@suse.com> --- a/xen/arch/x86/hvm/emulate.c +++ b/xen/arch/x86/hvm/emulate.c @@ -337,6 +337,9 @@ static int hvmemul_do_io( /* If there is no suitable backing DM, just ignore accesses */ if ( !s ) { + if ( is_mmio && is_hardware_domain(currd) ) + gdprintk(XENLOG_DEBUG, "unhandled memory %s %#lx size %u\n", + dir ? "read from" : "write to", addr, size); rc = hvm_process_io_intercept(&null_handler, &p); vio->req.state = STATE_IOREQ_NONE; } ++++++ 67c818d5-x86-fixup-p2m-page-faults-for-PVH-dom0.patch ++++++ References: bsc#1227301 # Commit 104591f5dd675d7bfb04885dace0e4e5a097fc1e # Date 2025-03-05 10:26:46 +0100 # Author Roger Pau Monne <roger....@citrix.com> # Committer Roger Pau Monne <roger....@citrix.com> x86/dom0: attempt to fixup p2m page-faults for PVH dom0 When building a PVH dom0 Xen attempts to map all (relevant) MMIO regions into the p2m for dom0 access. However the information Xen has about the host memory map is limited. Xen doesn't have access to any resources described in ACPI dynamic tables, and hence the p2m mappings provided might not be complete. PV doesn't suffer from this issue because a PV dom0 is capable of mapping into its page-tables any address not explicitly banned in d->iomem_caps. Introduce a new command line option that allows Xen to attempt to fixup the p2m page-faults, by creating p2m identity maps in response to p2m page-faults. This is intended as a workaround for small ACPI regions Xen doesn't know about. Note that missing large MMIO regions mapped in this way will lead to slowness due to the VM exit processing, plus the mappings will always use small pages. The ultimate aim is to attempt to bring better parity with a classic PV dom0. Note that such fixups rely on the CPU doing the access to the unpopulated address. If the access is attempted from a device instead there's no possible way to fix up, as IOMMU page-faults are asynchronous. 
Signed-off-by: Roger Pau Monné <roger....@citrix.com> Reviewed-by: Jan Beulich <jbeul...@suse.com> Acked-by: Oleksii Kurochko <oleksii.kuroc...@gmail.com> --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -4,6 +4,12 @@ Notable changes to Xen will be documente The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/) +## [4.20.1](https://xenbits.xenproject.org/gitweb/?p=xen.git;a=shortlog;h=RELEASE-4.20.1) + +### Added + - On x86: + - Option to attempt to fixup p2m page-faults on PVH dom0. + ## [4.20.0](https://xenbits.xenproject.org/gitweb/?p=xen.git;a=shortlog;h=RELEASE-4.20.0) - 2025-03-05 ### Changed --- a/docs/misc/xen-command-line.pandoc +++ b/docs/misc/xen-command-line.pandoc @@ -822,7 +822,8 @@ Specify the bit width of the DMA heap. ### dom0 = List of [ pv | pvh, shadow=<bool>, verbose=<bool>, - cpuid-faulting=<bool>, msr-relaxed=<bool> ] (x86) + cpuid-faulting=<bool>, msr-relaxed=<bool>, + pf-fixup=<bool> ] (x86) = List of [ sve=<integer> ] (Arm64) @@ -883,6 +884,19 @@ Controls for how dom0 is constructed on If using this option is necessary to fix an issue, please report a bug. +* The `pf-fixup` boolean is only applicable when using a PVH dom0 and + defaults to false. + + When running dom0 in PVH mode the dom0 kernel has no way to map MMIO + regions into its physical memory map, such mode relies on Xen dom0 builder + populating the physical memory map with all MMIO regions that dom0 should + access. However Xen doesn't have a complete picture of the host memory + map, due to not being able to process ACPI dynamic tables. + + The `pf-fixup` option allows Xen to attempt to add missing MMIO regions + to the dom0 physical memory map in response to page-faults generated by + dom0 trying to access unpopulated entries in the memory map. + Enables features on dom0 on Arm systems. 
* The `sve` integer parameter enables Arm SVE usage for Dom0 and sets the --- a/xen/arch/x86/dom0_build.c +++ b/xen/arch/x86/dom0_build.c @@ -16,6 +16,7 @@ #include <asm/dom0_build.h> #include <asm/guest.h> #include <asm/hpet.h> +#include <asm/hvm/emulate.h> #include <asm/io-ports.h> #include <asm/io_apic.h> #include <asm/p2m.h> @@ -286,6 +287,10 @@ int __init parse_arch_dom0_param(const c opt_dom0_cpuid_faulting = val; else if ( (val = parse_boolean("msr-relaxed", s, e)) >= 0 ) opt_dom0_msr_relaxed = val; +#ifdef CONFIG_HVM + else if ( (val = parse_boolean("pf-fixup", s, e)) >= 0 ) + opt_dom0_pf_fixup = val; +#endif else return -EINVAL; --- a/xen/arch/x86/hvm/emulate.c +++ b/xen/arch/x86/hvm/emulate.c @@ -10,12 +10,15 @@ */ #include <xen/init.h> +#include <xen/iocap.h> #include <xen/ioreq.h> #include <xen/lib.h> #include <xen/sched.h> #include <xen/paging.h> #include <xen/trace.h> #include <xen/vm_event.h> + +#include <asm/altp2m.h> #include <asm/event.h> #include <asm/i387.h> #include <asm/xstate.h> @@ -161,6 +164,36 @@ void hvmemul_cancel(struct vcpu *v) hvmemul_cache_disable(v); } +bool __ro_after_init opt_dom0_pf_fixup; +static int hwdom_fixup_p2m(paddr_t addr) +{ + unsigned long gfn = paddr_to_pfn(addr); + struct domain *currd = current->domain; + p2m_type_t type; + mfn_t mfn; + int rc; + + ASSERT(is_hardware_domain(currd)); + ASSERT(!altp2m_active(currd)); + + /* + * Fixups are only applied for MMIO holes, and rely on the hardware domain + * having identity mappings for non RAM regions (gfn == mfn). + */ + if ( !iomem_access_permitted(currd, gfn, gfn) || + !is_memory_hole(_mfn(gfn), _mfn(gfn)) ) + return -EPERM; + + mfn = get_gfn(currd, gfn, &type); + if ( !mfn_eq(mfn, INVALID_MFN) || !p2m_is_hole(type) ) + rc = mfn_eq(mfn, _mfn(gfn)) ? 
-EEXIST : -ENOTEMPTY; + else + rc = set_mmio_p2m_entry(currd, _gfn(gfn), _mfn(gfn), 0); + put_gfn(currd, gfn); + + return rc; +} + static int hvmemul_do_io( bool is_mmio, paddr_t addr, unsigned long *reps, unsigned int size, uint8_t dir, bool df, bool data_is_addr, uintptr_t data) @@ -338,8 +371,45 @@ static int hvmemul_do_io( if ( !s ) { if ( is_mmio && is_hardware_domain(currd) ) - gdprintk(XENLOG_DEBUG, "unhandled memory %s %#lx size %u\n", - dir ? "read from" : "write to", addr, size); + { + /* + * PVH dom0 is likely missing MMIO mappings on the p2m, due to + * the incomplete information Xen has about the memory layout. + * + * Either print a message to note dom0 attempted to access an + * unpopulated GPA, or try to fixup the p2m by creating an + * identity mapping for the faulting GPA. + */ + if ( opt_dom0_pf_fixup ) + { + int inner_rc = hwdom_fixup_p2m(addr); + + if ( !inner_rc || inner_rc == -EEXIST ) + { + if ( !inner_rc ) + gdprintk(XENLOG_DEBUG, + "fixup p2m mapping for page %lx added\n", + paddr_to_pfn(addr)); + else + gprintk(XENLOG_INFO, + "fixup p2m mapping for page %lx already present\n", + paddr_to_pfn(addr)); + + rc = X86EMUL_RETRY; + vio->req.state = STATE_IOREQ_NONE; + break; + } + + gprintk(XENLOG_WARNING, + "unable to fixup memory %s %#lx size %u: %d\n", + dir ? "read from" : "write to", addr, size, + inner_rc); + } + else + gdprintk(XENLOG_DEBUG, + "unhandled memory %s %#lx size %u\n", + dir ? "read from" : "write to", addr, size); + } rc = hvm_process_io_intercept(&null_handler, &p); vio->req.state = STATE_IOREQ_NONE; } --- a/xen/arch/x86/include/asm/hvm/emulate.h +++ b/xen/arch/x86/include/asm/hvm/emulate.h @@ -148,6 +148,9 @@ static inline void hvmemul_write_cache(c void hvm_dump_emulation_state(const char *loglvl, const char *prefix, struct hvm_emulate_ctxt *hvmemul_ctxt, int rc); +/* For PVH dom0: signal whether to attempt fixup of p2m page-faults. 
*/ +extern bool opt_dom0_pf_fixup; + #endif /* __ASM_X86_HVM_EMULATE_H__ */ /* ++++++ 67c818d6-x86-PVH-dom0-correct-iomem_caps-bound.patch ++++++ # Commit a00e08799cc7657d2a1aca158f4ad43d4c9103e7 # Date 2025-03-05 10:26:46 +0100 # Author Roger Pau Monne <roger....@citrix.com> # Committer Roger Pau Monne <roger....@citrix.com> x86/dom0: correctly set the maximum ->iomem_caps bound for PVH The logic in dom0_setup_permissions() sets the maximum bound in ->iomem_caps unconditionally using paddr_bits, which is not correct for HVM based domains. Instead use domain_max_paddr_bits() to get the correct maximum paddr bits for each possible domain type. Switch to using PFN_DOWN() instead of PAGE_SHIFT, as that's shorter. Fixes: 53de839fb409 ('x86: constrain MFN range Dom0 may access') Signed-off-by: Roger Pau Monné <roger....@citrix.com> Reviewed-by: Jan Beulich <jbeul...@suse.com> --- a/xen/arch/x86/dom0_build.c +++ b/xen/arch/x86/dom0_build.c @@ -481,7 +481,8 @@ int __init dom0_setup_permissions(struct /* The hardware domain is initially permitted full I/O capabilities. */ rc = ioports_permit_access(d, 0, 0xFFFF); - rc |= iomem_permit_access(d, 0UL, (1UL << (paddr_bits - PAGE_SHIFT)) - 1); + rc |= iomem_permit_access(d, 0UL, + PFN_DOWN(1UL << domain_max_paddr_bits(d)) - 1); rc |= irqs_permit_access(d, 1, nr_irqs_gsi - 1); /* Modify I/O port access permissions. */ ++++++ 67c818d7-x86-IOMMU-account-for-IOMEM-caps-when-populating.patch ++++++ # Commit 62f3fc5296c452285e81adb50976bde2d68d3181 # Date 2025-03-05 10:26:46 +0100 # Author Roger Pau Monne <roger....@citrix.com> # Committer Roger Pau Monne <roger....@citrix.com> x86/iommu: account for IOMEM caps when populating dom0 IOMMU page-tables The current code in arch_iommu_hwdom_init() kind of open-codes the same MMIO permission ranges that are added to the hardware domain ->iomem_caps. 
Avoid this duplication and use ->iomem_caps in arch_iommu_hwdom_init() to filter which memory regions should be added to the dom0 IOMMU page-tables. Note the IO-APIC and MCFG page(s) must be set as not accessible for a PVH dom0, otherwise the internal Xen emulation for those ranges won't work. This requires adjustments in dom0_setup_permissions(). The call to pvh_setup_mmcfg() in dom0_construct_pvh() must now strictly be done ahead of setting up dom0 permissions, so take the opportunity to also put it inside the existing is_hardware_domain() region. Also the special casing of E820_UNUSABLE regions no longer needs to be done in arch_iommu_hwdom_init(), as those regions are already blocked in ->iomem_caps and thus would be removed from the rangeset as part of ->iomem_caps processing in arch_iommu_hwdom_init(). The E820_UNUSABLE regions below 1Mb are not removed from ->iomem_caps, that's a slight difference for the IOMMU created page-tables, but the aim is to allow access to the same memory either from the CPU or the IOMMU page-tables. Since ->iomem_caps already takes into account the domain max paddr, there's no need to remove any regions past the last address addressable by the domain, as applying ->iomem_caps would have already taken care of that. Suggested-by: Jan Beulich <jbeul...@suse.com> Signed-off-by: Roger Pau Monné <roger....@citrix.com> Reviewed-by: Jan Beulich <jbeul...@suse.com> --- a/xen/arch/x86/dom0_build.c +++ b/xen/arch/x86/dom0_build.c @@ -558,7 +558,9 @@ int __init dom0_setup_permissions(struct for ( i = 0; i < nr_ioapics; i++ ) { mfn = paddr_to_pfn(mp_ioapics[i].mpc_apicaddr); - if ( !rangeset_contains_singleton(mmio_ro_ranges, mfn) ) + /* If emulating IO-APIC(s) make sure the base address is unmapped. */ + if ( has_vioapic(d) || + !rangeset_contains_singleton(mmio_ro_ranges, mfn) ) rc |= iomem_deny_access(d, mfn, mfn); } /* MSI range. 
*/ @@ -599,6 +601,13 @@ int __init dom0_setup_permissions(struct rc |= rangeset_add_singleton(mmio_ro_ranges, mfn); } + if ( has_vpci(d) ) + /* + * TODO: runtime added MMCFG regions are not checked to make sure they + * don't overlap with already mapped regions, thus preventing trapping. + */ + rc |= vpci_mmcfg_deny_access(d); + return rc; } --- a/xen/arch/x86/hvm/dom0_build.c +++ b/xen/arch/x86/hvm/dom0_build.c @@ -1324,6 +1324,13 @@ int __init dom0_construct_pvh(struct boo if ( is_hardware_domain(d) ) { /* + * MMCFG initialization must be performed before setting domain + * permissions, as the MCFG areas must not be part of the domain IOMEM + * accessible regions. + */ + pvh_setup_mmcfg(d); + + /* * Setup permissions early so that calls to add MMIO regions to the * p2m as part of vPCI setup don't fail due to permission checks. */ @@ -1336,13 +1343,6 @@ int __init dom0_construct_pvh(struct boo } /* - * NB: MMCFG initialization needs to be performed before iommu - * initialization so the iommu code can fetch the MMCFG regions used by the - * domain. - */ - pvh_setup_mmcfg(d); - - /* * Craft dom0 physical memory map and set the paging allocation. This must * be done before the iommu initializion, since iommu initialization code * will likely add mappings required by devices to the p2m (ie: RMRRs). 
--- a/xen/arch/x86/hvm/io.c +++ b/xen/arch/x86/hvm/io.c @@ -363,14 +363,14 @@ static const struct hvm_mmcfg *vpci_mmcf return NULL; } -int __hwdom_init vpci_subtract_mmcfg(const struct domain *d, struct rangeset *r) +int __hwdom_init vpci_mmcfg_deny_access(struct domain *d) { const struct hvm_mmcfg *mmcfg; list_for_each_entry ( mmcfg, &d->arch.hvm.mmcfg_regions, next ) { - int rc = rangeset_remove_range(r, PFN_DOWN(mmcfg->addr), - PFN_DOWN(mmcfg->addr + mmcfg->size - 1)); + int rc = iomem_deny_access(d, PFN_DOWN(mmcfg->addr), + PFN_DOWN(mmcfg->addr + mmcfg->size - 1)); if ( rc ) return rc; --- a/xen/arch/x86/include/asm/hvm/io.h +++ b/xen/arch/x86/include/asm/hvm/io.h @@ -132,8 +132,8 @@ int register_vpci_mmcfg_handler(struct d /* Destroy tracked MMCFG areas. */ void destroy_vpci_mmcfg(struct domain *d); -/* Remove MMCFG regions from a given rangeset. */ -int vpci_subtract_mmcfg(const struct domain *d, struct rangeset *r); +/* Remove MMCFG regions from a domain ->iomem_caps. */ +int vpci_mmcfg_deny_access(struct domain *d); #endif /* __ASM_X86_HVM_IO_H__ */ --- a/xen/drivers/passthrough/x86/iommu.c +++ b/xen/drivers/passthrough/x86/iommu.c @@ -320,6 +320,26 @@ static int __hwdom_init cf_check map_sub return rangeset_remove_range(map, s, e); } +struct handle_iomemcap { + struct rangeset *r; + unsigned long last; +}; +static int __hwdom_init cf_check map_subtract_iomemcap(unsigned long s, + unsigned long e, + void *data) +{ + struct handle_iomemcap *h = data; + int rc = 0; + + if ( h->last != s ) + rc = rangeset_remove_range(h->r, h->last, s - 1); + + ASSERT(e < ~0UL); + h->last = e + 1; + + return rc; +} + struct map_data { struct domain *d; unsigned int flush_flags; @@ -400,6 +420,7 @@ void __hwdom_init arch_iommu_hwdom_init( unsigned int i; struct rangeset *map; struct map_data map_data = { .d = d }; + struct handle_iomemcap iomem = {}; int rc; BUG_ON(!is_hardware_domain(d)); @@ -442,14 +463,6 @@ void __hwdom_init arch_iommu_hwdom_init( switch ( entry.type ) { - 
case E820_UNUSABLE: - /* Only relevant for inclusive mode, otherwise this is a no-op. */ - rc = rangeset_remove_range(map, PFN_DOWN(entry.addr), - PFN_DOWN(entry.addr + entry.size - 1)); - if ( rc ) - panic("IOMMU failed to remove unusable memory: %d\n", rc); - continue; - case E820_RESERVED: if ( !iommu_hwdom_inclusive && !iommu_hwdom_reserved ) continue; @@ -475,22 +488,13 @@ void __hwdom_init arch_iommu_hwdom_init( if ( rc ) panic("IOMMU failed to remove Xen ranges: %d\n", rc); - /* Remove any overlap with the Interrupt Address Range. */ - rc = rangeset_remove_range(map, 0xfee00, 0xfeeff); + iomem.r = map; + rc = rangeset_report_ranges(d->iomem_caps, 0, ~0UL, map_subtract_iomemcap, + &iomem); + if ( !rc && iomem.last < ~0UL ) + rc = rangeset_remove_range(map, iomem.last, ~0UL); if ( rc ) - panic("IOMMU failed to remove Interrupt Address Range: %d\n", rc); - - /* If emulating IO-APIC(s) make sure the base address is unmapped. */ - if ( has_vioapic(d) ) - { - for ( i = 0; i < d->arch.hvm.nr_vioapics; i++ ) - { - rc = rangeset_remove_singleton(map, - PFN_DOWN(domain_vioapic(d, i)->base_address)); - if ( rc ) - panic("IOMMU failed to remove IO-APIC: %d\n", rc); - } - } + panic("IOMMU failed to remove forbidden regions: %d\n", rc); if ( is_pv_domain(d) ) { @@ -506,23 +510,6 @@ void __hwdom_init arch_iommu_hwdom_init( panic("IOMMU failed to remove read-only regions: %d\n", rc); } - if ( has_vpci(d) ) - { - /* - * TODO: runtime added MMCFG regions are not checked to make sure they - * don't overlap with already mapped regions, thus preventing trapping. - */ - rc = vpci_subtract_mmcfg(d, map); - if ( rc ) - panic("IOMMU unable to remove MMCFG areas: %d\n", rc); - } - - /* Remove any regions past the last address addressable by the domain. 
*/ - rc = rangeset_remove_range(map, PFN_DOWN(1UL << domain_max_paddr_bits(d)), - ~0UL); - if ( rc ) - panic("IOMMU unable to remove unaddressable ranges: %d\n", rc); - if ( iommu_verbose ) printk(XENLOG_INFO "%pd: identity mappings for IOMMU:\n", d); ++++++ 67c818d8-x86-Dom0-relax-Interrupt-Address-Range.patch ++++++ References: bsc#1227301 # Commit 381caa38850771ae218eb6f6d490dc02e40df964 # Date 2025-03-05 10:26:46 +0100 # Author Roger Pau Monne <roger....@citrix.com> # Committer Roger Pau Monne <roger....@citrix.com> x86/dom0: be less restrictive with the Interrupt Address Range Xen currently prevents dom0 from creating CPU or IOMMU page-table mappings into the interrupt address range [0xfee00000, 0xfeefffff]. This range has two different purposes. For accesses from the CPU it contains the default position of the local APIC page at 0xfee00000. For accesses from devices it's the MSI address range, so the address field in the MSI entries (usually) points to an address in that range to trigger an interrupt. There are reports of Lenovo Thinkpad devices placing what seems to be the UCSI shared mailbox at address 0xfeec2000 in the interrupt address range. 
Attempting to use that device with a Linux PV dom0 leads to an error when Linux kernel maps 0xfeec2000: RIP: e030:xen_mc_flush+0x1e8/0x2b0 xen_leave_lazy_mmu+0x15/0x60 vmap_range_noflush+0x408/0x6f0 __ioremap_caller+0x20d/0x350 acpi_os_map_iomem+0x1a3/0x1c0 acpi_ex_system_memory_space_handler+0x229/0x3f0 acpi_ev_address_space_dispatch+0x17e/0x4c0 acpi_ex_access_region+0x28a/0x510 acpi_ex_field_datum_io+0x95/0x5c0 acpi_ex_extract_from_field+0x36b/0x4e0 acpi_ex_read_data_from_field+0xcb/0x430 acpi_ex_resolve_node_to_value+0x2e0/0x530 acpi_ex_resolve_to_value+0x1e7/0x550 acpi_ds_evaluate_name_path+0x107/0x170 acpi_ds_exec_end_op+0x392/0x860 acpi_ps_parse_loop+0x268/0xa30 acpi_ps_parse_aml+0x221/0x5e0 acpi_ps_execute_method+0x171/0x3e0 acpi_ns_evaluate+0x174/0x5d0 acpi_evaluate_object+0x167/0x440 acpi_evaluate_dsm+0xb6/0x130 ucsi_acpi_dsm+0x53/0x80 ucsi_acpi_read+0x2e/0x60 ucsi_register+0x24/0xa0 ucsi_acpi_probe+0x162/0x1e3 platform_probe+0x48/0x90 really_probe+0xde/0x340 __driver_probe_device+0x78/0x110 driver_probe_device+0x1f/0x90 __driver_attach+0xd2/0x1c0 bus_for_each_dev+0x77/0xc0 bus_add_driver+0x112/0x1f0 driver_register+0x72/0xd0 do_one_initcall+0x48/0x300 do_init_module+0x60/0x220 __do_sys_init_module+0x17f/0x1b0 do_syscall_64+0x82/0x170 Remove the restrictions to create mappings in the interrupt address range for dom0. Note that the restriction to map the local APIC page is enforced separately, and that continues to be present. Additionally make sure the emulated local APIC page is also not mapped, in case dom0 is using it. Note that even if the interrupt address range entries are populated in the IOMMU page-tables no device access will reach those pages. Device accesses to the Interrupt Address Range will always be converted into Interrupt Messages and are not subject to DMA remapping. There's also the following restriction noted in Intel VT-d: > Software must not program paging-structure entries to remap any address to > the interrupt address range. 
Untranslated requests and translation requests > that result in an address in the interrupt range will be blocked with > condition code LGN.4 or SGN.8. Translated requests with an address in the > interrupt address range are treated as Unsupported Request (UR). Similarly for AMD-Vi: > Accesses to the interrupt address range (Table 3) are defined to go through > the interrupt remapping portion of the IOMMU and not through address > translation processing. Therefore, when a transaction is being processed as > an interrupt remapping operation, the transaction attribute of > pretranslated or untranslated is ignored. > > Software Note: The IOMMU should > not be configured such that an address translation results in a special > address such as the interrupt address range. However those restrictions don't apply to the identity mappings possibly created for dom0, since the interrupt address range is never subject to DMA remapping, and hence there's no output address after translation that belongs to the interrupt address range. Reported-by: Jürgen Groß <jgr...@suse.com> Link: https://lore.kernel.org/xen-devel/baade0a7-e204-4743-bda1-282df74e5...@suse.com/ Signed-off-by: Roger Pau Monné <roger....@citrix.com> Acked-by: Jan Beulich <jbeul...@suse.com> --- a/xen/arch/x86/dom0_build.c +++ b/xen/arch/x86/dom0_build.c @@ -554,6 +554,13 @@ int __init dom0_setup_permissions(struct mfn = paddr_to_pfn(mp_lapic_addr); rc |= iomem_deny_access(d, mfn, mfn); } + /* If using an emulated local APIC make sure its MMIO is unpopulated. */ + if ( has_vlapic(d) ) + { + /* Xen doesn't allow changing the local APIC MMIO window position. */ + mfn = paddr_to_pfn(APIC_DEFAULT_PHYS_BASE); + rc |= iomem_deny_access(d, mfn, mfn); + } /* I/O APICs. */ for ( i = 0; i < nr_ioapics; i++ ) { @@ -563,10 +570,6 @@ int __init dom0_setup_permissions(struct !rangeset_contains_singleton(mmio_ro_ranges, mfn) ) rc |= iomem_deny_access(d, mfn, mfn); } - /* MSI range. 
*/ - rc |= iomem_deny_access(d, paddr_to_pfn(MSI_ADDR_BASE_LO), - paddr_to_pfn(MSI_ADDR_BASE_LO + - MSI_ADDR_DEST_ID_MASK)); /* HyperTransport range. */ if ( boot_cpu_data.x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON) ) { ++++++ 67c86fc1-xl-fix-channel-configuration-setting.patch ++++++ References: bsc#1219354 # Commit e1ccced4afe465d6541c5825a0f8d1b8f5fa4253 # Date 2025-03-05 16:37:37 +0100 # Author Juergen Gross <jgr...@suse.com> # Committer Jan Beulich <jbeul...@suse.com> tools/xl: fix channel configuration setting Channels work differently than other device types: their devid should be -1 initially in order to distinguish them from the primary console which has the devid of 0. So when parsing the channel configuration, use ARRAY_EXTEND_INIT_NODEVID() in order to avoid overwriting the devid set by libxl_device_channel_init(). Fixes: 3a6679634766 ("libxl: set channel devid when not provided by application") Signed-off-by: Juergen Gross <jgr...@suse.com> Reviewed-by: Anthony PERARD <anthony.per...@vates.tech> --- a/tools/xl/xl_parse.c +++ b/tools/xl/xl_parse.c @@ -2423,8 +2423,9 @@ void parse_config_data(const char *confi char *path = NULL; int len; - chn = ARRAY_EXTEND_INIT(d_config->channels, d_config->num_channels, - libxl_device_channel_init); + chn = ARRAY_EXTEND_INIT_NODEVID(d_config->channels, + d_config->num_channels, + libxl_device_channel_init); split_string_into_string_list(buf, ",", &pairs); len = libxl_string_list_length(&pairs); ++++++ 67cb03e0-x86-vlapic-ESR-write-handling.patch ++++++ # Commit b28b590d4a23894672f1dd7fb98cdf9926ecb282 # Date 2025-03-07 14:34:08 +0000 # Author Andrew Cooper <andrew.coop...@citrix.com> # Committer Andrew Cooper <andrew.coop...@citrix.com> x86/vlapic: Fix handling of writes to APIC_ESR Xen currently presents APIC_ESR to guests as a simple read/write register. This is incorrect. The SDM states: The ESR is a write/read register. Before attempt to read from the ESR, software should first write to it. 
(The value written does not affect the values read subsequently; only zero may be written in x2APIC mode.) This write clears any previously logged errors and updates the ESR with any errors detected since the last write to the ESR. Introduce a new pending_esr field in hvm_hw_lapic. Update vlapic_error() to accumulate errors here, and extend vlapic_reg_write() to discard the written value and transfer pending_esr into APIC_ESR. Reads are still as before. Importantly, this means that a guest no longer destroys the ESR value it's looking for in the LVTERR handler when following the SDM instructions. Signed-off-by: Andrew Cooper <andrew.coop...@citrix.com> Reviewed-by: Jan Beulich <jbeul...@suse.com> --- a/xen/arch/x86/hvm/vlapic.c +++ b/xen/arch/x86/hvm/vlapic.c @@ -108,7 +108,7 @@ static void vlapic_error(struct vlapic * uint32_t esr; spin_lock_irqsave(&vlapic->esr_lock, flags); - esr = vlapic_get_reg(vlapic, APIC_ESR); + esr = vlapic->hw.pending_esr; if ( (esr & errmask) != errmask ) { uint32_t lvterr = vlapic_get_reg(vlapic, APIC_LVTERR); @@ -127,7 +127,7 @@ static void vlapic_error(struct vlapic * errmask |= APIC_ESR_RECVILL; } - vlapic_set_reg(vlapic, APIC_ESR, esr | errmask); + vlapic->hw.pending_esr |= errmask; if ( inj ) vlapic_set_irq(vlapic, lvterr & APIC_VECTOR_MASK, 0); @@ -802,6 +802,19 @@ void vlapic_reg_write(struct vcpu *v, un vlapic_set_reg(vlapic, APIC_ID, val); break; + case APIC_ESR: + { + unsigned long flags; + + spin_lock_irqsave(&vlapic->esr_lock, flags); + val = vlapic->hw.pending_esr; + vlapic->hw.pending_esr = 0; + spin_unlock_irqrestore(&vlapic->esr_lock, flags); + + vlapic_set_reg(vlapic, APIC_ESR, val); + break; + } + case APIC_TASKPRI: vlapic_set_reg(vlapic, APIC_TASKPRI, val & 0xff); break; --- a/xen/include/public/arch-x86/hvm/save.h +++ b/xen/include/public/arch-x86/hvm/save.h @@ -394,6 +394,7 @@ struct hvm_hw_lapic { uint32_t disabled; /* VLAPIC_xx_DISABLED */ uint32_t timer_divisor; uint64_t tdt_msr; + uint32_t pending_esr; }; 
DECLARE_HVM_SAVE_TYPE(LAPIC, 5, struct hvm_hw_lapic); ++++++ 67d17edd-x86-expose-MSR_FAM10H_MMIO_CONF_BASE-on-AMD.patch ++++++ # Commit b4071d28c5bd9ca4fed76031cbf0e782b74209b9 # Date 2025-03-12 13:32:30 +0100 # Author Roger Pau Monne <roger....@citrix.com> # Committer Roger Pau Monne <roger....@citrix.com> x86/msr: expose MSR_FAM10H_MMIO_CONF_BASE on AMD The MMIO_CONF_BASE reports the base of the MCFG range on AMD systems. Linux pre-6.14 is unconditionally attempting to read the MSR without a safe MSR accessor, and since Xen doesn't allow access to it Linux reports the following error: unchecked MSR access error: RDMSR from 0xc0010058 at rIP: 0xffffffff8101d19f (xen_do_read_msr+0x7f/0xa0) Call Trace: xen_read_msr+0x1e/0x30 amd_get_mmconfig_range+0x2b/0x80 quirk_amd_mmconfig_area+0x28/0x100 pnp_fixup_device+0x39/0x50 __pnp_add_device+0xf/0x150 pnp_add_device+0x3d/0x100 pnpacpi_add_device_handler+0x1f9/0x280 acpi_ns_get_device_callback+0x104/0x1c0 acpi_ns_walk_namespace+0x1d0/0x260 acpi_get_devices+0x8a/0xb0 pnpacpi_init+0x50/0x80 do_one_initcall+0x46/0x2e0 kernel_init_freeable+0x1da/0x2f0 kernel_init+0x16/0x1b0 ret_from_fork+0x30/0x50 ret_from_fork_asm+0x1b/0x30 Such access is conditional to the presence of a device with PnP ID "PNP0c01", which triggers the execution of the quirk_amd_mmconfig_area() function. Note that prior to commit 3fac3734c43a MSR accesses when running as a PV guest would always use the safe variant, and thus silently handle the #GP. Fix by allowing access to the MSR on AMD systems for the hardware domain. Write attempts to the MSR will still result in #GP for all domain types. 
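The access policy described above can be sketched as a small truth function. This is only an illustrative model under stated assumptions: the `vendor` enum and both function names are hypothetical and not part of Xen, which uses X86_VENDOR_* bit flags and additionally fails the read if the underlying rdmsr_safe() faults. The MSR index is the one shown in the RDMSR error above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MSR_FAM10H_MMIO_CONF_BASE 0xc0010058u /* index from the RDMSR error above */

/* Hypothetical vendor tags, for illustration only. */
enum vendor { VENDOR_INTEL, VENDOR_AMD, VENDOR_HYGON };

/*
 * Model of the read policy: only the hardware domain on AMD or Hygon
 * hardware may read the MSR; every other combination gets #GP.
 */
bool hwdom_may_read_mmio_conf_base(uint32_t msr, bool is_hwdom, enum vendor v)
{
    /* This sketch only models the single MSR discussed here. */
    assert(msr == MSR_FAM10H_MMIO_CONF_BASE);

    return is_hwdom && (v == VENDOR_AMD || v == VENDOR_HYGON);
}

/* Model of the write policy: writes fault for all domain types. */
bool may_write_mmio_conf_base(void)
{
    return false;
}
```

The sketch leaves out the hardware read itself, so a true result here means "attempt the read", not "the read succeeded".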
Signed-off-by: Roger Pau Monné <roger....@citrix.com>
Reviewed-by: Jan Beulich <jbeul...@suse.com>

--- a/xen/arch/x86/msr.c
+++ b/xen/arch/x86/msr.c
@@ -245,6 +245,14 @@ int guest_rdmsr(struct vcpu *v, uint32_t
         *val = 0;
         break;
 
+    case MSR_FAM10H_MMIO_CONF_BASE:
+        if ( !is_hardware_domain(d) ||
+             !(cp->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) ||
+             rdmsr_safe(msr, *val) )
+            goto gp_fault;
+
+        break;
+
     case MSR_VIRT_SPEC_CTRL:
         if ( !cp->extd.virt_ssbd )
             goto gp_fault;

++++++ 67d17ede-VT-x-PI-usage-of-msi_desc-msg-field.patch ++++++
# Commit 30f0e55a79206702b4e82e86dad6b35033157858
# Date 2025-03-12 13:32:30 +0100
# Author Roger Pau Monne <roger....@citrix.com>
# Committer Roger Pau Monne <roger....@citrix.com>
x86/vmx: fix posted interrupts usage of msi_desc->msg field

The current usage of msi_desc->msg in vmx_pi_update_irte() will make the
field contain a translated MSI message, instead of the expected
untranslated one. This breaks dump_msi(), that use the data in
msi_desc->msg to print the interrupt details.

Fix this by introducing a dummy local msi_msg, and use it with
iommu_update_ire_from_msi(). vmx_pi_update_irte() relies on the MSI
message not changing, so there's no need to propagate the resulting
msi_msg to the hardware, and the contents can be ignored.

Additionally add a comment to clarify that msi_desc->msg must always
contain the untranslated MSI message.

Fixes: a5e25908d18d ('VT-d: introduce new fields in msi_desc to track binding with guest interrupt')
Signed-off-by: Roger Pau Monné <roger....@citrix.com>
Reviewed-by: Jan Beulich <jbeul...@suse.com>

--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -396,6 +396,7 @@ static int cf_check vmx_pi_update_irte(c
     const struct pi_desc *pi_desc = v ? &v->arch.hvm.vmx.pi_desc : NULL;
     struct irq_desc *desc;
     struct msi_desc *msi_desc;
+    struct msi_msg msg;
     int rc;
 
     desc = pirq_spin_lock_irq_desc(pirq, NULL);
@@ -410,12 +411,13 @@ static int cf_check vmx_pi_update_irte(c
     }
     msi_desc->pi_desc = pi_desc;
     msi_desc->gvec = gvec;
+    msg = msi_desc->msg;
 
     spin_unlock_irq(&desc->lock);
 
     ASSERT_PDEV_LIST_IS_READ_LOCKED(msi_desc->dev->domain);
 
-    return iommu_update_ire_from_msi(msi_desc, &msi_desc->msg);
+    return iommu_update_ire_from_msi(msi_desc, &msg);
 
  unlock_out:
     spin_unlock_irq(&desc->lock);
--- a/xen/arch/x86/include/asm/msi.h
+++ b/xen/arch/x86/include/asm/msi.h
@@ -124,7 +124,7 @@ struct msi_desc {
     int irq;
     int remap_index; /* index in interrupt remapping table */
 
-    struct msi_msg msg; /* Last set MSI message */
+    struct msi_msg msg; /* Last set MSI message (untranslated) */
 };
 
 /*

++++++ 67d2a3fe-libxl-avoid-infinite-loop-in-libxl__remove_directory.patch ++++++
References: bsc#1237692

# Commit 68baeb5c4852e652b9599e049f40477edac4060e
# Date 2025-03-13 10:23:10 +0100
# Author Jan Beulich <jbeul...@suse.com>
# Committer Jan Beulich <jbeul...@suse.com>
libxl: avoid infinite loop in libxl__remove_directory()

Infinitely retrying the rmdir() invocation makes little sense. While the
original observation was the log filling the disk (due to repeated
"Directory not empty" errors, in turn occurring for unclear reasons),
the loop wants breaking even if there was no error message being logged
(much like is done in the similar loops in libxl__remove_file() and
libxl__remove_file_or_directory()).

Fixes: c4dcbee67e6d ("libxl: provide libxl__remove_file et al")
Signed-off-by: Jan Beulich <jbeul...@suse.com>
Reviewed-by: Juergen Gross <jgr...@suse.com>
Acked-by: Anthony PERARD <anthony.per...@vates.tech>

--- a/tools/libs/light/libxl_utils.c
+++ b/tools/libs/light/libxl_utils.c
@@ -577,6 +577,7 @@ int libxl__remove_directory(libxl__gc *g
         if (errno == EINTR) continue;
         LOGE(ERROR, "failed to remove emptied directory %s", dirpath);
         rc = ERROR_FAIL;
+        break;
     }
 
  out: