From: dan.j.willi...@intel.com <dan.j.willi...@intel.com> Sent: Thursday, July 17, 2025 5:23 PM
>
> Michael Kelley wrote:
> > From: dan.j.willi...@intel.com <dan.j.willi...@intel.com> Sent: Thursday, July 17, 2025 12:59 PM
> > >
> > > Michael Kelley wrote:
> > > > From: Dan Williams <dan.j.willi...@intel.com> Sent: Wednesday, July 16, 2025 9:09 AM
> > >
> > > Thanks for taking a look Michael!
> > >
> > > [..]
> > > > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > > > > index e9448d55113b..833ebf2d5213 100644
> > > > > --- a/drivers/pci/pci.c
> > > > > +++ b/drivers/pci/pci.c
> > > > > @@ -6692,9 +6692,50 @@ static void pci_no_domains(void)
> > > > >  #endif
> > > > >  }
> > > > >
> > > > > +#ifdef CONFIG_PCI_DOMAINS
> > > > > +static DEFINE_IDA(pci_domain_nr_dynamic_ida);
> > > > > +
> > > > > +/*
> > > > > + * Find a free domain_nr either allocated by pci_domain_nr_dynamic_ida or
> > > > > + * fallback to the first free domain number above the last ACPI segment number.
> > > > > + * Caller may have a specific domain number in mind, in which case try to
> > > > > + * reserve it.
> > > > > + *
> > > > > + * Note that this allocation is freed by pci_release_host_bridge_dev().
> > > > > + */
> > > > > +int pci_bus_find_emul_domain_nr(int hint)
> > > > > +{
> > > > > +	if (hint >= 0) {
> > > > > +		hint = ida_alloc_range(&pci_domain_nr_dynamic_ida, hint, hint,
> > > > > +				       GFP_KERNEL);
> > > >
> > > > This almost preserves the existing functionality in pci-hyperv.c. But
> > > > if the "hint" passed in is zero, current code in pci-hyperv.c treats
> > > > that as a collision and allocates some other value. The special
> > > > treatment of zero is necessary per the comment with the definition of
> > > > HVPCI_DOM_INVALID.
> > > >
> > > > I don't have an opinion on whether the code here should treat a "hint"
> > > > of zero as invalid, or whether that should be handled in pci-hyperv.c.
> > >
> > > Oh, I see what you are saying. I made the "hint == 0" case start working
> > > where previously it should have failed. I feel like that's probably best
> > > handled in pci-hyperv.c with something like the following which also
> > > fixes up a regression I caused with @dom being unsigned:
> > >
> > > diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
> > > index cfe9806bdbe4..813757db98d1 100644
> > > --- a/drivers/pci/controller/pci-hyperv.c
> > > +++ b/drivers/pci/controller/pci-hyperv.c
> > > @@ -3642,9 +3642,9 @@ static int hv_pci_probe(struct hv_device *hdev,
> > >  {
> > >  	struct pci_host_bridge *bridge;
> > >  	struct hv_pcibus_device *hbus;
> > > -	u16 dom_req, dom;
> > > +	int ret, dom = -EINVAL;
> > > +	u16 dom_req;
> > >  	char *name;
> > > -	int ret;
> > >
> > >  	bridge = devm_pci_alloc_host_bridge(&hdev->device, 0);
> > >  	if (!bridge)
> > > @@ -3673,7 +3673,8 @@ static int hv_pci_probe(struct hv_device *hdev,
> > >  	 * collisions) in the same VM.
> > >  	 */
> > >  	dom_req = hdev->dev_instance.b[5] << 8 | hdev->dev_instance.b[4];
> > > -	dom = pci_bus_find_emul_domain_nr(dom_req);
> > > +	if (dom_req)
> > > +		dom = pci_bus_find_emul_domain_nr(dom_req);
> >
> > No, I don't think this is right either. If dom_req is 0, we don't want
> > hv_pci_probe() to fail. We want the "collision" path to be taken so that
> > some other unused PCI domain ID is assigned. That could be done by
> > passing -1 as the hint to pci_bus_find_emul_domain_nr(). Or PCI
> > domain ID 0 could be pre-reserved in init_hv_pci_drv() like is done
> > with HVPCI_DOM_INVALID in current code.
>
> Yeah, I realized that shortly after sending. I will slow down.
>
> > >
> > > A couple observations:
> > >
> > > - I think it would be reasonable to not fall back in the hint case with
> > >   something like this:
> >
> > We *do* need the fallback in the hint case. If the hint causes a collision
> > (i.e., another device is already using the hinted PCI domain ID), then we
> > need to choose some other PCI domain ID. Again, we don't want hv_pci_probe()
> > to fail for the device because the value of bytes 4 and 5 chosen from the
> > device's GUID (as assigned by Hyper-V) accidentally matches bytes 4 and 5
> > of some other device's GUID. Hyper-V guarantees the GUIDs are unique, but
> > not bytes 4 and 5 standing alone. Current code behaves like the
> > acpi_disabled case in your patch, and picks some other unused PCI domain
> > ID in the 1 to 0xFFFF range.
>
> Ok, that feels like "let the caller set the range in addition to the
> hint".
>
> >
> > >
> > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > > index 833ebf2d5213..0bd2053dbe8a 100644
> > > --- a/drivers/pci/pci.c
> > > +++ b/drivers/pci/pci.c
> > > @@ -6705,14 +6705,10 @@ static DEFINE_IDA(pci_domain_nr_dynamic_ida);
> > >   */
> > >  int pci_bus_find_emul_domain_nr(int hint)
> > >  {
> > > -	if (hint >= 0) {
> > > -		hint = ida_alloc_range(&pci_domain_nr_dynamic_ida, hint, hint,
> > > +	if (hint >= 0)
> > > +		return ida_alloc_range(&pci_domain_nr_dynamic_ida, hint, hint,
> > >  				       GFP_KERNEL);
> > >
> > > -		if (hint >= 0)
> > > -			return hint;
> > > -	}
> > > -
> > >  	if (acpi_disabled)
> > >  		return ida_alloc(&pci_domain_nr_dynamic_ida, GFP_KERNEL);
> > >
> > > - The VMD driver has been allocating 32-bit PCI domain numbers since
> > >   v4.5 185a383ada2e ("x86/PCI: Add driver for Intel Volume Management
> > >   Device (VMD)"). At a minimum if it is still a problem, it is a shared
> > >   problem, but the significant deployment of VMD in the time likely
> > >   indicates it is ok. If not, the above change at least makes the
> > >   hyper-v case avoid 32-bit domain numbers.
> >
> > The problem we encountered in 2018/2019 was with graphics devices
> > and the Xorg X Server, specifically with the PCI domain ID stored in
> > xorg.conf to identify the graphics device that the X Server was to run
> > against. I don't recall ever seeing a similar problem with storage or NIC
> > devices, but my memory could be incomplete. It's plausible that user
> > space code accessing the VMD device correctly handled 32-bit domain
> > IDs, but that's not necessarily an indicator for user space graphics
> > software. The Xorg X Server issues would have started somewhere after
> > commit 4a9b0933bdfc in the 4.11 kernel, and were finally fixed in the 5.4
> > kernel with commits be700103efd10 and f73f8a504e279.
> >
> > All that said, I'm not personally averse to trying again in assigning a
> > domain ID > 0xFFFF. I do see a commit [1] to fix libpciaccess that was
> > made 7 years ago in response to the issues we were seeing on Hyper-V.
> > Assuming those fixes have propagated into using packages like X Server,
> > then we're good. But someone from Microsoft should probably sign off
> > on taking this risk. I retired from Microsoft nearly two years ago, and
> > meddle in things from time-to-time without the burden of dealing
> > with customer support issues. ;-)
>
> Living the dream! Extra thanks for taking a look.
>
> > [1]
> > https://gitlab.freedesktop.org/xorg/lib/libpciaccess/-/commit/a167bd6474522a709ff3cbb00476c0e4309cb66f
> >
>
> Thanks for this.
>
> I would rather do the equivalent conversion for now because 7 years old
> is right on the cusp of "someone might still be running that with new
> kernels".

Works for me, and is a bit less risky.

>
> Here is the replacement fixup that I will fold in if it looks good to
> you:
>
> -- 8< --
> diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
> index cfe9806bdbe4..f1079a438bff 100644
> --- a/drivers/pci/controller/pci-hyperv.c
> +++ b/drivers/pci/controller/pci-hyperv.c
> @@ -3642,9 +3642,9 @@ static int hv_pci_probe(struct hv_device *hdev,
>  {
>  	struct pci_host_bridge *bridge;
>  	struct hv_pcibus_device *hbus;
> -	u16 dom_req, dom;
> +	int ret, dom;
> +	u16 dom_req;
>  	char *name;
> -	int ret;
>
>  	bridge = devm_pci_alloc_host_bridge(&hdev->device, 0);
>  	if (!bridge)
> @@ -3673,8 +3673,7 @@ static int hv_pci_probe(struct hv_device *hdev,
>  	 * collisions) in the same VM.
>  	 */
>  	dom_req = hdev->dev_instance.b[5] << 8 | hdev->dev_instance.b[4];
> -	dom = pci_bus_find_emul_domain_nr(dom_req);
> -

As an additional paragraph in the larger comment block above, let's include
a massaged version of the comment associated with HVPCI_DOM_INVALID. Perhaps:

	 *
	 * Because Gen1 VMs use domain 0, don't allow picking domain 0 here, even
	 * if bytes 4 and 5 of the instance GUID are both zero.
	 */

> +	dom = pci_bus_find_emul_domain_nr(dom_req, 1, U16_MAX);
>  	if (dom < 0) {
>  		dev_err(&hdev->device,
>  			"Unable to use dom# 0x%x or other numbers", dom_req);
> diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
> index f60244ff9ef8..30935fe85af9 100644
> --- a/drivers/pci/controller/vmd.c
> +++ b/drivers/pci/controller/vmd.c
> @@ -881,7 +881,14 @@ static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features)
>  	pci_add_resource_offset(&resources, &vmd->resources[2], offset[1]);
>
>  	sd->vmd_dev = vmd->dev;
> -	sd->domain = pci_bus_find_emul_domain_nr(PCI_DOMAIN_NR_NOT_SET);
> +
> +	/*
> +	 * Emulated domains start at 0x10000 to not clash with ACPI _SEG
> +	 * domains. Per ACPI r6.0, sec 6.5.6, _SEG returns an integer, of
> +	 * which the lower 16 bits are the PCI Segment Group (domain) number.
> +	 * Other bits are currently reserved.
> +	 */
> +	sd->domain = pci_bus_find_emul_domain_nr(0, 0x10000, INT_MAX);
>  	if (sd->domain < 0)
>  		return sd->domain;
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 833ebf2d5213..de42e53f07d0 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -6695,34 +6695,15 @@ static void pci_no_domains(void)
>  #ifdef CONFIG_PCI_DOMAINS
>  static DEFINE_IDA(pci_domain_nr_dynamic_ida);
>
> -/*
> - * Find a free domain_nr either allocated by pci_domain_nr_dynamic_ida or
> - * fallback to the first free domain number above the last ACPI segment number.
> - * Caller may have a specific domain number in mind, in which case try to
> - * reserve it.
> - *
> - * Note that this allocation is freed by pci_release_host_bridge_dev().
> +/**
> + * pci_bus_find_emul_domain_nr() - allocate a PCI domain number per constraints
> + * @hint: desired domain, 0 if any id in the range of @min to @max is acceptable
> + * @min: minimum allowable domain
> + * @max: maximum allowable domain, no ids higher than INT_MAX will be returned
>   */
> -int pci_bus_find_emul_domain_nr(int hint)
> +u32 pci_bus_find_emul_domain_nr(u32 hint, u32 min, u32 max)

Shouldn't the return type here still be "int"? ida_alloc_range() can return
a negative errno if it fails. And the call sites in hv_pci_probe() and
vmd_enable_domain() store the return value into an "int".

Other than that, and my suggested added comment, this looks good.

Michael

>  {
> -	if (hint >= 0) {
> -		hint = ida_alloc_range(&pci_domain_nr_dynamic_ida, hint, hint,
> -				       GFP_KERNEL);
> -
> -		if (hint >= 0)
> -			return hint;
> -	}
> -
> -	if (acpi_disabled)
> -		return ida_alloc(&pci_domain_nr_dynamic_ida, GFP_KERNEL);
> -
> -	/*
> -	 * Emulated domains start at 0x10000 to not clash with ACPI _SEG
> -	 * domains. Per ACPI r6.0, sec 6.5.6, _SEG returns an integer, of
> -	 * which the lower 16 bits are the PCI Segment Group (domain) number.
> -	 * Other bits are currently reserved.
> -	 */
> -	return ida_alloc_range(&pci_domain_nr_dynamic_ida, 0x10000, INT_MAX,
> +	return ida_alloc_range(&pci_domain_nr_dynamic_ida, max(hint, min), max,
>  			       GFP_KERNEL);
>  }
>  EXPORT_SYMBOL_GPL(pci_bus_find_emul_domain_nr);
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index f6a713da5c49..4aeabe8e2f1f 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -1934,13 +1934,16 @@ DEFINE_GUARD(pci_dev, struct pci_dev *, pci_dev_lock(_T), pci_dev_unlock(_T))
>   */
>  #ifdef CONFIG_PCI_DOMAINS
>  extern int pci_domains_supported;
> -int pci_bus_find_emul_domain_nr(int hint);
> +u32 pci_bus_find_emul_domain_nr(u32 hint, u32 min, u32 max);
>  void pci_bus_release_emul_domain_nr(int domain_nr);
>  #else
>  enum { pci_domains_supported = 0 };
>  static inline int pci_domain_nr(struct pci_bus *bus) { return 0; }
>  static inline int pci_proc_domain(struct pci_bus *bus) { return 0; }
> -static inline int pci_bus_find_emul_domain_nr(int hint) { return 0; }
> +static inline u32 pci_bus_find_emul_domain_nr(u32 hint, u32 min, u32 max)
> +{
> +	return 0;
> +}
>  static inline void pci_bus_release_emul_domain_nr(int domain_nr) { }
>  #endif /* CONFIG_PCI_DOMAINS */
>
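
For illustration only, the hint/min/max behavior discussed in this thread can
be modeled in user space. This is a rough sketch, not kernel code:
mock_alloc_range() and find_emul_domain_nr() are hypothetical names, and the
bool array is an assumed stand-in for the real IDA that only loosely mimics
ida_alloc_range() (lowest free ID in the range, or a negative errno). It keeps
the return type as "int" so the failure case stays visible to callers.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel IDA: tracks IDs 0..MOCK_MAX_ID only. */
#define MOCK_MAX_ID 0x1FFFF

static bool mock_used[MOCK_MAX_ID + 1];

/* Loosely mimics ida_alloc_range(): lowest free ID in [min, max], else -ENOSPC. */
static int mock_alloc_range(unsigned int min, unsigned int max)
{
	if (max > MOCK_MAX_ID)
		max = MOCK_MAX_ID;

	for (unsigned int id = min; id <= max; id++) {
		if (!mock_used[id]) {
			mock_used[id] = true;
			return (int)id;
		}
	}
	return -ENOSPC;
}

/*
 * Sketch of the helper from the fixup, with an int return type.
 * The search starts at the larger of hint and min, so a hint of 0 with
 * min == 1 never yields domain 0, and a taken hint moves on to the next
 * free ID above it.
 */
static int find_emul_domain_nr(unsigned int hint, unsigned int min,
			       unsigned int max)
{
	unsigned int start = hint > min ? hint : min;

	return mock_alloc_range(start, max);
}
```

In this model a Hyper-V style call such as find_emul_domain_nr(0, 1, 0xFFFF)
skips domain 0, a second request for an already-used hint gets the next free
ID above it, and an exhausted range produces a negative value that callers
can test with "< 0". Note that the search never wraps back to IDs between
min and the hint.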