From: Easwar Hariharan <[email protected]> Sent: Tuesday, April 14, 2026 10:42 AM >
[snip]
> >> Thanks for that explanation, that makes sense. I didn't see any
> >> serialization that would ensure that the VMBus path to communicate
> >> the child devices on the bus would complete before pci_scan_device()
> >> finds and finalizes the pci_dev. I think it's
> >
> > FWIW, hv_pci_query_relations() should be ensuring that the
> > communication has completed before it returns. It does a
> > wait_for_response(), which ensures that the Hyper-V host has sent the
> > PCI_BUS_RELATIONS[2] response. However, that message spins off work
> > to the hbus->wq workqueue, so hv_pci_query_relations() has a
> > flush_workqueue() to ensure everything that was queued has completed.
>
> Hm, I read the comment for the flush_workqueue() as addressing the
> "PCI_BUS_RELATIONS[2] message arrived before we sent the
> QUERY_BUS_RELATIONS message" race case, not as an "all child devices
> have definitely been received and processed in response to our
> QUERY_BUS_RELATIONS message". Also, knowing very little about the VMBus
> contract, I discounted the 100 ms timeout in wait_for_response() as a
> serialization guarantee.

Yeah, that timeout is so that the code can wake up every 100 ms to check
if the device has been rescinded (i.e., removed). If the device isn't
rescinded, wait_for_response() waits forever until a response comes in.

>
> Chalk it up to previous experience dealing with hardware that's
> *supposed* to be spec-compliant and complete initialization within
> specified timings. :)
>
> I see now that the flush is sufficient though.
>
> >
> > Thinking more about the "hv_pcibus_installed" case, if that path is
> > ever triggered, I don't think anything needs to be done with the
> > logical device ID. The vPCI device has already been fully initialized
> > on the Linux side, and its logical device ID would not change.
> >
> > So I think you could construct the full logical device ID once
> > hv_pci_query_relations() returns to hv_pci_probe().
>
> Let me think about this more and decide between the logical ID and full
> bus GUID options.
>
> >
> >> safest to take the approach to communicate the GUID, and find the
> >> function number from the pci_dev. This does mean that there will be
> >> an essentially identical copy of hv_build_logical_dev_id() in the
> >> IOMMU code, but a comment can explain that.
> >
> > With this alternative approach, is there a need to communicate the
> > full GUID to the pvIOMMU driver? Couldn't you just communicate bytes
> > 4 thru 7, which would be logical device ID minus the function number?
>
> Yes, we could just communicate bytes 4 through 7 but the pvIOMMU
> version of the build logical ID function would diverge from the
> pci-hyperv version. I figured if we say (in a comment) that this is the
> same ID as generated in pci-hyperv, it's better for future readers to
> see it to be clearly identical at first glance.
>
> It's also possible to change the pci-hyperv function to only take bytes
> 4 through 7 instead of the full GUID, but I rather think we don't need
> that impedance mismatch of bytes 4 through 7 of the GUID becoming bytes
> 0 through 3 of a u32.
>
> >
> >>
> >>>>
> >>>>>
> >>>>> So have the Hyper-V PV IOMMU driver provide an EXPORTed function
> >>>>> to accept a PCI domain ID and the related logical device ID. The
> >>>>> PV IOMMU driver is responsible for storing this data in a form
> >>>>> that it can later search. hv_pci_probe() calls this new function
> >>>>> when it instantiates a new PCI pass-thru device. Then when the
> >>>>> IOMMU driver needs to attach a new device, it can get the PCI
> >>>>> domain ID from the struct pci_dev (or struct pci_bus), search for
> >>>>> the related logical device ID in its own data structure, and use
> >>>>> it. The pci-hyperv driver has a dependency on the IOMMU driver,
> >>>>> but that's a dependency in the desired direction.
> >>>>> The PCI domain ID and logical device ID are just integers, so no
> >>>>> data structures are shared.
> >>>>
> >>>> In a previous reply on this thread, you raised the uniqueness
> >>>> issue of bytes 4 and 5 of the GUID being used to create the domain
> >>>> number. I thought this approach could help with that too, but as I
> >>>> coded it up, I realized that using the domain number (not
> >>>> guaranteed to be unique) to search for the bus instance GUID
> >>>> (guaranteed to be unique) is the wrong way around. It is
> >>>> unfortunately the only available key in the pci_dev handed to the
> >>>> pvIOMMU driver in this approach though...
> >>>>
> >>>> Do you think that's a fatal flaw?
> >>>
> >>> There are two uniqueness problems, which I didn't fully separate
> >>> conceptually until writing this. One problem is constructing a PCI
> >>> domain ID that Linux can use to identify the virtual PCI bus that
> >>> the Hyper-V PCI driver creates for each vPCI device. The Hyper-V
> >>> virtual PCI driver uses GUID bytes 4 and 5, and recognizes that
> >>> they might not be unique. So there's code in hv_pci_probe() to pick
> >>> another number if there's a duplicate. Hyper-V doesn't really care
> >>> how Linux picks the domain ID for the virtual PCI bus as it's
> >>> purely a Linux construct.
> >>
> >> This part matters for the IOMMU driver as it is the key we will use
> >> to search the data structure to find the right GUID to construct the
> >> logical dev ID that Hyper-V recognizes.
> >
> > Right. But the Hyper-V vPCI driver in Linux ensures that the domain
> > ID is unique in the sense that two active vPCI devices will not have
> > the same domain ID. So the pvIOMMU driver should not encounter any
> > ambiguity when looking up the logical device ID.
>
> Agreed, that was a fragment of a thought that I neglected to delete
> before sending. Apologies.
>
> > As you noted below, it's possible that a vPCI device could go away,
> > and another vPCI device could be added that ends up with a domain ID
> > that was previously used. When that added vPCI device is set up by
> > the Hyper-V vPCI driver, it will inform the pvIOMMU driver about the
> > domain ID -> logical device ID mapping, and it might overwrite an
> > existing mapping if the newly added vPCI device ended up with a
> > domain ID that had previously been used. And that's fine.
>
> Yes.
>
> >>
> >>>
> >>> The second problem is the logical device ID that Hyper-V interprets
> >>> to identify a vPCI device in hypercalls such as
> >>> HVCALL_RETARGET_INTERRUPT and the new pvIOMMU related hypercalls.
> >>> This logical device ID uses GUID bytes 4 thru 7 (minus 1 bit). I
> >>> don't think Linux uses the logical device ID for anything. Since
> >>> only Hyper-V interprets it, Hyper-V must somehow be ensuring
> >>> uniqueness of bytes 4 thru 7 (minus 1 bit). That's something to
> >>> confirm with the Hyper-V team. If they are just hoping for the
> >>> best, I don't know how Linux can solve the problem.
> >>
> >> I checked with the Hyper-V vPCI team on this aspect and the only
> >> guarantee that they provide is that, at any given time, there will
> >> only be 1 device with a given logical ID attached to a VM.
> >
> > OK, so Hyper-V is guaranteeing the uniqueness of vPCI device GUID
> > bytes 4 thru 7 across all vPCI devices that are attached to a VM at a
> > given point in time. That's good!
>
> Technically, they're guaranteeing only that the *combination* of GUID
> bytes 4 through 7 AND the slot number will be unique across all vPCI
> devices that are attached to a VM at a given point in time. As you say
> below, while we have in practice not seen multiple devices on a vPCI
> bus, the vPCI team asserts that there is no restriction in the stack on
> doing so.
>

Agreed.

> >
> >> Once a device has been removed, everything about it is forgotten
> >> from the Hyper-V stack's perspective, and nothing in the Hyper-V
> >> stack would prevent a scenario where, for example, a data movement
> >> accelerator is attached with logical ID X, then revoked, and let's
> >> say a NIC is attached with the same logical ID X.
> >
> > And the "forgetting" behavior is the same in Linux. Once the device
> > is removed, Linux forgets everything about it. If a new vPCI device
> > shows up and happens to have the same GUID as a previous device, that
> > should not cause any problems in Linux.
> >
> >>
> >> Also, FWIW, they also stated that the GUID is not unique and cannot
> >> be guaranteed to be unique because it's the GUID for the bus, not
> >> the individual devices.
> >
> > I'm not sure I understand this statement. Is this referring to the
> > possibility that a vPCI "device" that Hyper-V offers to the guest
> > might have multiple functions?
>
> Yes, apologies for the vagueness.
>
> > The vPCI device driver in Linux has code to recognize this case, but
> > I'm not aware of any current cases where it happens. In such a case,
> > Linux should create a single PCI bus abstraction with multiple
> > devices attached to it, with each device being a different function.
> > If Hyper-V did ever offer a multiple-function configuration, there
> > might be some debugging to do in the Hyper-V vPCI driver in Linux!
> >
> > We shortcut the terminology by referring to a vPCI "device", and
> > assuming that devices and busses are 1-to-1. But the design allows
> > for multiple devices as different functions on the same bus.
> >
> >>
>
> <snip>
>
> >>>>>
> >>>>> I don't think the pci-hyperv driver even needs to tell the IOMMU
> >>>>> driver to remove the information if a PCI pass-thru device is
> >>>>> unbound or removed, as the logical device ID will be the same if
> >>>>> the device ever comes back.
> >>>>> At worst, the IOMMU driver can simply replace an existing logical
> >>>>> device ID if a new one is provided for the same PCI domain ID.
> >>>>
> >>>> As above, replacing a unique GUID when a result is found for a
> >>>> non-unique key value may be prone to failure if it happens that
> >>>> the device that came "back" is not in fact the same device (or
> >>>> class of device) that went away and just happens to, either due to
> >>>> bytes 4 and 5 being identical, or due to collision in the
> >>>> pci_domain_nr_dynamic_ida, have the same domain number.
> >>
> >> Given the vPCI team's statements (above), I think we will need to
> >> handle unbind or removal and ensure the pvIOMMU driver's data
> >> structure is invalidated when either happens.
> >
> > The generic PCI code should handle detaching from the pvIOMMU. So I'm
> > assuming your statement is specifically about the mapping from domain
> > ID to logical device ID.
>
> Yes, apologies for the vagueness (again).
>
> > I still think removing it may be unnecessary since adding a mapping
> > for a new vPCI device with the same domain ID but different logical
> > device ID could just overwrite any existing mapping. And leaving a
> > dead mapping in the pvIOMMU data structures doesn't actually hurt
> > anything. On the other hand, removing/invalidating it is certainly
> > more tidy and might prevent some confusion down the road.
> >
>
> Yes, if the data structure maps domain -> logical ID, we can do the
> overwrite as you say. With my approach of informing the pvIOMMU driver
> of the entire (bus) GUID, we would want to be careful that we don't
> assume the 1:1 bus<->device case and overwrite an existing device entry
> with a new device that's on the same bus.

Yes, that's a valid point.
I was assuming that the pvIOMMU would use the domain ID as the lookup
key, since the domain ID is directly available from the struct pci_dev
that is an input parameter to the IOMMU functions. But in the not 1:1
case, that domain ID might refer to a bus with multiple functions. The
logical device IDs for those devices will be the same except for the
low order 3 bits that encode the function number. So maybe the domain
ID maps to a partial logical device ID, and the pvIOMMU driver must
always add in the function number so the not 1:1 case works.

Would the pvIOMMU driver do anything with the full GUID, except extract
bytes 4 through 7? There's no way I see to use the full GUID as the
lookup key.

Michael
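
To make the "domain ID maps to a partial logical device ID" idea
concrete, here is a minimal user-space sketch, not kernel code. All of
the names (hv_partial_dev_id(), pviommu_register_domain(),
pviommu_logical_dev_id()) and the fixed-size table are hypothetical and
exist only for illustration. The byte order used to pack GUID bytes 4
thru 7 is an assumption as well; a real version would have to match
whatever hv_build_logical_dev_id() in pci-hyperv does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define HV_FUNC_MASK   0x7u /* low-order 3 bits carry the PCI function number */
#define MAX_VPCI_BUSES 16   /* arbitrary table size for this sketch */

struct domain_map_entry {
	bool     in_use;
	uint32_t domain;     /* Linux PCI domain ID, used as the lookup key */
	uint32_t partial_id; /* logical device ID with the function bits cleared */
};

static struct domain_map_entry domain_map[MAX_VPCI_BUSES];

/* Pack GUID bytes 4..7 into a partial ID (ASSUMED byte order, function bits zeroed). */
static uint32_t hv_partial_dev_id(const uint8_t guid[16])
{
	return ((uint32_t)guid[5] << 24) | ((uint32_t)guid[4] << 16) |
	       ((uint32_t)guid[7] << 8) | ((uint32_t)guid[6] & ~HV_FUNC_MASK);
}

/* Called on the pci-hyperv side at probe time; overwrites any stale mapping. */
static int pviommu_register_domain(uint32_t domain, uint32_t partial_id)
{
	struct domain_map_entry *slot = NULL;

	for (size_t i = 0; i < MAX_VPCI_BUSES; i++) {
		if (domain_map[i].in_use && domain_map[i].domain == domain) {
			slot = &domain_map[i]; /* same domain ID seen again: reuse entry */
			break;
		}
		if (!domain_map[i].in_use && !slot)
			slot = &domain_map[i]; /* remember the first free entry */
	}
	if (!slot)
		return -1; /* table full */

	slot->in_use = true;
	slot->domain = domain;
	slot->partial_id = partial_id & ~HV_FUNC_MASK;
	return 0;
}

/* Called on the pvIOMMU side: look up by domain ID, then add in the function number. */
static int pviommu_logical_dev_id(uint32_t domain, unsigned int func, uint32_t *id)
{
	assert(func <= HV_FUNC_MASK);

	for (size_t i = 0; i < MAX_VPCI_BUSES; i++) {
		if (domain_map[i].in_use && domain_map[i].domain == domain) {
			*id = domain_map[i].partial_id | func;
			return 0;
		}
	}
	return -1; /* no vPCI bus registered for this domain ID */
}
```

Because the function number is added at lookup time, two functions on
the same bus share one table entry, and registering a new device that
reuses a domain ID simply replaces the old partial ID.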

