On 4/14/2026 11:06 AM, Michael Kelley wrote: > From: Easwar Hariharan <[email protected]> Sent: Tuesday, > April 14, 2026 10:42 AM >> > > [snip] > >>>> Thanks for that explanation, that makes sense. I didn't see any >>>> serialization >>>> that would ensure that the VMBus path to communicate the child devices on >>>> the bus >>>> would complete before pci_scan_device() finds and finalizes the pci_dev. I >>>> think it's >>> >>> FWIW, hv_pci_query_relations() should be ensuring that the communication >>> has completed before it returns. It does a wait_for_reponse(), which ensures >>> that the Hyper-V host has sent the PCI_BUS_RELATIONS[2] response. However, >>> that message spins off work to the hbus->wq workqueue, so >>> hv_pci_query_relations() has a flush_workqueue() so ensure everything that >>> was queued has completed. >> >> Hm, I read the comment for the flush_workqueue() as addressing the >> "PCI_BUS_RELATIONS[2] >> message arrived before we sent the QUERY_BUS_RELATIONS message" race case, >> not as an >> "all child devices have definitely been received and processed in response >> to our >> QUERY_BUS_RELATIONS message". Also, knowing very little about the VMBus >> contract, I >> discounted the 100 ms timeout in wait_for_response() as a serialization >> guarantee. > > Yeah, that timeout is so that the code can wake up every 100 ms to check > if the device has been rescinded (i.e., removed). If the device isn't > rescinded, wait_for_response() waits forever until a response comes in.
I don't know how I missed that. :( >> >> Chalk it up to previous experience dealing with hardware that's *supposed* >> to be >> spec-compliant and complete initialization within specified timings. :) >> >> I see now that the flush is sufficient though. >> >>> >>> Thinking more about the "hv_pcibus_installed" case, if that path is ever >>> triggered, I don't think anything needs to be done with the logical device >>> ID. >>> The vPCI device has already been fully initialized on the Linux side, and >>> it's >>> logical device ID would not change. >>> >>> So I think you could construct the full logical device ID once >>> hv_pci_query_relations() returns to hv_pci_probe(). >> >> Let me think about this more and decide between the logical ID and full bus >> GUID >> options. >> >>> >>>> safest to take the approach to communicate the GUID, and find the function >>>> number from >>>> the pci_dev. This does mean that there will be an essentially identical >>>> copy of >>>> hv_build_logical_dev_id() in the IOMMU code, but a comment can explain >>>> that. >>> >>> With this alternative approach, is there a need to communicate the full >>> GUID to the pvIOMMU drvier? Couldn't you just communicate bytes 4 thru >>> 7, which would be logical device ID minus the function number? >> >> Yes, we could just communicate bytes 4 through 7 but the pvIOMMU version of >> the build logical >> ID function would diverge from the pci-hyperv version. I figured if we say >> (in a comment) >> that this is the same ID as generated in pci-hyperv, it's better for future >> readers to see it >> to be clearly identical at first glance. >> >> It's also possible to change the pci-hyperv function to only take bytes 4 >> through 7 instead of the >> full GUID, but I rather think we don't need that impedance mismatch of bytes >> 4 through 7 of the >> GUID becoming bytes 0 through 3 of a u32. >> >>> <snip> >> >>>>>>> >>>>>>> I don't think the pci-hyperv driver even needs to tell the IOMMU driver >>>>>>> to >>>>>>> remove the information if a PCI pass-thru device is unbound or removed, >>>>>>> as >>>>>>> the logical device ID will be the same if the device ever comes back. >>>>>>> At worst, >>>>>>> the IOMMU driver can simply replace an existing logical device ID if a >>>>>>> new one >>>>>>> is provided for the same PCI domain ID. >>>>>> >>>>>> As above, replacing a unique GUID when a result is found for a non-unique >>>>>> key value may be prone to failure if it happens that the device that >>>>>> came "back" >>>>>> is not in fact the same device (or class of device) that went away and >>>>>> just happens >>>>>> to, either due to bytes 4 and 5 being identical, or due to collision in >>>>>> the >>>>>> pci_domain_nr_dynamic_ida, have the same domain number. >>>> >>>> Given the vPCI team's statements (above), I think we will need to handle >>>> unbind or >>>> removal and ensure the pvIOMMU drivers data structure is invalidated when >>>> either >>>> happens. >>> >>> The generic PCI code should handle detaching from the pvIOMMU. So I'm >>> assuming >>> your statement is specifically about the mapping from domain ID to logical >>> device ID. >> >> Yes, apologies for the vagueness (again). >> >>> I still think removing it may be unnecessary since adding a mapping for a >>> new vPCI >>> device with the same domain ID but different logical device ID could just >>> overwrite >>> any existing mapping. And leaving a dead mapping in the pvIOMMU data >>> structures >>> doesn’t actually hurt anything. On the other hand, removing/invalidating it >>> is >>> certainly more tidy and might prevent some confusion down the road. >>> >> >> Yes, if the data structure maps domain -> logical ID, we can do the >> overwrite as you say. >> With my approach of informing the pvIOMMU driver of the entire (bus) GUID, >> we would want >> to be careful that we don't assume the 1:1 bus<->device case and overwrite >> an existing >> device entry with a new device that's on the same bus. > > Yes, that's a valid point. I was assuming that the pvIOMMU would use the > domain ID at the lookup key, since the domain ID is directly available from > the > struct pci_dev that is an input parameter to the IOMMU functions. But in the > not 1:1 case, that domain ID might refer to a bus with multiple functions. The > logical device IDs for those devices will be the same except for the low order > 3 bits that encode with the function number. So maybe the domain ID maps > to a partial logical device ID, and the pvIOMMU driver must always add in the > function number so the not 1:1 case works. Agreed. > > Would the pvIOMMU driver do anything with the full GUID, except extract > bytes 4 through 7? There's no way I see to use the full GUID as the lookup > key. > No, the hypercalls only use the logical ID, so the rest of the GUID is unused. Thanks, Easwar (he/him)

