From: Easwar Hariharan <[email protected]> Sent: Tuesday, April 14, 2026 10:42 AM
>

[snip]
 
> >> Thanks for that explanation, that makes sense. I didn't see any serialization
> >> that would ensure that the VMBus path to communicate the child devices on the bus
> >> would complete before pci_scan_device() finds and finalizes the pci_dev. I think it's
> >
> > FWIW, hv_pci_query_relations() should be ensuring that the communication
> > has completed before it returns. It does a wait_for_response(), which ensures
> > that the Hyper-V host has sent the PCI_BUS_RELATIONS[2] response. However,
> > that message spins off work to the hbus->wq workqueue, so
> > hv_pci_query_relations() has a flush_workqueue() to ensure everything that
> > was queued has completed.
> 
> Hm, I read the comment for the flush_workqueue() as addressing the "PCI_BUS_RELATIONS[2]
> message arrived before we sent the QUERY_BUS_RELATIONS message" race case, not as an
> "all child devices have definitely been received and processed in response to our
> QUERY_BUS_RELATIONS message". Also, knowing very little about the VMBus contract, I
> discounted the 100 ms timeout in wait_for_response() as a serialization guarantee.

Yeah, that timeout is so that the code can wake up every 100 ms to check
if the device has been rescinded (i.e., removed). If the device isn't
rescinded, wait_for_response() waits forever until a response comes in.
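
To illustrate the loop shape I'm describing, here's a rough userspace
simulation. It is not the kernel code: the sim_* names and the tick
counters are made up for this sketch, and each "tick" stands in for one
100 ms timed wait.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the state that the real code polls. */
struct sim_channel {
	int ticks;        /* number of 100 ms wakeups so far */
	int rescind_at;   /* tick at which the device is rescinded, or -1 */
	int response_at;  /* tick at which the response arrives, or -1 */
};

/* Simulates one 100 ms timed wait; returns true if the response arrived. */
static bool sim_wait_100ms(struct sim_channel *ch)
{
	ch->ticks++;
	return ch->response_at >= 0 && ch->ticks >= ch->response_at;
}

/* Simulates the "has the device been rescinded?" check. */
static bool sim_rescinded(const struct sim_channel *ch)
{
	return ch->rescind_at >= 0 && ch->ticks >= ch->rescind_at;
}

/*
 * Returns 0 once the response arrives, or -1 if the device is rescinded
 * first. If neither ever happens, the loop never exits, which is the
 * "waits forever" behavior.
 */
static int sim_wait_for_response(struct sim_channel *ch)
{
	assert(ch);
	for (;;) {
		if (sim_rescinded(ch))
			return -1;
		if (sim_wait_100ms(ch))
			return 0;
	}
}
```

The point is that the timeout only bounds how long a rescind can go
unnoticed. It does not bound how long the caller waits for the response.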

> 
> Chalk it up to previous experience dealing with hardware that's *supposed* to be
> spec-compliant and complete initialization within specified timings. :)
> 
> I see now that the flush is sufficient though.
> 
> >
> > Thinking more about the "hv_pcibus_installed" case, if that path is ever
> > triggered, I don't think anything needs to be done with the logical device ID.
> > The vPCI device has already been fully initialized on the Linux side, and its
> > logical device ID would not change.
> >
> > So I think you could construct the full logical device ID once
> > hv_pci_query_relations() returns to hv_pci_probe().
> 
> Let me think about this more and decide between the logical ID and full bus GUID
> options.
> 
> >
> >> safest to take the approach to communicate the GUID, and find the function number from
> >> the pci_dev. This does mean that there will be an essentially identical copy of
> >> hv_build_logical_dev_id() in the IOMMU code, but a comment can explain that.
> >
> > With this alternative approach, is there a need to communicate the full
> > GUID to the pvIOMMU driver? Couldn't you just communicate bytes 4 thru
> > 7, which would be logical device ID minus the function number?
> 
> Yes, we could just communicate bytes 4 through 7 but the pvIOMMU version of the build logical
> ID function would diverge from the pci-hyperv version. I figured if we say (in a comment)
> that this is the same ID as generated in pci-hyperv, it's better for future readers to see it
> to be clearly identical at first glance.
> 
> It's also possible to change the pci-hyperv function to only take bytes 4 through 7 instead of the
> full GUID, but I rather think we don't need that impedance mismatch of bytes 4 through 7 of the
> GUID becoming bytes 0 through 3 of a u32.
> 
> >
> >>
> >>>>
> >>>>>
> >>>>> So have the Hyper-V PV IOMMU driver provide an EXPORTed function to accept
> >>>>> a PCI domain ID and the related logical device ID. The PV IOMMU driver is
> >>>>> responsible for storing this data in a form that it can later search. hv_pci_probe()
> >>>>> calls this new function when it instantiates a new PCI pass-thru device. Then when
> >>>>> the IOMMU driver needs to attach a new device, it can get the PCI domain ID
> >>>>> from the struct pci_dev (or struct pci_bus), search for the related logical device
> >>>>> ID in its own data structure, and use it. The pci-hyperv driver has a dependency
> >>>>> on the IOMMU driver, but that's a dependency in the desired direction. The
> >>>>> PCI domain ID and logical device ID are just integers, so no data structures are
> >>>>> shared.
> >>>>
> >>>> In a previous reply on this thread, you raised the uniqueness issue of bytes 4 and 5
> >>>> of the GUID being used to create the domain number. I thought this approach could
> >>>> help with that too, but as I coded it up, I realized that using the domain number
> >>>> (not guaranteed to be unique) to search for the bus instance GUID (guaranteed to be unique)
> >>>> is the wrong way around. It is unfortunately the only available key in the pci_dev
> >>>> handed to the pvIOMMU driver in this approach though...
> >>>>
> >>>> Do you think that's a fatal flaw?
> >>>
> >>> There are two uniqueness problems, which I didn't fully separate conceptually
> >>> until writing this. One problem is constructing a PCI domain ID that Linux can use
> >>> to identify the virtual PCI bus that the Hyper-V PCI driver creates for each vPCI
> >>> device. The Hyper-V virtual PCI driver uses GUID bytes 4 and 5, and recognizes
> >>> that they might not be unique. So there's code in hv_pci_probe() to pick another
> >>> number if there's a duplicate. Hyper-V doesn't really care how Linux picks the
> >>> domain ID for the virtual PCI bus as it's purely a Linux construct.
> >>
> >> This part matters for the IOMMU driver as it is the key we will use to search the data
> >> structure to find the right GUID to construct the logical dev ID that Hyper-V recognizes.
> >
> > Right. But the Hyper-V vPCI driver in Linux ensures that the domain ID is unique
> > in the sense that two active vPCI devices will not have the same domain ID. So
> > the pvIOMMU driver should not encounter any ambiguity when looking up the
> > logical device ID.
> 
> Agreed, that was a fragment of a thought that I neglected to delete before sending.
> Apologies.
> 
> > As you noted below, it's possible that a vPCI device could go
> > away, and another vPCI device could be added that ends up with a domain ID
> > that was previously used. When that added vPCI device is set up by the Hyper-V
> > vPCI driver, it will inform the pvIOMMU driver about the domain ID -> logical
> > device ID mapping, and it might overwrite an existing mapping if the newly
> > added vPCI device ended up with a domain ID that had previously been used.
> > And that's fine.
> 
> Yes.
> 
> >>
> >>>
> >>> The second problem is the logical device ID that Hyper-V interprets to
> >>> identify a vPCI device in hypercalls such as HVCALL_RETARGET_INTERRUPT
> >>> and the new pvIOMMU related hypercalls. This logical device ID uses
> >>> GUID bytes 4 thru 7 (minus 1 bit).  I don’t think Linux uses the
> >>> logical device ID for anything. Since only Hyper-V interprets it, Hyper-V
> >>> must somehow be ensuring uniqueness of bytes 4 thru 7 (minus 1 bit).
> >>> That's something to confirm with the Hyper-V team. If they are just hoping
> >>> for the best, I don't know how Linux can solve the problem.
> >>
> >> I checked with the Hyper-V vPCI team on this aspect and the only guarantee that
> >> they provide is that, at any given time, there will only be 1 device with a given
> >> logical ID attached to a VM.
> >
> > OK, so Hyper-V is guaranteeing the uniqueness of vPCI device GUID bytes 4
> > thru 7 across all vPCI devices that are attached to a VM at a given point in time.
> > That's good!
> 
> Technically, they're guaranteeing only that the *combination* of GUID bytes 4 through 7 AND
> the slot number will be unique across all vPCI devices that are attached to a VM at a given
> point in time. As you say below, while we have in practice not seen multiple devices on a
> vPCI bus, the vPCI team asserts that there is no restriction in the stack on doing so.
> 

Agreed.

> >
> >> Once a device has been removed, everything about it is
> >> forgotten from the Hyper-V stack's perspective, and nothing in the Hyper-V stack would
> >> prevent a scenario where, for example, a data movement accelerator is attached with
> >> logical ID X, then revoked, and let's say a NIC is attached with the same logical ID X.
> >
> > And the "forgetting" behavior is the same in Linux. Once the device is removed,
> > Linux forgets everything about it. If a new vPCI device shows up and happens
> > to have the same GUID as a previous device, that should not cause any problems
> > in Linux.
> >
> >>
> >> Also, FWIW, they also stated that the GUID is not unique and cannot be
> >> guaranteed to be unique because it's the GUID for the bus, not the individual
> >> devices.
> >
> > I'm not sure I understand this statement. Is this referring to the possibility
> > that a vPCI "device" that Hyper-V offers to the guest might have multiple
> > functions?
> 
> Yes, apologies for the vagueness.
> 
> > The vPCI device driver in Linux has code to recognize this case,
> > but I'm not aware of any current cases where it happens. In such a case,
> > Linux should create a single PCI bus abstraction with multiple devices
> > attached to it, with each device being a different function. If Hyper-V
> > did ever offer a multiple-function configuration, there might be some
> > debugging to do in the Hyper-V vPCI driver in Linux!
> >
> > We shortcut the terminology by referring to a vPCI "device", and assuming
> > that devices and busses are 1-to-1. But the design allows for multiple devices
> > as different functions on the same bus.
> >
> >>
> 
> <snip>
> 
> >>>>>
> >>>>> I don't think the pci-hyperv driver even needs to tell the IOMMU driver to
> >>>>> remove the information if a PCI pass-thru device is unbound or removed, as
> >>>>> the logical device ID will be the same if the device ever comes back. At worst,
> >>>>> the IOMMU driver can simply replace an existing logical device ID if a new one
> >>>>> is provided for the same PCI domain ID.
> >>>>
> >>>> As above, replacing a unique GUID when a result is found for a non-unique
> >>>> key value may be prone to failure if it happens that the device that came "back"
> >>>> is not in fact the same device (or class of device) that went away and just happens
> >>>> to, either due to bytes 4 and 5 being identical, or due to collision in the
> >>>> pci_domain_nr_dynamic_ida, have the same domain number.
> >>
> >> Given the vPCI team's statements (above), I think we will need to handle unbind or
> >> removal and ensure the pvIOMMU driver's data structure is invalidated when either
> >> happens.
> >
> > The generic PCI code should handle detaching from the pvIOMMU. So I'm assuming
> > your statement is specifically about the mapping from domain ID to logical device ID.
> 
> Yes, apologies for the vagueness (again).
> 
> > I still think removing it may be unnecessary since adding a mapping for a new vPCI
> > device with the same domain ID but different logical device ID could just overwrite
> > any existing mapping. And leaving a dead mapping in the pvIOMMU data structures
> > doesn’t actually hurt anything. On the other hand, removing/invalidating it is
> > certainly more tidy and might prevent some confusion down the road.
> >
> 
> Yes, if the data structure maps domain -> logical ID, we can do the overwrite as you say.
> With my approach of informing the pvIOMMU driver of the entire (bus) GUID, we would want
> to be careful that we don't assume the 1:1 bus<->device case and overwrite an existing
> device entry with a new device that's on the same bus.

Yes, that's a valid point. I was assuming that the pvIOMMU would use the
domain ID as the lookup key, since the domain ID is directly available from the
struct pci_dev that is an input parameter to the IOMMU functions. But in the
not 1:1 case, that domain ID might refer to a bus with multiple functions. The
logical device IDs for those devices will be the same except for the low order
3 bits that encode the function number. So maybe the domain ID maps
to a partial logical device ID, and the pvIOMMU driver must always add in the
function number so the not 1:1 case works.

Would the pvIOMMU driver do anything with the full GUID, except extract
bytes 4 through 7? There's no way I see to use the full GUID as the lookup
key.
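
To make the "partial logical device ID" idea concrete, here's a rough
userspace sketch. It is not proposed kernel code. All of the names, the
fixed-size table, and the exact placement of GUID bytes 4 through 7 in
the 32-bit value are assumptions for illustration. The only things
taken from the discussion above are that the domain ID is the lookup
key, that a new registration overwrites an old one for the same domain,
and that the low order 3 bits carry the function number.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_MAPPINGS 16    /* arbitrary table size for this sketch */
#define FUNC_MASK    0x7u  /* low order 3 bits carry the function number */

struct domain_map_entry {
	bool     in_use;
	int      domain;          /* Linux PCI domain ID (the lookup key) */
	uint32_t partial_dev_id;  /* logical device ID with function bits clear */
};

static struct domain_map_entry domain_map[MAX_MAPPINGS];

/* Build the partial ID from GUID bytes 4..7 (byte placement is assumed). */
static uint32_t build_partial_dev_id(const uint8_t guid[16])
{
	return ((uint32_t)guid[5] << 24) | ((uint32_t)guid[4] << 16) |
	       ((uint32_t)guid[7] << 8) | ((uint32_t)guid[6] & ~FUNC_MASK & 0xffu);
}

/* Add a mapping, overwriting any existing one for the same domain ID. */
static int register_domain(int domain, uint32_t partial_dev_id)
{
	struct domain_map_entry *free_slot = NULL;

	for (size_t i = 0; i < MAX_MAPPINGS; i++) {
		if (domain_map[i].in_use && domain_map[i].domain == domain) {
			domain_map[i].partial_dev_id = partial_dev_id;
			return 0;
		}
		if (!domain_map[i].in_use && !free_slot)
			free_slot = &domain_map[i];
	}
	if (!free_slot)
		return -1;  /* table full */

	free_slot->in_use = true;
	free_slot->domain = domain;
	free_slot->partial_dev_id = partial_dev_id;
	return 0;
}

/* Look up by domain ID and always add in the function number. */
static bool lookup_logical_dev_id(int domain, unsigned int func, uint32_t *dev_id)
{
	assert(func <= FUNC_MASK);
	for (size_t i = 0; i < MAX_MAPPINGS; i++) {
		if (domain_map[i].in_use && domain_map[i].domain == domain) {
			*dev_id = domain_map[i].partial_dev_id | func;
			return true;
		}
	}
	return false;
}
```

With this shape, two functions on the same bus share one table entry
and differ only in the function number added at lookup time, so nothing
is overwritten in the not 1:1 case.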

Michael
