On Tue, May 27, 2025 at 06:43:46PM -0700, Nathan Chen wrote: > Hi Daniel, > > On 5/20/2025 5:51 AM, Daniel P. Berrangé wrote: > > > Hi, > > > > > > This is a follow up to the first RFC patchset [0] for supporting multiple > > > vSMMU instances in a qemu VM. This patchset also introduces support for > > > using iommufd to propagate DMA mappings to kernel for assigned devices. > > > > > > This patchset implements support for specifying multiple <iommu> devices > > > within the VM definition when smmuv3Dev IOMMU model is specified, and is > > > tested with Shameer's latest qemu RFC for HW-accelerated vSMMU devices [1] > > > > > > Moreover, it adds a new 'iommufd' member for virDomainIOMMUDef, > > > in order to represent the iommufd object in qemu command line. This > > > patchset also implements new 'iommufdId' and 'iommufdFd' attributes for > > > hostdev devices to be associated with the iommufd object. > > > > > > For instance, specifying the iommufd object and associated hostdev in a > > > VM definition with multiple IOMMUs, configured to be routed to > > > pcie-expander-bus controllers in a way where VFIO device to SMMUv3 > > > associations are matched with the host (pcie-expander-bus and > > > pcie-root-port controllers are no longer auto-added/auto-routed > > > like in the first revision of this RFC, as the PCIe topology will be > > > configured by management apps): > > > > > > <devices> > > > ... > > > <controller type='pci' index='1' model='pcie-expander-bus'> > > > <model name='pxb-pcie'/> > > > <target busNr='252'/> > > > <address type='pci' domain='0x0000' bus='0x00' slot='0x01' > > > function='0x0'/> > > > </controller> > > > <controller type='pci' index='2' model='pcie-expander-bus'> > > > <model name='pxb-pcie'/> > > > <target busNr='248'/> > > > <address type='pci' domain='0x0000' bus='0x00' slot='0x02' > > > function='0x0'/> > > > </controller> > > > ... > > > <controller type='pci' index='21' model='pcie-root-port'> > > > <model name='pcie-root-port'/> > > > <target chassis='21' port='0x0'/> > > > <address type='pci' domain='0x0000' bus='0x01' slot='0x00' > > > function='0x0'/> > > > </controller> > > > <controller type='pci' index='22' model='pcie-root-port'> > > > <model name='pcie-root-port'/> > > > <target chassis='22' port='0xa8'/> > > > <address type='pci' domain='0x0000' bus='0x02' slot='0x00' > > > function='0x0'/> > > > </controller> > > > ... > > > <hostdev mode='subsystem' type='pci' managed='no'> > > > <source> > > > <address domain='0x0009' bus='0x01' slot='0x00' function='0x0'/> > > > </source> > > > <iommufdId>iommufd0</iommufdId> > > > <address type='pci' domain='0x0000' bus='0x15' slot='0x00' > > > function='0x0'/> > > > </hostdev> > > > <hostdev mode='subsystem' type='pci' managed='no'> > > > <source> > > > <address domain='0x0019' bus='0x01' slot='0x00' function='0x0'/> > > > </source> > > > <iommufdId>iommufd0</iommufdId> > > > <address type='pci' domain='0x0000' bus='0x16' slot='0x00' > > > function='0x0'/> > > > </hostdev> > > > <iommu model='smmuv3Dev'> > > > <iommufd> > > > <id>iommufd0</id> > > > </iommufd> > > > <address type='pci' domain='0x0000' bus='0x01' slot='0x01' > > > function='0x0'/> > > IIUC, you're using <address> here to reference the earlier <controller> > > pcie-expander-bus. This is a bit wierd as it is making it look like the > > smmuv3Dev itself has a PCI address, but this is just the PCI address > > of the controller. > > > > The smmuv3dev also doesn't have an address on the pcie-expander-bus, > > it is just an association IIUC. > > > > So from this pov, I think I'd be inclined to say we should just > > reference the <controller> based on its index, using an attribute > > > > <iommu model='smmuv3dev' controller='2'/> > > > > I see, I will revise this to reference the controller index instead. > > > > </iommu> > > > <iommu model='smmuv3Dev'> > > > <iommufd> > > > <id>iommufd0</id> > > > </iommufd> > > > <address type='pci' domain='0x0000' bus='0x02' slot='0x01' > > > function='0x0'/> > > > </iommu> > > > </devices> > > > > > > This would get translated to a qemu command line with the arguments below: > > > > > > -device > > > '{"driver":"pxb-pcie","bus_nr":252,"id":"pci.1","bus":"pcie.0","addr":"0x1"}' > > > \ > > > -device > > > '{"driver":"pxb-pcie","bus_nr":248,"id":"pci.2","bus":"pcie.0","addr":"0x2"}' > > > \ > > > -device > > > '{"driver":"pcie-root-port","port":0,"chassis":21,"id":"pci.21","bus":"pci.1","addr":"0x0"}' > > > \ > > > -device > > > '{"driver":"pcie-root-port","port":168,"chassis":22,"id":"pci.22","bus":"pci.2","addr":"0x0"}' > > > \ > > > -object '{"qom-type":"iommufd","id":"iommufd0"}' \ > > > -device '{"driver":"arm-smmuv3-accel","bus":"pci.1"}' \ > > > -device '{"driver":"arm-smmuv3-accel","bus":"pci.2"}' \ > > > -device > > > '{"driver":"vfio-pci","host":"0009:01:00.0","id":"hostdev0","iommufd":"iommufd0","bus":"pci.21","addr":"0x0"}' > > > \ > > > -device > > > '{"driver":"vfio-pci","host":"0019:01:00.0","id":"hostdev1","iommufd":"iommufd0","bus":"pci.22","addr":"0x0"}' > > > \ > > The iommufd integration in the XML looks a bit wierd too - we have > > four different elements all referencing 'iommufd0' but nothing > > is defining this. The iommu references the iommufd0, but nothing > > actually uses this on the arm-smuv3-accel command line. > > > > > > I've not been paying much attention to iommufd in QEMU, but IIUC > > it will apply to x86_64 too. So I'm wondering how iommufd integration > > sound work in libvirt more broadly. > > > > It is my understanding that we want to consider device classes for libvirt > device representation in XML, so I intended to have users declare the > iommufd definition as an attribute under the <iommu> stanza, which would> translate to the following qemu argument: > > -object '{"qom-type":"iommufd","id":"iommufd0"}' > > but since this series implements support for multiple <iommu> definitions, > we specify iommufd0 for multiple <iommu> stanzas. For x86_64, we would just > specify the iommufd attribute once under a single <iommu> stanza. > > Would you suggest we move iommufd out of the <iommu> definition instead, > like the examples below?
AFAICT iommufd isn't connected to the guest iommu at all in terms of configuration, it is simply an attribute of the hostdev. eg we could do <hostdev mode='subsystem' type='mdev' model='vfio-pci' iommufd='on'> that does leave open the possibility that someone configures iommufd on one hostdev, but not on another, but that's not as bad as when we set it on the <iommu> too. So something we can validate in post-parse logic if we need to ensure consistent usage - if qemu allows a mix of iommfd and non-iommufd for vfio-pci, we can just allow that at libvirt too With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|