> -----Original Message-----
> From: Nathan Chen <nath...@nvidia.com>
> Sent: Thursday, May 15, 2025 9:37 PM
> To: devel@lists.libvirt.org
> Cc: Shameerali Kolothum Thodi <shameerali.kolothum.th...@huawei.com>;
> nicol...@nvidia.com; Nathan Chen <nath...@nvidia.com>
> Subject: [RFC PATCH 0/5] qemu: Implement support for iommufd and multiple
> vSMMUs
>
> Hi,
>
> This is a follow-up to the first RFC patchset [0] for supporting multiple
> vSMMU instances in a qemu VM. This patchset also introduces support for
> using iommufd to propagate DMA mappings to the kernel for assigned devices.
>
> This patchset implements support for specifying multiple <iommu> devices
> within the VM definition when the smmuv3Dev IOMMU model is specified, and
> is tested with Shameer's latest qemu RFC for HW-accelerated vSMMU
> devices [1].

Based on the feedback received on the above RFC and the discussion here[0],
there are certain changes to the name of the vSMMU device and the way we
associate the PCIe bus. Going forward it is more likely to be something like
below:

-device arm-smmuv3,primary-bus=pcie.0,accel=on
-device vfio-pci,host=xxx,bus=pcie.0
-device pxb-pcie,id=pcie.1,bus_nr=2
-device arm-smmuv3,primary-bus=pcie.1,accel=on
...

Hopefully this doesn't warrant any major changes to this libvirt series,
but please do make a note of it.

Thanks,
Shameer

[0] https://lore.kernel.org/qemu-devel/ab25zru7pcjnp...@redhat.com/

> Moreover, it adds a new 'iommufd' member for virDomainIOMMUDef, in order
> to represent the iommufd object on the qemu command line. This patchset
> also implements new 'iommufdId' and 'iommufdFd' attributes for hostdev
> devices to be associated with the iommufd object.
>
> For instance, the iommufd object and its associated hostdevs can be
> specified as below in a VM definition with multiple IOMMUs routed to
> pcie-expander-bus controllers, so that the VFIO-device-to-SMMUv3
> associations match the host (pcie-expander-bus and pcie-root-port
> controllers are no longer auto-added/auto-routed as in the first revision
> of this RFC, since the PCIe topology will be configured by management
> apps):
>
>   <devices>
>   ...
>     <controller type='pci' index='1' model='pcie-expander-bus'>
>       <model name='pxb-pcie'/>
>       <target busNr='252'/>
>       <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/>
>     </controller>
>     <controller type='pci' index='2' model='pcie-expander-bus'>
>       <model name='pxb-pcie'/>
>       <target busNr='248'/>
>       <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
>     </controller>
>   ...
>     <controller type='pci' index='21' model='pcie-root-port'>
>       <model name='pcie-root-port'/>
>       <target chassis='21' port='0x0'/>
>       <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
>     </controller>
>     <controller type='pci' index='22' model='pcie-root-port'>
>       <model name='pcie-root-port'/>
>       <target chassis='22' port='0xa8'/>
>       <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
>     </controller>
>   ...
>     <hostdev mode='subsystem' type='pci' managed='no'>
>       <source>
>         <address domain='0x0009' bus='0x01' slot='0x00' function='0x0'/>
>       </source>
>       <iommufdId>iommufd0</iommufdId>
>       <address type='pci' domain='0x0000' bus='0x15' slot='0x00' function='0x0'/>
>     </hostdev>
>     <hostdev mode='subsystem' type='pci' managed='no'>
>       <source>
>         <address domain='0x0019' bus='0x01' slot='0x00' function='0x0'/>
>       </source>
>       <iommufdId>iommufd0</iommufdId>
>       <address type='pci' domain='0x0000' bus='0x16' slot='0x00' function='0x0'/>
>     </hostdev>
>     <iommu model='smmuv3Dev'>
>       <iommufd>
>         <id>iommufd0</id>
>       </iommufd>
>       <address type='pci' domain='0x0000' bus='0x01' slot='0x01' function='0x0'/>
>     </iommu>
>     <iommu model='smmuv3Dev'>
>       <iommufd>
>         <id>iommufd0</id>
>       </iommufd>
>       <address type='pci' domain='0x0000' bus='0x02' slot='0x01' function='0x0'/>
>     </iommu>
>   </devices>
>
> This would get translated to a qemu command line with the arguments below:
>
> -device '{"driver":"pxb-pcie","bus_nr":252,"id":"pci.1","bus":"pcie.0","addr":"0x1"}' \
> -device '{"driver":"pxb-pcie","bus_nr":248,"id":"pci.2","bus":"pcie.0","addr":"0x2"}' \
> -device '{"driver":"pcie-root-port","port":0,"chassis":21,"id":"pci.21","bus":"pci.1","addr":"0x0"}' \
> -device '{"driver":"pcie-root-port","port":168,"chassis":22,"id":"pci.22","bus":"pci.2","addr":"0x0"}' \
> -object '{"qom-type":"iommufd","id":"iommufd0"}' \
> -device '{"driver":"arm-smmuv3-accel","bus":"pci.1"}' \
> -device '{"driver":"arm-smmuv3-accel","bus":"pci.2"}' \
> -device '{"driver":"vfio-pci","host":"0009:01:00.0","id":"hostdev0","iommufd":"iommufd0","bus":"pci.21","addr":"0x0"}' \
> -device '{"driver":"vfio-pci","host":"0019:01:00.0","id":"hostdev1","iommufd":"iommufd0","bus":"pci.22","addr":"0x0"}'
>
> If users would like an external management layer to open the VFIO cdev and
> /dev/iommu and pass the resulting file descriptors to qemu's iommufd
> object, the fds can be specified like so in the VM definition:
>
>   <devices>
>     <hostdev mode='subsystem' type='pci' managed='yes'>
>       <driver name='vfio'/>
>       <source>
>         <address domain='0x0000' bus='0x06' slot='0x12' function='0x2'/>
>       </source>
>       <iommufdId>iommufd0</iommufdId>
>       <iommufdFd>23</iommufdFd>
>       <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
>     </hostdev>
>     <iommu model='intel'>
>       <iommufd>
>         <id>iommufd0</id>
>         <fd>22</fd>
>       </iommufd>
>     </iommu>
>   </devices>
>
> This would get translated to a qemu command line with the arguments below:
>
> -object '{"qom-type":"iommufd","id":"iommufd0","fd":"22"}' \
> -device '{"driver":"vfio-pci","host":"0000:06:12.2","id":"hostdev1","iommufd":"iommufd0","fd":"23","bus":"pci.0","addr":"0x3"}'
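As an aside, here is a minimal sketch of how an external management layer
might obtain the descriptors that <fd> and <iommufdFd> above refer to. The
cdev path /dev/vfio/devices/vfio0 and the way the fds ultimately reach the
qemu process are assumptions for illustration, not something this series
prescribes:

    /* Illustrative only: open the iommufd character device and a VFIO cdev
     * so that their fd numbers can be referenced from the domain XML. */
    #include <fcntl.h>
    #include <stdio.h>

    int main(void)
    {
        int iommu_fd = open("/dev/iommu", O_RDWR);              /* <fd>22</fd> */
        int cdev_fd = open("/dev/vfio/devices/vfio0", O_RDWR);  /* <iommufdFd>23</iommufdFd> */

        if (iommu_fd < 0 || cdev_fd < 0) {
            perror("open");
            return 1;
        }

        /* The numeric values are only meaningful in the process that ends up
         * holding the descriptors for qemu, so they must stay open and be
         * inherited by (or passed to) qemu rather than merely recorded. */
        printf("iommufd fd=%d, vfio cdev fd=%d\n", iommu_fd, cdev_fd);
        return 0;
    }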
> Summary of changes:
> - Introduced support for specifying multiple <iommu> stanzas in the VM
>   XML definition when using the smmuv3Dev model.
> - Dropped the automatic population of the PCIe topology (multiple vSMMUs
>   routed to auto-created pcie-expander-bus controllers), in favor of
>   deferring creation of PXBs and routing of VFIO devices to management
>   apps.
> - Introduced iommufd support.
>
> TODO:
> - I updated the namespace and cgroup configuration to allow access to the
>   iommufd paths at /dev/vfio/devices/vfio* and /dev/iommu. However, qemu
>   needs to be launched with user and group set to 'root' in order for
>   these paths to be accessible. A passthrough device represented by
>   /dev/vfio/18 normally has 'root' user and group ownership, but in the
>   mount namespace it is changed to 'libvirt-qemu' and 'kvm'. I wasn't able
>   to discern where this is happening by looking at
>   src/qemu/qemu_namespace.c and src/qemu/qemu_cgroup.c. Would you have any
>   pointers on how to change the iommufd paths' user and group ownership in
>   the libvirt mount namespace? (See the sketch after this message for the
>   general pattern in question.)
>
> This series is on Github:
> https://github.com/NathanChenNVIDIA/libvirt/tree/smmuv3Dev-iommufd-04-15-25
>
> Thanks,
> Nathan
>
> [0] https://lists.libvirt.org/archives/list/devel@lists.libvirt.org/thread/7GDT6RX5LPAJMPP4ZSC4ACME6GVMG236/
> [1] https://lore.kernel.org/qemu-devel/20250311141045.66620-1-shameerali.kolothum.th...@huawei.com/
>
> Signed-off-by: Nathan Chen <nath...@nvidia.com>
>
> Nathan Chen (5):
>   conf: Support multiple smmuv3Dev IOMMU devices
>   conf: Add an iommufd member struct to virDomainIOMMUDef
>   qemu: Implement support for associating iommufd to hostdev
>   qemu: Update Cgroup and namespace for qemu to access iommufd paths
>   qemu: Add test case for specifying iommufd
>
>  docs/formatdomain.rst                         |   5 +-
>  src/conf/domain_addr.c                        |  12 +-
>  src/conf/domain_addr.h                        |   4 +-
>  src/conf/domain_conf.c                        | 292 ++++++++++++++++--
>  src/conf/domain_conf.h                        |  21 +-
>  src/conf/domain_validate.c                    |  94 +++++-
>  src/conf/schemas/domaincommon.rng             |  37 ++-
>  src/conf/virconftypes.h                       |   2 +
>  src/libvirt_private.syms                      |   2 +
>  src/qemu/qemu_alias.c                         |  15 +-
>  src/qemu/qemu_cgroup.c                        |  47 +++
>  src/qemu/qemu_cgroup.h                        |   1 +
>  src/qemu/qemu_command.c                       | 146 ++++++---
>  src/qemu/qemu_domain_address.c                |  33 +-
>  src/qemu/qemu_driver.c                        |   8 +-
>  src/qemu/qemu_namespace.c                     |  36 +++
>  src/qemu/qemu_postparse.c                     |  11 +-
>  src/qemu/qemu_validate.c                      |  22 +-
>  ...fio-iommufd-intel-iommu.x86_64-latest.args |  43 +++
>  ...vfio-iommufd-intel-iommu.x86_64-latest.xml |  80 +++++
>  .../hostdev-vfio-iommufd-intel-iommu.xml      |  80 +++++
>  tests/qemuxmlconftest.c                       |   1 +
>  22 files changed, 878 insertions(+), 114 deletions(-)
>  create mode 100644 tests/qemuxmlconfdata/hostdev-vfio-iommufd-intel-iommu.x86_64-latest.args
>  create mode 100644 tests/qemuxmlconfdata/hostdev-vfio-iommufd-intel-iommu.x86_64-latest.xml
>  create mode 100644 tests/qemuxmlconfdata/hostdev-vfio-iommufd-intel-iommu.xml
>
> --
> 2.43.0
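On the TODO above about device-node ownership inside the per-VM mount
namespace: libvirt recreates the device nodes it needs in a private /dev,
and the ownership change described there most likely comes from the DAC
security handling relabelling those nodes to the configured qemu user and
group rather than from the cgroup code. The snippet below is only an
illustration of that general mknod-plus-chown pattern under assumed values,
not libvirt code; the uid/gid and the char-device major/minor numbers are
made up for the example.

    #include <sys/types.h>
    #include <sys/stat.h>
    #include <sys/sysmacros.h>
    #include <unistd.h>
    #include <stdio.h>

    int main(void)
    {
        uid_t qemu_uid = 64055;  /* e.g. the 'libvirt-qemu' user: assumed value */
        gid_t kvm_gid = 108;     /* e.g. the 'kvm' group: assumed value */

        /* Recreate the passthrough device node inside the private /dev of
         * the VM's mount namespace (major/minor are placeholders). */
        if (mknod("/dev/vfio/18", S_IFCHR | 0600, makedev(235, 1)) < 0)
            perror("mknod");

        /* Hand the node to the account qemu runs as.  A chown like this is
         * what turns a root:root node on the host into libvirt-qemu:kvm
         * inside the namespace; /dev/iommu and /dev/vfio/devices/vfio*
         * would need the same treatment for a non-root qemu to open them. */
        if (chown("/dev/vfio/18", qemu_uid, kvm_gid) < 0)
            perror("chown");

        return 0;
    }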
RE: [RFC PATCH 0/5] qemu: Implement support for iommufd and multiple vSMMUs
Shameerali Kolothum Thodi via Devel Fri, 16 May 2025 04:58:04 -0700