On Fri, 21 Nov 2025 10:08:45 +0000 Anatoly Burakov <[email protected]> wrote:
> This patchset introduces a major refactor of the VFIO subsystem in DPDK to
> support character device (cdev) interface introduced in Linux kernel, as well
> as make the API more streamlined and useful. The goal is to simplify device
> management, improve compatibility, and clarify API responsibilities.
>
> The following sections outline the key issues addressed by this patchset and
> the corresponding changes introduced.
>
> 1. Only group mode is supported
> ===============================
>
> Since kernel version 6.6 (LTS), VFIO supports the new character device
> (cdev)-based way of working with VFIO devices (otherwise known as IOMMUFD).
> This is a device-centric mode and does away with all the complexity regarding
> groups and IOMMU types, delegating it all to the kernel, and exposes a much
> simpler interface to userspace.
>
> The old group interface is still around, and will need to be kept in DPDK both
> for compatibility reasons, as well as supporting special cases (FSLMC bus, NBL
> driver, etc.).
>
> To enable this, VFIO is heavily refactored, so that the code can support both
> modes while relying on (mostly) common infrastructure.
>
> Note that the existing `rte_vfio_device_setup/release` model is fundamentally
> incompatible with cdev mode, because for custom container cases, the expected
> flow is that the user binds the IOMMU group (and thus, implicitly, the device
> itself) to a specific container using `rte_vfio_container_group_bind`, whereas
> this step is not needed for cdev as the device fd is assigned to the container
> straight away.
>
> Therefore, what we do instead is introduce a new API for container device
> assignment which, semantically, will assign a device to specified container, so
> that when it is mapped using `rte_pci_map_device`, the appropriate container is
> selected. Under the hood though, we essentially transition to getting device fd
> straight away at assign stage, so that by the time the PCI bus attempts to map
> the device, it is already mapped and we just return an fd. There is no
> "unassign" API because `release_device` already performs that function.
>
> Additionally, a new `rte_vfio_get_mode` API is added for those cases that need
> some introspection into VFIO's internals, with three new modes: group
> (old-style), no-iommu (old-style but without IOMMU), and cdev (the new mode).
> Although no-IOMMU is technically a variant of group mode, the distinction is
> largely irrelevant to the user, as all usages of noiommu checks in our codebase
> are for deciding whether to use IOVA or PA, not anything to do with managing
> groups. The current plan for kernel community is to *not* introduce no-IOMMU
> cdev implementation, which is why this will be kept for compatibility for these
> use cases.
>
> There were other users of VFIO which relied on group API but only for
> convenience purposes; no actual VFIO functionality depended on those API's.
> Therefore, group API's are removed and, where appropriate, replaced with the
> new API's.
>
> List of removed API's:
>
> * `rte_vfio_get_group_fd`
> * `rte_vfio_clear_group`
> * `rte_vfio_container_group_bind` (replaced by container assign API)
> * `rte_vfio_container_group_unbind`
> * `rte_vfio_noiommu_is_enabled` (replaced by new mode API)
>
> 2. The API responsibilities aren't clear and bleed into each other
> ==================================================================
>
> Some API's do multiple things at once. In particular:
>
> * `rte_vfio_get_device_info` will setup the device
> * `rte_vfio_setup_device` will get device info
>
> These API's have been adjusted to do one thing only.
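The assign-then-map flow described above can be modelled with a small, self-contained C sketch. This is not DPDK code. `assign_device()`, `map_device()`, `release_device()`, the fixed-size table and the fake fd numbers are all hypothetical stand-ins for illustration. The sketch only captures the ordering the cover letter describes:

- the device fd is obtained at assign time;
- a later map call simply returns that fd;
- release is the only teardown path, because there is no separate unassign call.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model of the assign -> map -> release flow; not DPDK code. */
#define MAX_DEVS 8

struct dev_entry {
	char addr[32];    /* PCI address, e.g. "0000:01:00.0" */
	int container_fd; /* container the device was assigned to */
	int dev_fd;       /* device fd, obtained at assign time */
	int in_use;
};

static struct dev_entry devs[MAX_DEVS];
static int next_fake_fd = 100; /* stand-in for really opening the device */

static struct dev_entry *find_dev(const char *addr)
{
	assert(addr != NULL);
	for (int i = 0; i < MAX_DEVS; i++)
		if (devs[i].in_use != 0 && strcmp(devs[i].addr, addr) == 0)
			return &devs[i];
	return NULL;
}

/* Assign: record the container AND obtain the device fd immediately. */
static int assign_device(int container_fd, const char *addr)
{
	if (find_dev(addr) != NULL)
		return -EEXIST;
	for (int i = 0; i < MAX_DEVS; i++) {
		if (devs[i].in_use == 0) {
			snprintf(devs[i].addr, sizeof(devs[i].addr), "%s", addr);
			devs[i].container_fd = container_fd;
			devs[i].dev_fd = next_fake_fd++;
			devs[i].in_use = 1;
			return 0;
		}
	}
	return -ENOSPC;
}

/* Map: the device was already set up at assign time, so return its fd. */
static int map_device(const char *addr)
{
	struct dev_entry *d = find_dev(addr);

	return d != NULL ? d->dev_fd : -ENODEV;
}

/* Release doubles as "unassign": it is the only teardown path. */
static int release_device(const char *addr)
{
	struct dev_entry *d = find_dev(addr);

	if (d == NULL)
		return -ENODEV;
	memset(d, 0, sizeof(*d));
	return 0;
}
```

For example, calling `assign_device(7, "0000:01:00.0")` and then `map_device("0000:01:00.0")` returns the fd stored at assign time. After `release_device()`, a further map call fails. The model deliberately leaves out the default-container path and all real VFIO ioctls.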
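The no-IOMMU point in the cover letter boils down to a single decision: can the device DMA to virtual addresses? The minimal sketch below makes two assumptions:

- A local enum mirrors the three modes named in the cover letter. The `SKETCH_MODE_*` names are hypothetical and are not the enumerators in `rte_vfio.h`.
- VA is preferred whenever an IOMMU is present. This is a simplification of DPDK's real IOVA-mode selection.

```c
#include <assert.h>

/*
 * Hypothetical mirror of the three VFIO modes from the cover letter.
 * These names are NOT taken from rte_vfio.h.
 */
enum vfio_mode_sketch {
	SKETCH_MODE_GROUP,   /* legacy group/container mode with an IOMMU */
	SKETCH_MODE_NOIOMMU, /* group mode without IOMMU protection */
	SKETCH_MODE_CDEV,    /* character device (IOMMUFD) mode */
};

enum iova_choice { IOVA_AS_PA, IOVA_AS_VA };

/*
 * Without an IOMMU, the device cannot translate virtual addresses, so
 * physical addresses must be used. With an IOMMU (group or cdev), this
 * sketch assumes VA is preferred. Unknown values fall back to PA.
 */
static enum iova_choice pick_iova(enum vfio_mode_sketch mode)
{
	switch (mode) {
	case SKETCH_MODE_GROUP:
	case SKETCH_MODE_CDEV:
		return IOVA_AS_VA;
	case SKETCH_MODE_NOIOMMU:
	default:
		return IOVA_AS_PA;
	}
}
```

This is why a caller only needs the mode, not any group handle. Group and cdev behave the same for this decision, which matches the cover letter's point that the no-IOMMU distinction is irrelevant to group management.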
>
> v6:
> - Fixed missing header include in vfio cdev file
>
> v5:
> - Added back missing uapi patch
>
> v4:
> - Fixed issues with documenting rte_vfio_mode enum
> - Separated deprecation notices into a separate patchset
>
> v3:
> - Make API removal cleaner
> - Fix `get_group_num` usages to align with new API
> - Fix issues with function exports
> - Fix issues with `setup_device` returning old-style values in some cases
>
> v2:
> - Make the entire API internal
> - More aggressive API pruning, complete removal of group API
> - Fixed a bug in group mode where device could not be used
> - Better documentation and deprecation notice patches
> - Moved doc patches to beginning of patchset
>
> Anatoly Burakov (18):
>   uapi: update to v6.17 and add iommufd.h
>   vfio: make all functions internal
>   vfio: split get device info from setup
>   vfio: add container device assignment API
>   net/nbl: do not use VFIO group bind API
>   net/ntnic: use container device assignment API
>   vdpa/ifc: use container device assignment API
>   vdpa/nfp: use container device assignment API
>   vdpa/sfc: use container device assignment API
>   vhost: remove group-related API from drivers
>   vfio: remove group-based API
>   vfio: cleanup and refactor
>   bus/pci: use the new VFIO mode API
>   bus/fslmc: use the new VFIO mode API
>   net/hinic3: use the new VFIO mode API
>   net/ntnic: use the new VFIO mode API
>   vfio: remove no-IOMMU check API
>   vfio: introduce cdev mode
>
>  config/arm/meson.build                    |    1 +
>  config/meson.build                        |    1 +
>  doc/guides/prog_guide/vhost_lib.rst       |    4 -
>  drivers/bus/cdx/cdx_vfio.c                |   25 +-
>  drivers/bus/fslmc/fslmc_bus.c             |   10 +-
>  drivers/bus/fslmc/fslmc_vfio.c            |    6 +-
>  drivers/bus/pci/linux/pci.c               |    2 +-
>  drivers/bus/pci/linux/pci_vfio.c          |   33 +-
>  drivers/bus/platform/platform.c           |    9 +-
>  drivers/crypto/bcmfs/bcmfs_vfio.c         |   14 +-
>  drivers/net/hinic3/base/hinic3_hwdev.c    |    2 +-
>  drivers/net/nbl/nbl_common/nbl_userdev.c  |   20 +-
>  drivers/net/nbl/nbl_include/nbl_include.h |    1 +
>  drivers/net/ntnic/ntnic_ethdev.c          |    2 +-
>  drivers/net/ntnic/ntnic_vfio.c            |   30 +-
>  drivers/vdpa/ifc/ifcvf_vdpa.c             |   34 +-
>  drivers/vdpa/mlx5/mlx5_vdpa.c             |    1 -
>  drivers/vdpa/nfp/nfp_vdpa.c               |   37 +-
>  drivers/vdpa/sfc/sfc_vdpa.c               |   39 +-
>  drivers/vdpa/sfc/sfc_vdpa.h               |    2 -
>  kernel/linux/uapi/linux/iommufd.h         | 1292 +++++++++++
>  kernel/linux/uapi/linux/vduse.h           |    2 +-
>  kernel/linux/uapi/linux/vfio.h            |   12 +-
>  kernel/linux/uapi/version                 |    2 +-
>  lib/eal/freebsd/eal.c                     |   98 +-
>  lib/eal/include/rte_vfio.h                |  387 ++--
>  lib/eal/linux/eal_vfio.c                  | 2433 ++++++++-------------
>  lib/eal/linux/eal_vfio.h                  |  167 +-
>  lib/eal/linux/eal_vfio_cdev.c             |  390 ++++
>  lib/eal/linux/eal_vfio_group.c            |  984 +++++++++
>  lib/eal/linux/eal_vfio_mp_sync.c          |   80 +-
>  lib/eal/linux/meson.build                 |    2 +
>  lib/eal/windows/eal.c                     |    4 +-
>  lib/vhost/vdpa_driver.h                   |    3 -
>  34 files changed, 4261 insertions(+), 1868 deletions(-)
>  create mode 100644 kernel/linux/uapi/linux/iommufd.h
>  create mode 100644 lib/eal/linux/eal_vfio_cdev.c
>  create mode 100644 lib/eal/linux/eal_vfio_group.c

Since this is a big patch series and it is hard to see everything at once, I ran it through AI with the current AGENTS.md guidelines. I agree that a release note is needed.

## DPDK Patch Review: VFIO cdev Mode Series (v6, 18 patches)

**Author:** Anatoly Burakov <[email protected]>

### Overview

This 18-patch series introduces VFIO cdev (IOMMUFD) mode support, refactors VFIO infrastructure, and makes VFIO APIs internal. The series is at v6 and has accumulated several Acked-by tags.

---

### ✅ PASSED Checks

| Check | Status |
|-------|--------|
| Subject lines ≤60 characters | ✓ All pass |
| Subject lowercase (except acronyms) | ✓ Correct |
| Component prefixes | ✓ Valid: `vfio:`, `bus/pci:`, `net/nbl:`, `vdpa/sfc:`, etc. |
| No trailing periods in subjects | ✓ None found |
| Body wrapped at 75 characters | ✓ All pass |
| Body does not start with "It" | ✓ Confirmed |
| `Signed-off-by:` present | ✓ All 18 patches |
| VF/PF capitalization | ✓ Correct usage |
| SPDX license in new files | ✓ BSD-3-Clause with copyright in `eal_vfio_group.c`, `eal_vfio_cdev.c` |
| `__rte_internal` placement | ✓ Alone on line, in header files only |
| Kernel UAPI headers | ✓ GPL-2.0 WITH Linux-syscall-note (standard for kernel UAPI headers) |
| Tag order | ✓ Correct (Signed-off-by before Acked-by) |

---

### ⚠️ WARNINGS (should fix)

#### 1. Missing Release Notes

**Severity:** Warning
**Location:** Series-wide

This series makes significant API changes that warrant release notes:

- **Patch 02/18:** Makes entire VFIO API internal-only (ABI change for applications)
- **Patch 12/18:** Changes return value semantics for `rte_vfio_setup_device()` and `rte_vfio_get_group_num()` (now return -1 with `rte_errno=ENODEV` instead of 1)
- **Patch 18/18:** Introduces new VFIO cdev mode (`RTE_VFIO_MODE_CDEV`)

**Recommendation:** Add an entry to `doc/guides/rel_notes/release_25_XX.rst` documenting:

- VFIO API is now internal (drivers only)
- Return value changes for affected functions
- New cdev/IOMMUFD mode support

#### 2. Implicit Integer Comparison

**Severity:** Warning
**Location:** Patch 12/18, `eal_vfio_group.c`

```c
// Line ~8360 in mbox (in vfio_has_supported_extensions function)
if (!n_extensions) // n_extensions is unsigned int
```

**Should be:**

```c
if (n_extensions == 0)
```

Per AGENTS.md: "Integers - compare explicitly with zero"

---

### ℹ️ INFO (observations)

1. **Good commit message structure:** The series has well-written commit messages, particularly patch 12/18, which clearly documents the behavioral changes.
2. **Proper API tagging:** All new internal APIs use `__rte_internal` correctly positioned.
3. **Kernel header long line:** Line 613 in `iommufd.h` exceeds 100 chars, but this is a verbatim kernel UAPI header import, so it is acceptable.
4. **Acks accumulated:** Patches 01, 12, and 14 have maintainer Acks (Stephen Hemminger, Hemant Agrawal).

---

### Summary

| Category | Count |
|----------|-------|
| Errors | 0 |
| Warnings | 2 |
| Info | 4 |

**Verdict:** The series is in good shape for this stage (v6). The two warnings should be addressed before merging:

1. Add release notes for the API changes
2. Fix the implicit integer comparison in `vfio_has_supported_extensions()`
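The return-value change in patch 12/18 is the kind of change a release note should call out. The C sketch below shows caller-side handling under the new convention. `fake_setup_device()` and `classify_setup()` are hypothetical names, and plain `errno` stands in for `rte_errno` so the sketch builds without DPDK. The only thing taken from the review is the convention itself: -1 with ENODEV instead of 1.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for a VFIO setup call following the convention
 * described for patch 12/18: return -1 and set an error code (ENODEV when
 * the device is not managed by VFIO) rather than returning 1.
 * errno stands in for rte_errno.
 */
static int fake_setup_device(bool managed_by_vfio, bool hw_error)
{
	if (!managed_by_vfio) {
		errno = ENODEV;
		return -1;
	}
	if (hw_error) {
		errno = EIO;
		return -1;
	}
	return 0;
}

enum setup_outcome { SETUP_OK, SETUP_SKIP, SETUP_FAIL };

/*
 * Caller-side handling under the new convention. A caller written for
 * the old one (e.g. "if (ret < 0) fail; else if (ret > 0) skip;") would
 * now take the failure branch for a device that is simply not VFIO-managed.
 */
static enum setup_outcome classify_setup(int ret)
{
	if (ret == 0)
		return SETUP_OK;
	if (ret < 0 && errno == ENODEV)
		return SETUP_SKIP;
	return SETUP_FAIL;
}
```

The key point is that "not a VFIO device" and "real failure" now share the same return value. Only the error code tells them apart, so callers must inspect it.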

