This patch series adds a new VFIO selftest plugin driver for NVIDIA GPUs that enables DMA testing via the Falcon microcontrollers.

Patch 1:
Kernel selftests are a collection of test programs that live within the
Linux kernel source tree (tools/testing/selftests/) and are designed to
test various kernel subsystems from userspace. The VFIO selftest
framework has a pluggable driver architecture that allows different
hardware drivers to implement various test capabilities. However, not
all drivers can trigger MSI/MSI-X interrupts from software. This patch
adds checks to gracefully skip MSI-related tests when the driver's
send_msi callback is NULL, allowing drivers without MSI support to still
run the DMA functionality tests. It also makes MSI truly optional by
checking msi_fd validity before operations.

Patch 2:
This patch introduces the core implementation of the plugin driver. It
extracts and adapts relevant functionality from NVIDIA's gpu-admin-tools
project [1], integrating it into the VFIO selftest framework. As a
result, any system equipped with a PCIe slot and a supported NVIDIA GPU
can now run VFIO DMA selftests using commonly available hardware. The
Falcon is a general-purpose microcontroller present on NVIDIA GPUs that
can perform DMA operations between system memory and device memory.
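
The skip behaviour described for patch 1 can be illustrated with a small
standalone sketch. This is not the code from the series: the demo_*
types and functions are hypothetical stand-ins, and only the names
send_msi and msi_fd come from the description above. It assumes a driver
signals "no MSI support" with a NULL callback and that an unset MSI
eventfd is represented as -1.

```c
#include <stddef.h>

/* Hypothetical stand-in for a plugin driver's ops table (not the real libvfio struct). */
struct demo_driver_ops {
	const char *name;
	void (*send_msi)(void);	/* NULL when the device cannot raise MSIs from software */
};

/* Hypothetical stand-in for the per-device test state. */
struct demo_device {
	const struct demo_driver_ops *ops;
	int msi_fd;		/* assumed to be -1 when no MSI eventfd was set up */
};

enum demo_result { DEMO_PASS, DEMO_SKIP };

/* Counts how often the fake callback ran, so a caller can observe it. */
int demo_msi_sent;

void demo_fake_send_msi(void)
{
	demo_msi_sent++;
}

/*
 * Skip instead of failing when the driver has no send_msi callback or
 * when no MSI eventfd is available; otherwise trigger one interrupt.
 */
enum demo_result demo_msi_test(struct demo_device *dev)
{
	if (!dev->ops->send_msi || dev->msi_fd < 0)
		return DEMO_SKIP;

	dev->ops->send_msi();
	return DEMO_PASS;
}
```

With this shape, a driver that leaves send_msi unset still reaches the
DMA tests, because the MSI test reports a skip rather than a failure.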

The core VFIO selftest infrastructure handles:
- VFIO container/group management
- IOMMU domain setup
- DMA buffer allocation and mapping
- Test orchestration and reporting

The plugin drivers provide device-specific implementations for:
- Probing and initializing the device
- Triggering DMA operations
- Verifying DMA completion
- Device cleanup

[1] https://github.com/NVIDIA/gpu-admin-tools

Changes in v9:
- Squashed patch 3 (PMU falcon support for Kepler and Maxwell Gen1) into
  patch 2, as the registers and fields required have been approved for
  open source disclosure

Changes in v8:
- Corrected Makefile to also build the nv_falcons driver on
  architectures other than x86_64

Changes in v7:
- Added Hopper (H100) support
- Made MSI optional by checking msi_fd != -1 in ASSERT_NO_MSI macro and
  guarding fcntl_set_nonblock() calls
- Refactored to use gpu_properties_map[] array indexed by enum gpu_arch
- Added falcon_map[] array indexed by enum falcon_type for cleaner
  initialization
- Coding style fixes

Changes in v6:
- Added GPU architecture detection
- Refactored GPU detection to use per-architecture property structs

Changes in v5:
- Reorganized as a 3-patch series
- Added patch to skip MSI tests for drivers without MSI support
- Removed stub MSI function from Falcon driver
- Added support for Maxwell Gen1 GPUs and Kepler GPUs

Changes in v4:
- Removed redundant PCI_VENDOR_ID_NVIDIA macro
- Macro cleanup and style fixes

Changes in v3:
- Updated cover letter to clarify purpose and scope

Changes in v2:
- Fixed NV_PMC_ENABLE_PWR macro value (0x2000, was incorrectly 0x1000)
- Added gpu_disable_bus_master and falcon_disable calls in remove path
  for proper cleanup
- Added error handling for unknown GPU pmc_boot_0 values
- General code cleanup and style fixes
- Note: Kepler cards may not work, pending further testing

Rubin Du (2):
  selftests/vfio: Skip MSI tests for drivers that cannot raise interrupts
  selftests/vfio: Add NVIDIA Falcon driver for DMA testing

 .../vfio/lib/drivers/nv_falcons/hw.h          | 345 ++++++++
 .../vfio/lib/drivers/nv_falcons/nv_falcons.c  | 757 ++++++++++++++++++
 .../lib/include/libvfio/vfio_pci_device.h     |   3 +
 tools/testing/selftests/vfio/lib/libvfio.mk   |   2 +
 .../selftests/vfio/lib/vfio_pci_driver.c      |   4 +-
 .../selftests/vfio/vfio_pci_driver_test.c     |   8 +
 6 files changed, 1118 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/vfio/lib/drivers/nv_falcons/hw.h
 create mode 100644 tools/testing/selftests/vfio/lib/drivers/nv_falcons/nv_falcons.c

--
2.43.0

