This patch series adds a new VFIO selftest plugin driver for NVIDIA 
GPUs that enables DMA testing via the Falcon microcontrollers.

Patch 1: 

Kernel selftests are a collection of test programs that live within the
Linux kernel source tree (tools/testing/selftests/) and are designed to
test various kernel subsystems from userspace. The VFIO selftest
framework has a pluggable driver architecture that allows different
hardware drivers to implement various test capabilities. However, not
all drivers can trigger MSI/MSI-X interrupts from software.

This patch adds checks to gracefully skip MSI-related tests when the
driver's send_msi callback is NULL, allowing drivers without MSI
support to still run the DMA functionality tests. It also makes MSI
truly optional by checking that msi_fd is valid (not -1) before
operating on it.
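
For illustration only, the skip logic described above can be sketched as
follows. This is not the code from the patch: the sketch_* structs,
functions and result codes are hypothetical stand-ins, and only the
send_msi callback name and the msi_fd != -1 convention come from this
series.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the selftest driver ops (not the real layout). */
struct sketch_driver_ops {
	const char *name;
	/* NULL when the device cannot raise an MSI from software. */
	void (*send_msi)(void *dev);
};

/* Hypothetical stand-in for the per-device state. */
struct sketch_device {
	const struct sketch_driver_ops *ops;
	int msi_fd;	/* -1 when no MSI eventfd was set up */
};

enum sketch_result { SKETCH_PASS, SKETCH_SKIP };

/* Counts how often the fake callback below was invoked. */
static int sketch_msi_sent;

/* Fake send_msi callback used only to exercise the sketch. */
static void sketch_fake_send_msi(void *dev)
{
	(void)dev;
	sketch_msi_sent++;
}

/* MSI is usable only if the driver can send one and an eventfd exists. */
static bool sketch_msi_supported(const struct sketch_device *dev)
{
	return dev->ops && dev->ops->send_msi != NULL && dev->msi_fd != -1;
}

/* Skip instead of failing when the driver has no MSI support. */
static enum sketch_result sketch_run_msi_test(struct sketch_device *dev)
{
	if (!sketch_msi_supported(dev))
		return SKETCH_SKIP;

	dev->ops->send_msi(dev);
	return SKETCH_PASS;
}
```

With this shape, a driver that leaves send_msi unset is reported as
skipped for the MSI cases, while its DMA cases still run.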

Patch 2:

This patch introduces the core implementation of the plugin driver. It 
extracts and adapts relevant functionality from NVIDIA's gpu-admin-tools 
project [1], integrating it into the VFIO selftest framework. As a 
result, any system equipped with a PCIe slot and a supported NVIDIA GPU 
can now run VFIO DMA selftests using commonly available hardware.

The Falcon is a general-purpose microcontroller present on NVIDIA GPUs
that can perform DMA operations between system memory and device memory.

The core VFIO selftest infrastructure handles:

- VFIO container/group management
- IOMMU domain setup
- DMA buffer allocation and mapping
- Test orchestration and reporting

The plugin drivers provide device-specific implementations for:

- Probing and initializing the device
- Triggering DMA operations
- Verifying DMA completion
- Device cleanup
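
As a rough sketch of this split, assuming a simplified callback table:
the sketch_dma_* names, the in-memory "device" and the memmove-based
copy below are all hypothetical and only mirror the four
responsibilities listed above. The real framework and the Falcon driver
operate on a VFIO device and IOVA-mapped buffers instead.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical device state; plain memory stands in for DMA-mapped buffers. */
struct sketch_dma_dev {
	uint8_t *mem;
	size_t mem_size;
	int dma_status;		/* 0 on success, -1 on failure */
	int initialized;
};

/* Hypothetical per-driver callbacks, one per responsibility listed above. */
struct sketch_dma_ops {
	int  (*probe)(struct sketch_dma_dev *dev);
	void (*init)(struct sketch_dma_dev *dev);
	void (*dma_start)(struct sketch_dma_dev *dev, size_t src, size_t dst,
			  size_t len);
	int  (*dma_wait)(struct sketch_dma_dev *dev);
	void (*remove)(struct sketch_dma_dev *dev);
};

static int sketch_fake_probe(struct sketch_dma_dev *dev)
{
	return dev->mem ? 0 : -1;
}

static void sketch_fake_init(struct sketch_dma_dev *dev)
{
	dev->initialized = 1;
}

/* A real driver would program the DMA engine here; the fake just copies. */
static void sketch_fake_dma_start(struct sketch_dma_dev *dev, size_t src,
				  size_t dst, size_t len)
{
	if (len > dev->mem_size || src > dev->mem_size - len ||
	    dst > dev->mem_size - len) {
		dev->dma_status = -1;
		return;
	}
	memmove(dev->mem + dst, dev->mem + src, len);
	dev->dma_status = 0;
}

static int sketch_fake_dma_wait(struct sketch_dma_dev *dev)
{
	return dev->dma_status;
}

static void sketch_fake_remove(struct sketch_dma_dev *dev)
{
	dev->initialized = 0;
}

static const struct sketch_dma_ops sketch_fake_ops = {
	.probe = sketch_fake_probe,
	.init = sketch_fake_init,
	.dma_start = sketch_fake_dma_start,
	.dma_wait = sketch_fake_dma_wait,
	.remove = sketch_fake_remove,
};

/* Core-side orchestration: it only ever calls through the ops table. */
static int sketch_run_dma_test(const struct sketch_dma_ops *ops,
			       struct sketch_dma_dev *dev,
			       size_t src, size_t dst, size_t len)
{
	int ret;

	if (ops->probe(dev))
		return -1;

	ops->init(dev);
	ops->dma_start(dev, src, dst, len);
	ret = ops->dma_wait(dev);
	/* Compare the buffers only if the driver reported completion. */
	if (!ret && memcmp(dev->mem + src, dev->mem + dst, len))
		ret = -1;
	ops->remove(dev);

	return ret;
}
```

The point of the indirection is that the orchestration code stays
identical across devices; only the table of callbacks changes.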

[1] https://github.com/NVIDIA/gpu-admin-tools

Changes in v9:
- Squashed patch 3 (PMU falcon support for Kepler and Maxwell Gen1)
  into patch 2, as the registers and fields required have been approved
  for open source disclosure

Changes in v8:
- Corrected Makefile to also build nv_falcons driver on architectures
  other than x86_64

Changes in v7:
- Added Hopper (H100) support
- Made MSI optional by checking msi_fd != -1 in ASSERT_NO_MSI
  macro and guarding fcntl_set_nonblock() calls
- Refactored to use gpu_properties_map[] array indexed by enum gpu_arch
- Added falcon_map[] array indexed by enum falcon_type for cleaner
  initialization
- Coding style fixes

Changes in v6:
- Added GPU architecture detection
- Refactored GPU detection to use per-architecture property structs

Changes in v5:
- Reorganized as a 3-patch series
- Added patch to skip MSI tests for drivers without MSI support
- Removed stub MSI function from Falcon driver
- Added support for Maxwell Gen1 GPUs and Kepler GPUs

Changes in v4:
- Removed redundant PCI_VENDOR_ID_NVIDIA macro
- Macro cleanup and style fixes

Changes in v3:
- Updated cover letter to clarify purpose and scope

Changes in v2:
- Fixed NV_PMC_ENABLE_PWR macro value (0x2000, was incorrectly 0x1000)
- Added gpu_disable_bus_master and falcon_disable calls in remove path
  for proper cleanup
- Added error handling for unknown GPU pmc_boot_0 values
- General code cleanup and style fixes
- Note: Kepler cards may not work, pending further testing

Rubin Du (2):
  selftests/vfio: Skip MSI tests for drivers that cannot raise
    interrupts
  selftests/vfio: Add NVIDIA Falcon driver for DMA testing

 .../vfio/lib/drivers/nv_falcons/hw.h          | 345 ++++++++
 .../vfio/lib/drivers/nv_falcons/nv_falcons.c  | 757 ++++++++++++++++++
 .../lib/include/libvfio/vfio_pci_device.h     |   3 +
 tools/testing/selftests/vfio/lib/libvfio.mk   |   2 +
 .../selftests/vfio/lib/vfio_pci_driver.c      |   4 +-
 .../selftests/vfio/vfio_pci_driver_test.c     |   8 +
 6 files changed, 1118 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/vfio/lib/drivers/nv_falcons/hw.h
 create mode 100644 tools/testing/selftests/vfio/lib/drivers/nv_falcons/nv_falcons.c

-- 
2.43.0

