On Thu, 19 Mar 2026 19:04:37 +0000 David Matlack <[email protected]> wrote:
> On 2026-03-17 02:42 PM, Rubin Du wrote:
> > Add a new VFIO PCI driver for NVIDIA GPUs that enables DMA testing
> > via the Falcon (Fast Logic Controller) microcontrollers. This driver
> > extracts and adapts the DMA test functionality from the NVIDIA
> > gpu-admin-tools project and integrates it into the existing VFIO
> > selftest framework.
> >
> > The Falcon is a general-purpose microcontroller present on NVIDIA GPUs
> > that can perform DMA operations between system memory and device memory.
> > By leveraging Falcon DMA, this driver allows NVIDIA GPUs to be tested
> > alongside Intel IOAT and DSA devices using the same selftest infrastructure.
> >
> > Supported GPUs:
> > - Kepler: K520, GTX660, K4000, K80, GT635
> > - Maxwell Gen1: GTX750, GTX745
> > - Maxwell Gen2: M60
> > - Pascal: P100, P4, P40
> > - Volta: V100
> > - Turing: T4
> > - Ampere: A16, A100, A10
> > - Ada: L4, L40S
> > - Hopper: H100
> >
> > The PMU falcon on Kepler and Maxwell Gen1 GPUs uses legacy FBIF register
> > offsets and requires enabling via PMC_ENABLE with the HUB bit set.
> >
> > Limitations and tradeoffs:
> >
> > 1. Architecture support:
> >    Blackwell and newer architectures may require additional work
> >    due to firmware.
> >
> > 2. Synchronous DMA operations:
> >    Each transfer blocks until completion because the reference
> >    implementation does not expose command queuing - only one
> >    DMA operation can be in flight at a time.
>
> Asynchronous DMA will be important for testing Live Update:
>
> https://lore.kernel.org/kvm/[email protected]/
>
> That is why I split memcpy_start() and memcpy_wait() from the beginning.
>
> Would it be possible to add support for it here even though it is not in
> the reference implementation?

I'll leave the can-we questions to Rubin, but do you see either the MSI
or asynchronous issues as blockers? Currently our driver tests are
limited to a very narrow range of Intel server platforms, whereas this
is a plug'able endpoint we can install anywhere. I'd think that's
sufficiently valuable in expanding the test base to make some
compromises.

Thanks,
Alex

