Hi Leon, Alex,

On 05/05/2026 11:58, Leon Romanovsky wrote:

On Fri, May 01, 2026 at 05:19:19PM -0600, Alex Williamson wrote:
On Thu, 16 Apr 2026 06:17:49 -0700
Matt Evans <[email protected]> wrote:

Previously, vfio_pci_zap_bars() (and the wrapper
vfio_pci_zap_and_down_write_memory_lock()) calls were paired with
calls of vfio_pci_dma_buf_move().

This commit replaces them a unified new function,
vfio_pci_zap_revoke_bars() containing both the vfio_pci_dma_buf_move()
and the unmap_mapping_range(), making it harder for callers to omit
one.  It adds a wrapper, vfio_pci_lock_zap_revoke_bars(), which takes
the write memory_lock before zapping, and adds a new
vfio_pci_unrevoke_bars() for the re-enable path.

However, as of "vfio/pci: Convert BAR mmap() to use a DMABUF" the
unmap_mapping_range() to zap is entirely redundant for plain vfio-pci,
since the DMABUFs used for BAR mappings already zap PTEs when the
vfio_pci_dma_buf_move() occurs.

One exception remains as a FIXME: in nvgrace-gpu, some BAR VMAs
conditionally use custom vm_ops, which have not moved to be backed by
DMABUFs.  If these BARs are mmap()ed, the vdev enables the existing
behaviour of unmap_mapping_range() for the device fd address space.

What's the plan here?  Is this a temporary FIXME or a place to prove
that dmabuf for mmap works beyond the core use case?

The larger picture is that I havne't converted the nvgrace-gpu driver to use DMABUFs for its custom mmap/fault handler code. (It's not straightforward to test, so I don't want to hack that.)

This FIXME is added to vfio_pci_zap_revoke_bars(), saying that for this specific sub-driver (or any others that do custom mmap) they still need to flag so that this wrapper still performs the unmap_mapping_range().

Signed-off-by: Matt Evans <[email protected]>
---
  drivers/vfio/pci/nvgrace-gpu/main.c |  5 +++
  drivers/vfio/pci/vfio_pci_config.c  | 30 ++++++--------
  drivers/vfio/pci/vfio_pci_core.c    | 62 +++++++++++++++++++----------
  drivers/vfio/pci/vfio_pci_priv.h    |  3 +-
  include/linux/vfio_pci_core.h       |  1 +
  5 files changed, 62 insertions(+), 39 deletions(-)

...
@@ -1229,7 +1228,7 @@ static int vfio_pci_ioctl_reset(struct 
vfio_pci_core_device *vdev,
        if (!vdev->reset_works)
                return -EINVAL;
- vfio_pci_zap_and_down_write_memory_lock(vdev);
+       vfio_pci_lock_zap_revoke_bars(vdev);
/*
         * This function can be invoked while the power state is non-D0. If
@@ -1242,10 +1241,9 @@ static int vfio_pci_ioctl_reset(struct 
vfio_pci_core_device *vdev,
         */
        vfio_pci_set_power_state(vdev, PCI_D0);
- vfio_pci_dma_buf_move(vdev, true);

This seems subtle enough to be troublesome.  I wonder if Leon didn't
intentionally place the dmabuf revoke after the device is in D0 to
allow the driver to interact with the device.

My intention was to place vfio_pci_dma_buf_move() as close as possible to
pci_try_reset_function(), so the device is known to be fully operational
at that point. It looks like calling it after the transition to D0 is the
right ordering.

Thanks for that, I had missed that need to have D0 at the time of the revoke.

Getting the lock in the existing place (before D0 set) but doing a separate vfio_zap_revoke_bars() here (replacing the bare move) seems right, though for the sub-drivers which still need the zap (nvgace-gpu) then the unmap_mapping_range() would happen after the D0 now. I believe this is OK, but I wanted to flag that difference.

Many thanks,


Matt

Reply via email to