Hi Leon, Alex,
On 05/05/2026 11:58, Leon Romanovsky wrote:
On Fri, May 01, 2026 at 05:19:19PM -0600, Alex Williamson wrote:
On Thu, 16 Apr 2026 06:17:49 -0700
Matt Evans <[email protected]> wrote:
Previously, vfio_pci_zap_bars() (and the wrapper
vfio_pci_zap_and_down_write_memory_lock()) calls were paired with
calls of vfio_pci_dma_buf_move().
This commit replaces them a unified new function,
vfio_pci_zap_revoke_bars() containing both the vfio_pci_dma_buf_move()
and the unmap_mapping_range(), making it harder for callers to omit
one. It adds a wrapper, vfio_pci_lock_zap_revoke_bars(), which takes
the write memory_lock before zapping, and adds a new
vfio_pci_unrevoke_bars() for the re-enable path.
However, as of "vfio/pci: Convert BAR mmap() to use a DMABUF" the
unmap_mapping_range() to zap is entirely redundant for plain vfio-pci,
since the DMABUFs used for BAR mappings already zap PTEs when the
vfio_pci_dma_buf_move() occurs.
One exception remains as a FIXME: in nvgrace-gpu, some BAR VMAs
conditionally use custom vm_ops, which have not moved to be backed by
DMABUFs. If these BARs are mmap()ed, the vdev enables the existing
behaviour of unmap_mapping_range() for the device fd address space.
What's the plan here? Is this a temporary FIXME or a place to prove
that dmabuf for mmap works beyond the core use case?
The larger picture is that I havne't converted the nvgrace-gpu driver to
use DMABUFs for its custom mmap/fault handler code. (It's not
straightforward to test, so I don't want to hack that.)
This FIXME is added to vfio_pci_zap_revoke_bars(), saying that for this
specific sub-driver (or any others that do custom mmap) they still need
to flag so that this wrapper still performs the unmap_mapping_range().
Signed-off-by: Matt Evans <[email protected]>
---
drivers/vfio/pci/nvgrace-gpu/main.c | 5 +++
drivers/vfio/pci/vfio_pci_config.c | 30 ++++++--------
drivers/vfio/pci/vfio_pci_core.c | 62 +++++++++++++++++++----------
drivers/vfio/pci/vfio_pci_priv.h | 3 +-
include/linux/vfio_pci_core.h | 1 +
5 files changed, 62 insertions(+), 39 deletions(-)
...
@@ -1229,7 +1228,7 @@ static int vfio_pci_ioctl_reset(struct
vfio_pci_core_device *vdev,
if (!vdev->reset_works)
return -EINVAL;
- vfio_pci_zap_and_down_write_memory_lock(vdev);
+ vfio_pci_lock_zap_revoke_bars(vdev);
/*
* This function can be invoked while the power state is non-D0. If
@@ -1242,10 +1241,9 @@ static int vfio_pci_ioctl_reset(struct
vfio_pci_core_device *vdev,
*/
vfio_pci_set_power_state(vdev, PCI_D0);
- vfio_pci_dma_buf_move(vdev, true);
This seems subtle enough to be troublesome. I wonder if Leon didn't
intentionally place the dmabuf revoke after the device is in D0 to
allow the driver to interact with the device.
My intention was to place vfio_pci_dma_buf_move() as close as possible to
pci_try_reset_function(), so the device is known to be fully operational
at that point. It looks like calling it after the transition to D0 is the
right ordering.
Thanks for that, I had missed that need to have D0 at the time of the
revoke.
Getting the lock in the existing place (before D0 set) but doing a
separate vfio_zap_revoke_bars() here (replacing the bare move) seems
right, though for the sub-drivers which still need the zap (nvgace-gpu)
then the unmap_mapping_range() would happen after the D0 now. I believe
this is OK, but I wanted to flag that difference.
Many thanks,
Matt