On Wed, May 06, 2026 at 04:55:27PM +0100, Matt Evans wrote:
> Hi Leon,
> 
> On 06/05/2026 16:29, Leon Romanovsky wrote:
> > 
> > On Wed, May 06, 2026 at 02:53:31PM +0100, Matt Evans wrote:
> > > Hi Alex,
> > > 
> > > On 01/05/2026 20:12, Alex Williamson wrote:
> > > > 
> > > > On Thu, 16 Apr 2026 06:17:44 -0700
> > > > Matt Evans <[email protected]> wrote:
> > > > 
> > > > > vfio_pci_dma_buf_cleanup() assumed all VFIO device DMABUFs need to be
> > > > > revoked.  However, if vfio_pci_dma_buf_move() revokes DMABUFs before
> > > > > the fd/device closes, then vfio_pci_dma_buf_cleanup() would do a
> > > > > second/underflowing kref_put() then wait_for_completion() on a
> > > > > completion that never fires.  Fixed by predicating on revocation
> > > > > status.
> > > > > 
> > > > > This could happen if PCI_COMMAND_MEMORY is cleared before closing the
> > > > > device fd (but the scenario is more likely to hit when future commits
> > > > > add more methods to revoke DMABUFs).
> > > > > 
> > > > > > Fixes: 1a8a5227f2299 ("vfio: Wait for dma-buf invalidation to complete")
> > > > > Signed-off-by: Matt Evans <[email protected]>
> > > > > ---
> > > > > 
> > > > > (Just a fix, but later "vfio/pci: Convert BAR mmap() to use a DMABUF"
> > > > > and "vfio/pci: Permanently revoke a DMABUF on request" depend on this
> > > > > context, so including in this series.)
> > > > 
> > > > We really need a fix for this split out from this series.  It's already
> > > > been shown[1] that this is trivially reachable.  Carlos proposed[2] a
> > > > similar solution to the one below.  I was concurrently working on the
> > > > issue and suggested an alternative[3].  Let's pick a solution for
> > > > 7.1-rc.  Thanks,
> > > 
> > > It looks like [3] is progressing, so I'll drop this one when I can rebase
> > > onto it.
> > > 
> > > I noticed [3] removes the dma_resv_lock(priv->dmabuf->resv) around the
> > > priv->vdev = NULL, and this series' vfio_pci_mmap_huge_fault() relies on
> > > vdev only changing whilst resv is held to resolve a race between a fault
> > > and cleanup (see patch 7 of this series).  The handler takes resv so that
> > > it can stably test vdev in order to take memory_lock.
> > 
> > I think that you should rely on priv->revoked and not on priv->vdev.
> 
> Needs both unfortunately, as the fault handler ultimately needs to take
> vdev->memory_lock.

One can argue that if priv->revoked is true, all accesses to the device
should be denied, exactly as if priv->vdev were NULL.
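
To make that concrete, here is a rough userspace sketch of the check order,
not kernel code.  The names model_vdev, model_priv, model_fault and
mem_enabled are hypothetical stand-ins invented for illustration; the sketch
assumes the caller already holds whatever lock keeps revoked and vdev stable
(resv in the discussion above), and it deliberately does not model
memory_lock or any lock ordering.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the vfio device; not the real structure. */
struct model_vdev {
	bool mem_enabled;	/* models "device memory is accessible" */
};

/* Hypothetical stand-in for the per-DMABUF private data. */
struct model_priv {
	struct model_vdev *vdev;
	bool revoked;
};

enum model_fault_result {
	MODEL_FAULT_OK,
	MODEL_FAULT_SIGBUS,
};

/*
 * Assumption: the caller holds the lock that keeps priv->revoked and
 * priv->vdev stable for the duration of this call.
 */
enum model_fault_result model_fault(const struct model_priv *priv)
{
	/*
	 * revoked is tested first and handled exactly like vdev == NULL,
	 * so vdev is never dereferenced once the buffer is revoked.
	 */
	if (priv->revoked || !priv->vdev)
		return MODEL_FAULT_SIGBUS;

	/* Only reached when the buffer is live and vdev is set. */
	return priv->vdev->mem_enabled ? MODEL_FAULT_OK : MODEL_FAULT_SIGBUS;
}
```

With this ordering a revoked buffer faults the same way whether or not vdev
has been cleared yet, which is the equivalence argued for above.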

Thanks
