Hi Jason + Christian,

On 27/02/2026 12:51, Jason Gunthorpe wrote:
> On Fri, Feb 27, 2026 at 11:09:31AM +0100, Christian König wrote:
>
>> When a DMA-buf just represents a linear piece of BAR which is
>> map-able through the VFIO FD anyway then the right approach is to
>> just re-direct the mapping to this VFIO FD.
We think limiting this to one range per DMABUF isn't enough, i.e.
supporting multiple ranges will be a benefit. Bumping vm_pgoff to then
reuse vfio_pci_mmap_ops is a really nice suggestion for the simplest
case, but it can't support multiple ranges; the .fault() needs to be
aware of the non-linear DMABUF layout.

> I actually would like to go the other way and have VFIO always have a
> DMABUF under the VMAs it mmaps because that will make it easy to
> finish the type1 emulation which requires finding dmabufs for the
> VMAs.
>
>> It can be that you want additional checks (e.g. if the DMA-buf is
>> revoked) in which case you would need to override the vma->vm_ops,
>> but then just do the access checks and call the vfio_pci_mmap_ops to
>> get the actual page fault handling done.
>
> It isn't that simple, the vm_ops won't have a way to get back to the
> dmabuf from the vma to find the per-fd revoke flag to check it.

Sounds like the suggestion is just to reuse vfio_pci_mmap_*fault(), i.e.
install "interposer" vm_ops with some new 'fault_but_check_revoke()'
which then calls down to the existing vfio_pci_mmap_*fault() after
fishing the DMABUF out of vm_private_data. (Like the proposed
vfio_pci_dma_buf_mmap_huge_fault() does.)

Putting aside for a moment the above point about needing a new .fault()
able to find a PFN across >1 range, how would the test of the revoked
flag work w.r.t. synchronisation and protecting against a racing revoke?
It's not safe to take memory_lock, test revoked, unlock, and then hand
over to the existing vfio_pci_mmap_*fault() -- which re-takes the lock.
I'm not quite seeing how we could reuse the existing
vfio_pci_mmap_*fault(), TBH. I did briefly consider refactoring that
existing .fault() code, but that makes both paths uglier.
To summarise, I think we still

 - need a new fops->mmap() to link vfio_pci_dma_buf into
   vm_private_data, and to determine WC attrs

 - need a new vm_ops->fault() to test dmabuf->revoked/status and decide
   map vs fail with memory_lock held, and to determine the PFN from >1
   DMABUF ranges

>>> + unmap_mapping_range(priv->dmabuf->file->f_mapping,
>>> +                     0, priv->size, 1);
>>
>> When you need to use unmap_mapping_range() then you usually share
>> the address space object between the file descriptor exporting the
>> DMA-buf and the DMA-buf fd itself.
>
> Yeah, this becomes problematic. Right now there is a single address
> space per vfio-device and the invalidation is global.
>
> Possibly for this use case you can keep that and do a global unmap and
> rely on fault to restore the mmaps that were not revoked.

Hm, that'd be functional, but we should consider huge BARs with a lot of
PTEs (even huge ones); zapping all BARs might noticeably disturb other
clients. But please see my query below: if we could zap just the
resource being reclaimed, that would be preferable.

>> Otherwise functions like vfio_pci_zap_bars() don't work correctly
>> any more and that usually creates a huge bunch of problems.

I'd reasoned it was OK for the DMABUF to have its own unique address
space -- even though IIUC that means an unmap_mapping_range() by
vfio_pci_core_device won't affect a DMABUF's mappings -- because
anything that needs to zap a BAR _also_ must already plan to notify
DMABUF importers via vfio_pci_dma_buf_move(). And then,
vfio_pci_dma_buf_move() will zap the mappings.

Are there paths that _don't_ always pair vfio_pci_zap_bars() with a
vfio_pci_dma_buf_move()? I'm sure I'm missing something, so here's a
question phrased as a statement: the only way mappings could be missed
is if some path forgot to ...buf_move() when zapping the BARs, but that
would be a problem for importers regardless of whether they can now also
be mmap()ed, no?
I don't want to flout convention for the sake of it, and am keen to
learn more, so please gently explain in more detail: why must we
associate the DMABUFs with the VFIO address space [i.e. by sharing the
address-space object between the VFIO fd exporting the DMABUF and the
DMABUF fd]?

Many thanks,
Matt
