On Thu, 25 Feb 2021 19:49:49 -0400
Jason Gunthorpe <j...@nvidia.com> wrote:

> On Thu, Feb 25, 2021 at 03:21:13PM -0700, Alex Williamson wrote:
> 
> > This is where it gets tricky.  The vm_pgoff we get from
> > file_operations.mmap is already essentially describing an offset from
> > the base of a specific resource.  We could convert that from an absolute
> > offset to a pfn offset, but it's only the bus driver code (ex.
> > vfio-pci) that knows how to get the base, assuming there is a single
> > base per region (we can't assume enough bits per region to store
> > absolute pfn).  Also note that you're suggesting that all vfio mmaps
> > would need to standardize on the vfio-pci implementation of region
> > layouts.  Not that most drivers haven't copied vfio-pci, but we've
> > specifically avoided exposing it as a fixed uAPI such that we could have
> > the flexibility for a bus driver to implement regions offsets however
> > they need.  
> 
> Okay, well the bus driver owns the address space and the bus driver is
> in control of the vm_pgoff. If it doesn't want to zap then it doesn't
> need to do anything
> 
> vfio-pci can consistently use the index encoding and be fine
>  
> > So I'm not really sure what this looks like.  Within vfio-pci we could
> keep the index bits in place to allow unmap_mapping_range() to
> > selectively zap matching vm_pgoffs but expanding that to a vfio
> > standard such that the IOMMU backend can also extract a pfn looks very
> > limiting, or ugly.  Thanks,  
> 
> Let's add an op to convert a vma into a PFN range. The map code will
> pass the vma to the op and get back a pfn (or failure).
> 
> pci will switch the vm_pgoff to an index, find the bar base and
> compute the pfn.
> 
> It is starting to look more and more like dma buf though
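
To make sure I'm reading that right, the op would be something like the
below, where the callback name, exact signature, and the vfio-pci math
are just my interpretation (assuming vm_pgoff keeps the region-index
encoding):

/*
 * Hypothetical vma-to-pfn callback for vfio-pci; assumes vm_pgoff still
 * carries the region index in the upper bits per the current offset layout.
 */
static int vfio_pci_vma_to_pfn(void *device_data,
                               struct vm_area_struct *vma,
                               unsigned long *pfn)
{
        struct vfio_pci_device *vdev = device_data;
        unsigned int index;
        unsigned long pgoff;

        index = vma->vm_pgoff >> (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT);
        pgoff = vma->vm_pgoff &
                ((1UL << (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT)) - 1);

        /* only BARs are mmap'able */
        if (index >= VFIO_PCI_ROM_REGION_INDEX)
                return -EINVAL;

        *pfn = (pci_resource_start(vdev->pdev, index) >> PAGE_SHIFT) + pgoff;
        return 0;
}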

How terrible would it be if vfio-core used a shared vm_private_data
structure and inserted itself into the vm_ops call chain to reference
count that structure?  We already wrap the device file descriptor via
vfio_device_fops.mmap, so we could:

        struct vfio_vma_private_data *priv;

        priv = kzalloc(sizeof(*priv), GFP_KERNEL);
        if (!priv)
                return -ENOMEM;

        priv->device = device;
        kref_init(&priv->kref);
        vma->vm_private_data = priv;

        device->ops->mmap(device->device_data, vma);

        if (vma->vm_private_data == priv) {
                priv->vm_ops = vma->vm_ops;
                vma->vm_ops = &vfio_vm_ops;
        } else {
                kfree(priv);
        }

Where:

struct vfio_vma_private_data {
        struct vfio_device *device;
        unsigned long pfn_base;
        void *private_data; // maybe not needed
        const struct vm_operations_struct *vm_ops;
        struct kref kref;
        unsigned int release_notification:1;
};

Therefore, unless a bus driver opts out by replacing vm_private_data, we
can identify participating vmas by their vm_ops and use the flags to
indicate whether the vma maps device memory, such that
vfio_get_device_from_vma() should produce a device reference.  The vfio
IOMMU backends would also consume this, i.e. if they get a valid
vfio_device from the vma, they'd use the pfn_base field directly.
vfio_vm_ops would wrap the bus driver callbacks and provide reference
counting on open/close to release this object, roughly as sketched below.
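
Roughly, the wrapping side might look like the following sketch; the
helper names are made up and the release path and device reference
handling are only hinted at:

static void vfio_vma_open(struct vm_area_struct *vma)
{
        struct vfio_vma_private_data *priv = vma->vm_private_data;

        kref_get(&priv->kref);
        if (priv->vm_ops && priv->vm_ops->open)
                priv->vm_ops->open(vma);
}

static void vfio_vma_priv_release(struct kref *kref)
{
        struct vfio_vma_private_data *priv =
                container_of(kref, struct vfio_vma_private_data, kref);

        /* notify the bus driver and/or drop the device reference here */
        kfree(priv);
}

static void vfio_vma_close(struct vm_area_struct *vma)
{
        struct vfio_vma_private_data *priv = vma->vm_private_data;

        if (priv->vm_ops && priv->vm_ops->close)
                priv->vm_ops->close(vma);
        kref_put(&priv->kref, vfio_vma_priv_release);
}

static const struct vm_operations_struct vfio_vm_ops = {
        .open = vfio_vma_open,
        .close = vfio_vma_close,
        /* .fault, etc. would chain through priv->vm_ops the same way */
};

struct vfio_device *vfio_get_device_from_vma(struct vm_area_struct *vma)
{
        struct vfio_vma_private_data *priv;

        if (vma->vm_ops != &vfio_vm_ops)
                return ERR_PTR(-ENODEV);

        priv = vma->vm_private_data;
        /* take a vfio_device reference before handing it out */
        return priv->device;
}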

I'm not thrilled with a vfio_device_ops callback plumbed through
vfio-core to do vma-to-pfn translation, so I thought this might be a
better alternative.  Thanks,

Alex
