Hi Vivek,

On 18 Dec 2024, at 08:02, Kasireddy, Vivek <vivek.kasire...@intel.com> wrote:
> Hi Wei Lin,
>
>> Subject: Re: [PATCH 0/4] cover-letter: Allow MMIO regions to be exported
>> through dmabuf
>> From: Wei Lin Guay <wg...@meta.com>
>>
>> This is another attempt to revive the patches posted by Jason Gunthorpe
>> and Vivek Kasireddy, at
>> https://patchwork.kernel.org/project/linux-media/cover/0-v2-472615b3877e+28f7-vfio_dma_buf_...@nvidia.com/
>> https://lwn.net/Articles/970751/
>
> v2: https://lore.kernel.org/dri-devel/20240624065552.1572580-1-vivek.kasire...@intel.com/
> addresses the review comments from Alex and Jason and also includes the
> ability to create the dmabuf from multiple ranges. This is really needed
> to future-proof the feature.
>
>> The good thing is that your patch series addressed Christian's concern
>> about adding dma_buf_try_get().
>
> No, it did not address his concern. In v2, instead of adding a new
> function, we are now calling get_file_rcu() directly, which was suggested
> by Christian in the original review.
>
>> However, several questions regarding your v2 patch:
>>
>> - Does the dma-buf still support the redundant mmap handler? (One of the
>>   review comments from v1?)
>
> Yeah, the mmap handler is really needed as a debugging tool, given that
> the importer would not be able to provide access to the dmabuf's
> underlying memory via the CPU in any other way.
>
>> - Rather than handling different regions within a single dma-buf, would
>>   having vfio-user open multiple distinct file descriptors work? For our
>>   specific use case, we don't require multiple regions and prefer Jason's
>>   original patch.
>
> Restricting the dmabuf to a single region (or having to create multiple
> dmabufs to represent multiple regions/ranges associated with a single
> scattered buffer) would not be feasible or ideal in all cases. For
> instance, in my use-case I am sharing a large framebuffer (FB) located in
> the GPU's VRAM, and allocating a large FB contiguously (nr_ranges = 1) in
> VRAM is not possible when there is memory pressure. Furthermore, since we
> are adding a new UAPI with this patch/feature, we cannot go back and tweak
> it (to add support for nr_ranges > 1) should there be a need in the
> future, but you can always use nr_ranges = 1 anytime. That is why it makes
> sense to be flexible in terms of the number of ranges/regions.
>
> Also, my understanding is that this patchset cannot proceed until Leon's
> series is merged:
> https://lore.kernel.org/kvm/cover.1733398913.git.l...@kernel.org/

Thanks for the pointer. I will rebase my local patch series on top of that.

> AFAIK, Leon's work includes new mechanisms/APIs to do P2P DMA, which we
> should be using in this patchset. And, I think he is also planning to use
> the new APIs to augment dmabuf usage and not be forced to use the
> scatter-gather list, particularly in cases where the underlying memory is
> not backed by struct page. I was just waiting for all of this to happen
> before posting a v3.

Is there any update or ETA for the v3? Are there any ways we can help?
Additionally, do you have a repo that we can access to begin validating our
user API changes? This would greatly aid us in our software development.

Thanks,
Wei Lin

> Thanks,
> Vivek

>> In addition to the initial proposal by Jason, another promising
>> application is exposing memory from an AI accelerator (bound to VFIO) to
>> an RDMA device. This would allow the RDMA device to directly access the
>> accelerator's memory, thereby facilitating direct data transactions
>> between the RDMA device and the accelerator.
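For reference, the importer half of that accelerator-to-RDMA flow already
exists in rdma-core via ibv_reg_dmabuf_mr(). Below is a minimal sketch, not
part of the series, which assumes a dmabuf fd has already been exported by
vfio-pci through the UAPI under discussion (whose final shape is still open):

/*
 * Sketch of the RDMA importer side: dmabuf_fd is assumed to have been
 * obtained from vfio-pci via this series' UAPI.  Only the existing
 * ibv_reg_dmabuf_mr() verb from rdma-core is used here.
 */
#include <infiniband/verbs.h>
#include <stdio.h>

static struct ibv_mr *register_accel_mem(struct ibv_pd *pd, int dmabuf_fd,
                                         size_t len)
{
        /* offset/iova of 0: register the whole exported range from 0 */
        struct ibv_mr *mr = ibv_reg_dmabuf_mr(pd, 0, len, 0, dmabuf_fd,
                                              IBV_ACCESS_LOCAL_WRITE |
                                              IBV_ACCESS_REMOTE_READ |
                                              IBV_ACCESS_REMOTE_WRITE);

        if (!mr)
                perror("ibv_reg_dmabuf_mr");
        return mr;  /* mr->lkey / mr->rkey can then be used in work requests */
}

Whether the RDMA device can actually reach the exported BAR with P2P still
depends on the PCI topology and IOMMU configuration, independent of the
verbs call itself.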
Below is the text/motivation from the original cover letter.

dma-buf has become a way to safely acquire a handle to non-struct page
memory that can still have its lifetime controlled by the exporter. Notably,
RDMA can now import dma-buf FDs and build them into MRs, which allows for
PCI P2P operations. Extend this to allow vfio-pci to export MMIO memory from
PCI device BARs.

This series supports a use case for SPDK where an NVMe device will be owned
by SPDK through VFIO but interacting with an RDMA device. The RDMA device
may directly access the NVMe CMB or directly manipulate the NVMe device's
doorbell using PCI P2P.

However, as a general mechanism, it can support many other scenarios with
VFIO. I imagine this dmabuf approach to be usable by iommufd as well for
generic and safe P2P mappings.

This series goes after the "Break up ioctl dispatch functions to one
function per ioctl" series.

v2:
 - Name the new file dma_buf.c
 - Restore orig_nents before freeing
 - Fix reversed logic around priv->revoked
 - Set priv->index
 - Rebased on v2 "Break up ioctl dispatch functions"
v1: https://lore.kernel.org/r/0-v1-9e6e1739ed95+5fa-vfio_dma_buf_...@nvidia.com

Cc: linux-r...@vger.kernel.org
Cc: Oded Gabbay <ogab...@kernel.org>
Cc: Christian König <christian.koe...@amd.com>
Cc: Daniel Vetter <daniel.vet...@ffwll.ch>
Cc: Leon Romanovsky <l...@kernel.org>
Cc: Maor Gottlieb <ma...@nvidia.com>
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Jason Gunthorpe <j...@nvidia.com>

Jason Gunthorpe (3):
  vfio: Add vfio_device_get()
  dma-buf: Add dma_buf_try_get()
  vfio/pci: Allow MMIO regions to be exported through dma-buf

Wei Lin Guay (1):
  vfio/pci: Allow export dmabuf without move_notify from importer

 drivers/vfio/pci/Makefile          |   1 +
 drivers/vfio/pci/dma_buf.c         | 291 +++++++++++++++++++++++++++++
 drivers/vfio/pci/vfio_pci_config.c |   8 +-
 drivers/vfio/pci/vfio_pci_core.c   |  44 ++++-
 drivers/vfio/pci/vfio_pci_priv.h   |  30 +++
 drivers/vfio/vfio_main.c           |   1 +
 include/linux/dma-buf.h            |  13 ++
 include/linux/vfio.h               |   6 +
 include/linux/vfio_pci_core.h      |   1 +
 include/uapi/linux/vfio.h          |  18 ++
 10 files changed, 405 insertions(+), 8 deletions(-)
 create mode 100644 drivers/vfio/pci/dma_buf.c

--
2.43.5
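For illustration, the export side of the flow described in the cover letter
(e.g. SPDK asking vfio-pci for a dmabuf covering part of a BAR such as an
NVMe CMB) might look roughly like the sketch below. Only the generic
VFIO_DEVICE_FEATURE ioctl plumbing is existing UAPI; the request layout and
the commented-out feature id are assumptions modelled on the earlier
postings of this series (the final UAPI, including nr_ranges support, may
differ):

/*
 * Hypothetical userspace sketch: request a dmabuf fd for one contiguous
 * range inside one BAR.  The vfio_dmabuf_request layout and the proposed
 * feature id are assumptions, not mainline UAPI.
 */
#include <linux/vfio.h>
#include <stdlib.h>
#include <sys/ioctl.h>

struct vfio_dmabuf_request {            /* assumed payload, see note above */
        __u32 region_index;             /* which PCI BAR to export from */
        __u32 open_flags;               /* e.g. O_CLOEXEC for the new fd */
        __u32 offset;                   /* byte offset into the BAR */
        __u64 length;                   /* size of the exported range */
};

static int vfio_bar_to_dmabuf(int device_fd, __u32 bar, __u64 len)
{
        size_t argsz = sizeof(struct vfio_device_feature) +
                       sizeof(struct vfio_dmabuf_request);
        struct vfio_device_feature *feat = calloc(1, argsz);
        struct vfio_dmabuf_request *req;
        int dmabuf_fd;

        if (!feat)
                return -1;

        req = (struct vfio_dmabuf_request *)feat->data;
        feat->argsz = argsz;
        /*
         * The series proposes a new feature id; it is left commented out
         * here because it is not in mainline <linux/vfio.h> yet.
         */
        feat->flags = VFIO_DEVICE_FEATURE_GET /* | VFIO_DEVICE_FEATURE_DMA_BUF */;
        req->region_index = bar;
        req->offset = 0;
        req->length = len;

        /* In the proposal, a successful call returns the new dmabuf fd. */
        dmabuf_fd = ioctl(device_fd, VFIO_DEVICE_FEATURE, feat);
        free(feat);
        return dmabuf_fd;
}

The returned fd could then be handed to an importer such as the RDMA
registration sketch earlier in this mail.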