Hi guys,

trying not to let the mail thread branch too much, I'm just replying to the newest mail.
Please let me know if I missed some question.

On 23.09.25 08:44, Matthew Brost wrote:
> On Mon, Sep 22, 2025 at 11:25:47PM -0700, Matthew Brost wrote:
>> On Mon, Sep 22, 2025 at 11:53:06PM -0600, Kasireddy, Vivek wrote:
>>> Hi Jason,
>>>
>>>> Subject: Re: [PATCH v4 1/5] PCI/P2PDMA: Don't enforce ACS check for
>>>> device functions of Intel GPUs
>>>>
>>>> On Mon, Sep 22, 2025 at 01:22:49PM +0200, Christian König wrote:
>>>>
>>>>> Well what exactly is happening here? You have a PF assigned to the
>>>>> host and a VF passed through to a guest, correct?
>>>>>
>>>>> And now the PF (from the host side) wants to access a BAR of the VF?
>>>>
>>>> Not quite.
>>>>
>>>> It is a GPU so it has a pool of VRAM. The PF can access all VRAM and
>>>> the VF can access some VRAM.
>>>>
>>>> They want to get a DMABUF handle for a bit of the VF's reachable VRAM
>>>> that the PF can import and use through its own function.
>>>>
>>>> The use of the VF's BAR in this series is an ugly hack.
>>>
>>> IIUC, it is a common practice among GPU drivers including Xe and amdgpu
>>> to never expose VRAM addresses and instead have BAR addresses as DMA
>>> addresses when exporting dmabufs to other devices. Here is the relevant
>>> code snippet in Xe:

That sounds a bit mixed up. There are two different concepts which can be
used here:

1. Drivers exposing DMA addresses to PCIe BARs. For example this is done by
amdgpu and Xe to give other drivers access to MMIO registers as well as
VRAM when it isn't backed by struct pages.

2. Drivers short-cutting access paths internally. This is used by amdgpu
and a lot of other drivers when a driver finds that a DMA-buf was exported
by itself. For example the ISP driver part of amdgpu provides the V4L2
interface, and when we exchange a DMA-buf with it we recognize that it is
actually the same device we are working with.

Currently the implementation is based on approach #1, but as far as I can
see what's actually needed is approach #2.
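To illustrate what approach #2 usually looks like on the importer side,
here is a rough sketch of the common DRM idiom. The names
my_gem_prime_import and my_dmabuf_ops are made up for illustration; this is
not code from amdgpu or from the series:

#include <linux/dma-buf.h>
#include <drm/drm_device.h>
#include <drm/drm_gem.h>
#include <drm/drm_prime.h>

/* The dma_buf_ops this driver uses when exporting (illustrative only). */
static const struct dma_buf_ops my_dmabuf_ops;

/*
 * Rough sketch of approach #2: recognize a DMA-buf we exported ourselves
 * and short-cut the access path instead of mapping it through the PCIe BAR.
 */
static struct drm_gem_object *my_gem_prime_import(struct drm_device *dev,
						  struct dma_buf *dma_buf)
{
	if (dma_buf->ops == &my_dmabuf_ops) {
		/* DMA-bufs exported by DRM drivers carry the GEM object in priv. */
		struct drm_gem_object *obj = dma_buf->priv;

		if (obj->dev == dev) {
			/* Same device: use the backing object directly. */
			drm_gem_object_get(obj);
			return obj;
		}
	}

	/* Foreign buffer: fall back to the generic attach/map import path. */
	return drm_gem_prime_import(dev, dma_buf);
}

The point of the self-import check is that the driver can then use its own
backing VRAM object directly instead of going through a P2P mapping of its
own BAR.
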
>> I've read through this thread -- Jason, correct me if I'm wrong -- but I
>> believe what you're suggesting is that instead of using PCIe P2P
>> (dma_map_resource) to communicate the VF's VRAM offset to the PF, we
>> should teach dma-buf to natively understand a VF's VRAM offset. I don't
>> think this is currently built into dma-buf, but it probably should be,
>> as it could benefit other use cases as well (e.g., UALink, NVLink, etc.).
>>
>> In both examples above, the PCIe P2P fabric is used for communication,
>> whereas in the VF→PF case, it's only using the PCIe P2P address to
>> extract the VF's VRAM offset, rather than serving as a communication
>> path. I believe that's Jason's objection. Again, Jason, correct me if
>> I'm misunderstanding here.
>>
>> Assuming I'm understanding Jason's comments correctly, I tend to agree
>> with him.

Yeah, agreed that what is done here is just an extremely ugly hack.

>>>> The PF never actually uses the VF BAR
>>>
>>> That's because the PF can't use it directly, most likely due to
>>> hardware limitations.
>>>
>>>> it just hackily converts the dma_addr_t back to CPU physical and
>>>> figures out where it is in the VRAM pool and then uses a PF centric
>>>> address for it.
>>>>
>>>> All they want is either the actual VRAM address or the CPU physical.
>>>
>>> The problem here is that the CPU physical (aka BAR address) is only
>>> usable by the CPU. Since the GPU PF only understands VRAM addresses,
>>> the current exporter (vfio-pci) or any VF/VFIO variant driver cannot
>>> provide the VRAM addresses that the GPU PF can use directly because
>>> they do not have access to the provisioning data.
>>
>> Right, we need to provide the offset within the VRAM provisioning, which
>> the PF can resolve to a physical address based on the provisioning data.
>> The series already does this -- the problem is how the VF provides this
>> offset. It shouldn't be a P2P address, but rather a native
>> dma-buf-provided offset that everyone involved in the attachment
>> understands.

What you can do is either export the DMA-buf directly from the driver that
is responsible for the PF (that's what we do in amdgpu, because the VRAM is
actually not fully accessible through the BAR), or extend the VFIO driver
with a private interface for the PF that exposes the offsets into the BAR
instead of the DMA addresses.

>>> However, it is possible that if vfio-pci or a VF/VFIO variant driver
>>> had access to the VF's provisioning data, then it might be able to
>>> create a dmabuf with VRAM addresses that the PF can use directly. But I
>>> am not sure if exposing provisioning data to VFIO drivers is ok from a
>>> security standpoint or not.

How are those offsets into the BAR communicated from the guest to the host
in the first place?

>> I'd prefer to leave the provisioning data to the PF if possible. I
>> haven't fully wrapped my head around the flow yet, but it should be
>> feasible for the VF → VFIO → PF path to pass along the initial VF
>> scatter-gather (SG) list in the dma-buf, which includes VF-specific
>> PFNs. The PF can then use this, along with its provisioning information,
>> to resolve the physical address.

Well, don't put that into the sg_table but rather into an xarray or
similar; in general, though, that's the correct idea.
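As a rough sketch of what I mean with the xarray (the names, e.g.
vf_vram_offsets, are made up for illustration and not taken from the
series): the VF side stores one entry per chunk, keyed by chunk index, and
the PF side looks the entries up and resolves them against its provisioning
data.

#include <linux/xarray.h>

/*
 * Rough sketch: hand the offsets into the VF's provisioned VRAM around in
 * an xarray keyed by chunk index instead of encoding them as fake DMA
 * addresses in an sg_table.
 */
struct vf_vram_offsets {
	struct xarray chunks;		/* chunk index -> offset in pages */
};

static void vf_vram_offsets_init(struct vf_vram_offsets *map)
{
	xa_init(&map->chunks);
}

static int vf_vram_offsets_store(struct vf_vram_offsets *map,
				 unsigned long idx, unsigned long offset_pfn)
{
	/* Small integers can be stored directly as xarray values. */
	return xa_err(xa_store(&map->chunks, idx, xa_mk_value(offset_pfn),
			       GFP_KERNEL));
}

static unsigned long vf_vram_offsets_lookup(struct vf_vram_offsets *map,
					    unsigned long idx)
{
	void *entry = xa_load(&map->chunks, idx);

	return xa_is_value(entry) ? xa_to_value(entry) : ULONG_MAX;
}

The PF can then walk the entries with xa_for_each() and translate each
offset into a physical VRAM address using its provisioning information.
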
Regards,
Christian.

>> Matt
>>
>>> Thanks,
>>> Vivek
>>>
>>>>
>>>> Jason