On Wed, Jan 28, 2026 at 10:42:53AM -0800, Matthew Brost wrote:

Let me fix a couple typos...

> On Wed, Jan 28, 2026 at 11:14:58AM -0400, Jason Gunthorpe wrote:
> > On Tue, Jan 27, 2026 at 04:48:36PM -0800, Matthew Brost wrote:
> > > Add an IOVA interface to the DRM pagemap layer. This provides a semantic
> > > wrapper around the dma-map IOVA alloc/link/sync/unlink/free API while
> > > remaining flexible enough to support future high-speed interconnects
> > > between devices.
> >
> > I don't think this is a very clear justification.
> >
> > "IOVA" and dma_addr_t should be strictly reserved for communication
> > that flows through the interconnect that Linux struct device is aware
> > of (ie the PCIe fabric). It should not ever be used for "high speed
> > interconnects" implying some private and hidden things like
> > xgmi/nvlink/ualink type stuff.
> >
> Yes, the future is heading toward xgmi/nvlink/ualink type stuff. I
> agree we (DRM pagemap, GPU SVM, Xe) need a refactor to avoid using
> dma_addr_t for any interfaces here once we unify this with
> xgmi/nvlink/ualink, as dma_addr_t doesn't make much sense there. This
> is a PoC of the code structure. s/IOVA/something else/ for the
> interfaces may make sense too.
>
> > I can't think of any reason why you'd want to delegate constructing
> > the IOVA to some other code. I can imagine you'd want to get a pfn
> > list from someplace else and turn that into a mapping.
> >
> Yes, this is exactly what I envision here. First, let me explain the
> possible addressing modes on the UAL fabric:
>
> - Physical (akin to IOMMU passthrough)
> - Virtual (akin to IOMMU enabled)
>
> Physical mode is straightforward: resolve the PFN to a cross-device
> physical address, then install it into the initiator's page tables
> along with a bit indicating routing over the network. In this mode,
> the vfuncs here are basically NOPs.
>
> Virtual mode is the tricky one. There are addressing modes where a
> virtual address must be allocated at the target device (i.e., the
> address on the wire is translated at the target via a page-table walk).
> This is why the code is structured the way it is, and why I envision a
> UAL API that mirrors dma-map. At the initiator the initiator target

s/initiator target/target

> virtual address is installed in the page tables along with a bit
> indicating routing over the network.
>
> Let me give some examples of what this would look like in a few of the
> vfuncs; see [1] for the dma-map implementation. Also, ignore the
> dma_addr_t abuse for now.
>
> [1] https://patchwork.freedesktop.org/patch/701149/?series=160587&rev=3
>
> struct xe_svm_iova_cookie {
> 	struct dma_iova_state state;
> 	struct ual_iova_state ual_state;
> };
>
> static void *xe_drm_pagemap_device_iova_alloc(struct drm_pagemap *dpagemap,
> 					      struct device *dev, size_t length,
> 					      enum dma_data_direction dir)
> {
> 	struct device *pgmap_dev = dpagemap->drm->dev;
> 	struct xe_svm_iova_cookie *cookie;
> 	static bool locking_proved = false;
> 	int err;
>
> 	xe_drm_pagemap_device_iova_prove_locking(&locking_proved);
>
> 	if (pgmap_dev == dev)
> 		return NULL;
>
> 	cookie = kzalloc(sizeof(*cookie), GFP_KERNEL);
> 	if (!cookie)
> 		return NULL;
>
> 	if (ual_distance(pgmap_dev, dev) < 0) {
> 		dma_iova_try_alloc(dev, &cookie->state,
> 				   length >= SZ_2M ? SZ_2M : 0, length);
> 		if (dma_use_iova(&cookie->state))
> 			return cookie;
> 	} else {
> 		err = ual_iova_try_alloc(pgmap_dev, &cookie->ual_state,
> 					 length >= SZ_2M ? SZ_2M : 0, length);
> 		if (err) {
> 			kfree(cookie);
> 			return ERR_PTR(err);
> 		}
>
> 		if (ual_use_iova(&cookie->state))

s/ual_use_iova(&cookie->state)/ual_use_iova(&cookie->ual_state)

> 			return cookie;
> 	}
>
> 	kfree(cookie);
> 	return NULL;
> }
>
> So here, in physical mode 'ual_use_iova' would return false; in
> virtual mode, true.
>
> This function is also interesting because ual_iova_try_alloc in virtual
> mode can allocate memory for PTEs on the target device. This is why the
> kernel-doc explanation of Context, along with
> xe_drm_pagemap_device_iova_prove_locking, is important to ensure that
> all the locking is correct.
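While I'm here, to make the Context/locking point above concrete: a
rough sketch of the sort of thing I'd expect the prove-locking helper
to do. Purely illustrative; the fs_reclaim priming is my assumption,
not code from the series:

static void xe_drm_pagemap_device_iova_prove_locking(bool *proved)
{
	/*
	 * Illustrative sketch: prime lockdep once so a caller holding a
	 * reclaim-unsafe lock is flagged on the first call, even when the
	 * UAL virtual-mode path that actually allocates target-side PTE
	 * memory is never taken.
	 */
	if (*proved)
		return;

	fs_reclaim_acquire(GFP_KERNEL);
	fs_reclaim_release(GFP_KERNEL);

	*proved = true;
}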
> Now this function:
>
> static struct drm_pagemap_addr
> xe_drm_pagemap_device_iova_link(struct drm_pagemap *dpagemap,
> 				struct device *dev, struct page *page,
> 				size_t length, size_t offset, void *cookie,
> 				enum dma_data_direction dir)
> {
> 	struct device *pgmap_dev = dpagemap->drm->dev;
> 	struct xe_svm_iova_cookie *__cookie = cookie;
> 	struct xe_device *xe = to_xe_device(dpagemap->drm);
> 	enum drm_interconnect_protocol proto;
> 	dma_addr_t addr;
> 	int err;
>
> 	if (dma_use_iova(&__cookie->state)) {
> 		addr = __cookie->state.addr + offset;
> 		proto = XE_INTERCONNECT_P2P;
> 		err = dma_iova_link(dev, &__cookie->state,
> 				    xe_page_to_pcie(page), offset, length, dir,
> 				    DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_MMIO);
> 	} else {
> 		addr = __cookie->ual_state.addr + offset;
> 		proto = XE_INTERCONNECT_VRAM; /* Also means over fabric */
> 		err = ual_iova_link(dev, &__cookie->ual_state,
> 				    xe_page_to_pcie(page),

s/xe_page_to_pcie/xe_page_to_dpa

> 				    offset, length, dir);
> 	}
> 	if (err)
> 		addr = DMA_MAPPING_ERROR;
>
> 	return drm_pagemap_addr_encode(addr, proto, ilog2(length), dir);
> }
>
> Note that the above function can only be called in virtual mode (i.e.,
> when the first function returns an IOVA cookie). Here we'd jam the
> target's PTEs with physical page addresses (reclaim-safe) and return
> the network virtual address.
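For completeness, here is a rough sketch of what the matching unlink
vfunc could look like. The ual_iova_unlink name and the vfunc signature
are guesses on my part, mirroring dma_iova_unlink and the link vfunc
above, not code from the series:

static void
xe_drm_pagemap_device_iova_unlink(struct drm_pagemap *dpagemap,
				  struct device *dev, size_t length,
				  size_t offset, void *cookie,
				  enum dma_data_direction dir)
{
	struct xe_svm_iova_cookie *__cookie = cookie;

	/* Defensive: physical mode never produced a cookie */
	if (!__cookie)
		return;

	if (dma_use_iova(&__cookie->state))
		/* PCIe P2P path: tear down the dma-map IOVA linkage */
		dma_iova_unlink(dev, &__cookie->state, offset, length, dir,
				DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_MMIO);
	else
		/* UAL virtual mode: clear the target-side PTEs */
		ual_iova_unlink(dev, &__cookie->ual_state, offset, length,
				dir);
}

Matt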
> Lastly, a physical UAL example (i.e., the first function returns NULL):
>
> static struct drm_pagemap_addr
> xe_drm_pagemap_device_map(struct drm_pagemap *dpagemap,
> 			  struct device *dev,
> 			  struct page *page,
> 			  unsigned int order,
> 			  enum dma_data_direction dir)
> {
> 	struct device *pgmap_dev = dpagemap->drm->dev;
> 	enum drm_interconnect_protocol prot;
> 	dma_addr_t addr;
>
> 	if (pgmap_dev == dev || ual_distance(pgmap_dev, dev) >= 0) {
> 		addr = xe_page_to_dpa(page);
> 		prot = XE_INTERCONNECT_VRAM;
> 	} else {
> 		addr = dma_map_resource(dev, xe_page_to_pcie(page),
> 					PAGE_SIZE << order, dir,
> 					DMA_ATTR_SKIP_CPU_SYNC);
> 		prot = XE_INTERCONNECT_P2P;
> 	}
>
> 	return drm_pagemap_addr_encode(addr, prot, order, dir);
> }
>
> So, if it isn't clear: these vfuncs hide from the DRM common layer
> whether PCIe P2P (IOMMU in passthrough or enabled) or UAL (physical or
> virtual) is being used. They manage the resources for the connection
> and provide the information needed to program the initiator PTEs
> (address plus a "use interconnect" vs. "use PCIe P2P" bit).
>
> This reasoning is why it would be nice if drivers were allowed to use
> the dma-map IOVA alloc/link/sync/unlink/free API for PCIe P2P directly.
>
> > My understanding of all the private interconnects is you get an
> > interconnect address and program it directly into the device HW,
> > possibly with a "use interconnect" bit, and the device never touches
> > the PCIe fabric at all.
> >
> Yes, but see the physical vs. virtual explanation above. The "use
> interconnect" bit is just one part of this.
>
> Matt
>
> > Jason
