On Wed, Jan 28, 2026 at 09:45:40PM +0200, Leon Romanovsky wrote:
> On Wed, Jan 28, 2026 at 11:29:23AM -0800, Matthew Brost wrote:
> > On Wed, Jan 28, 2026 at 01:55:31PM -0400, Jason Gunthorpe wrote:
> > > On Wed, Jan 28, 2026 at 09:46:44AM -0800, Matthew Brost wrote:
> > >
> > > > It is intended to fill holes. The input pages come from the
> > > > migrate_vma_* functions, which can return a sparsely populated
> > > > array of pages for a region (e.g., it scans a 2M range but only
> > > > finds several of the 512 pages eligible for migration). As a
> > > > result, if (!page) is true for many entries.
> > >
> > > This is migration?? So something is DMA'ing from A -> B - why put
> > > holes in the first place? Can you tightly pack the pages in the IOVA?
> >
> > This could probably be made to work. I think it would be an initial
> > pass to figure out the IOVA size, then tightly pack.
> >
> > Let me look at this. It is probably better too, as installing dummy
> > pages is a non-zero cost - I assume dma_iova_link() is a radix tree
> > walk.
> >
> > > If there is no iommu then the addresses are scattered all over anyhow
> > > so it can't be relying on some dma_addr_t relationship?
> >
> > Scattered DMA addresses are already handled in the copy code, likewise
> > holes, so that is a non-issue.
> >
> > > You don't have to fully populate the allocated iova, you can link
> > > from A-B and then unlink from A-B even if B is less than the total
> > > size requested.
> > >
> > > The hmm users have the holes because hmm is dynamically
> > > adding/removing pages as it runs and it can't do anything to pack the
> > > mapping.
> > >
> > > > > IOVA space? If so, what necessitates those holes? You can have
> > > > > less mapped than IOVA and the dma_iova_*() API can handle it.
> > > >
> > > > I was actually going to ask you about this, so I'm glad you brought
> > > > it up here. Again, this is a hack to avoid holes - the holes are
> > > > never touched by our copy function, but rather skipped, so we just
> > > > jam in a dummy address so the entire IOVA range has valid IOMMU
> > > > pages.
> > >
> > > I would say what you are doing is trying to optimize unmap by
> >
> > Yes, and it keeps the code simplish.
> >
> > > unmapping everything in one shot instead of just the mapped areas,
> > > and the WARN_ON is telling you that it isn't allowed to unmap across
> > > a hole.
> > >
> > > > at the moment I'm not sure whether this warning affects actual
> > > > functionality or if we could just delete it.
> > >
> > > It means the iommu page table stopped unmapping when it hit a hole
> > > and there is a bunch of leftover maps in the page table that
> > > shouldn't be there. So yes, it is serious and cannot be deleted.
> >
> > Cool, this explains the warning.
> >
> > > This is a possible option to teach things to detect the holes and
> > > ignore them..
> >
> > Another option - and IMO probably the best one - as it makes potential
> > usages with holes the simplest at the driver level. Let me look at
> > this too.
>
> It would be ideal if we could code a more general solution. In HMM we
> release pages one by one, and it would be preferable to have a
> single-shot unmap routine instead, similar to NVMe, which releases all
> IOVA space with one call to dma_iova_destroy().
>
> HMM chain:
>
>   ib_umem_odp_unmap_dma_pages()
>     -> for (...)
>       -> hmm_dma_unmap_pfn()
>
> After giving more thought to my earlier suggestion to use
> hmm_pfn_to_phys(), I began to wonder why you did not use the
> hmm_dma_*() API instead?
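Before getting to the hmm_dma_*() question - for the tightly-packed
mapping discussed above, I am thinking of something along these lines.
Rough, untested sketch: the function name is made up, error handling is
trimmed, and the direction/attrs choices are placeholders. The copy code
would also need a side record of which packed offset belongs to which
source page, elided here:

#include <linux/dma-mapping.h>
#include <linux/mm.h>

static int pack_and_link_pages(struct device *dev,
			       struct dma_iova_state *state,
			       struct page **pages, unsigned long npages,
			       enum dma_data_direction dir)
{
	size_t size = 0, offset = 0;
	unsigned long i;
	int err;

	/* Pass 1: figure out how much IOVA space we actually need. */
	for (i = 0; i < npages; i++)
		if (pages[i])
			size += PAGE_SIZE;

	/*
	 * Passing a phys of 0 here; a real caller might pass the first
	 * present page's phys so the IOVA alignment matches. Returns
	 * false when the IOVA path is unavailable (e.g., no iommu), in
	 * which case we would fall back to dma_map_page().
	 */
	if (!size || !dma_iova_try_alloc(dev, state, 0, size))
		return -ENOMEM;

	/* Pass 2: link only the present pages, tightly packed. */
	for (i = 0; i < npages; i++) {
		if (!pages[i])
			continue;
		err = dma_iova_link(dev, state, page_to_phys(pages[i]),
				    offset, PAGE_SIZE, dir, 0);
		if (err)
			goto err_destroy;
		offset += PAGE_SIZE;
	}

	/* One sync for the whole packed range, rather than per page. */
	err = dma_iova_sync(dev, state, 0, offset);
	if (err)
		goto err_destroy;

	return 0;

err_destroy:
	dma_iova_destroy(dev, state, offset, dir, 0);
	return err;
}

With the pages packed like this there are no holes in the IOVA at all, so
the one-shot unmap just works and the dummy-page hack goes away.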
As for the hmm_dma_*() API: it is ill-suited for high-speed fabrics, but
so is our existing implementation - we're just in slightly better shape
(?). It also seems ill-suited [1][2][3] for variable page sizes (which
are possible with our API), as well as for the way we currently program
device PTEs in our driver. We also receive PFNs from the migrate_vma_*
layer, which must be mapped as well.

I also believe the hmm_dma_* code predates the DRM code being merged, or
was merged around the same time. We could work to unify the HMM helpers
and make them usable, but that won't happen overnight. The HMM layer
needs quite a bit of work to be usable, and then we'd have to propagate
everything upward through DRM/Xe and any new users.

Let me play around with this a bit, though, to get a rough idea of what
would need to be done here.

[1] https://elixir.bootlin.com/linux/v6.18.6/source/drivers/infiniband/core/umem_odp.c#L255
[2] https://elixir.bootlin.com/linux/v6.18.6/source/drivers/infiniband/core/umem_odp.c#L193
[3] https://elixir.bootlin.com/linux/v6.18.6/source/drivers/infiniband/core/umem_odp.c#L104

There is also some odd stuff going on there... Why sync after every
mapping [4]? And blindly doing BIDIRECTIONAL [5]...

[4] https://elixir.bootlin.com/linux/v6.18.6/source/mm/hmm.c#L826
[5] https://elixir.bootlin.com/linux/v6.18.6/source/mm/hmm.c#L821

> > Do you think we need a flag somewhere for 'ignore holes', or can I
> > just blindly skip them?
>
> Better if we have something like a dma_iova_with_holes_destroy()
> function call, to make sure that we don't hurt the performance of
> existing dma_iova_destroy() users.

Yes, I think this is the best route for the time being. Let me look at
this. A rough idea of the driver side is sketched below.

Matt

> Thanks
>
> > Matt
> >
> > > Jason
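P.S. For concreteness, here is roughly what I would expect the hole-aware
teardown to look like from the driver side with today's API, pending
something like the dma_iova_with_holes_destroy() suggested above (which
does not exist yet). Untested sketch - the function name and the
driver-side "mapped" bitmap tracking which PAGE_SIZE slots were linked
are both made up:

#include <linux/dma-mapping.h>
#include <linux/find.h>

/*
 * Unlink each contiguous run of linked slots, never crossing a hole,
 * then free the IOVA space. "mapped" is a hypothetical driver-side
 * bitmap with one bit per PAGE_SIZE slot in the IOVA range.
 */
static void dma_unlink_with_holes(struct device *dev,
				  struct dma_iova_state *state,
				  unsigned long *mapped,
				  unsigned long npages,
				  enum dma_data_direction dir)
{
	unsigned long start, end;

	for_each_set_bitrange(start, end, mapped, npages)
		dma_iova_unlink(dev, state, start * PAGE_SIZE,
				(end - start) * PAGE_SIZE, dir, 0);

	dma_iova_free(dev, state);
}

I would assume a core-side dma_iova_with_holes_destroy() would do the
same walk internally, given some way of knowing where the holes are.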
