On Tue, Oct 28, 2025 at 10:11:18AM +0100, Thomas Hellström wrote:
> Hi, Matt
>
> On Mon, 2025-10-27 at 17:13 -0700, Matthew Brost wrote:
> > On Sat, Oct 25, 2025 at 02:04:12PM +0200, Thomas Hellström wrote:
> > > Data present in foreign device memory may cause migration to fail.
> > > For now, retry once after first migrating to system.
> > >
> > > Signed-off-by: Thomas Hellström <[email protected]>
> > > ---
> > >  drivers/gpu/drm/xe/xe_svm.c | 19 +++++++++++++++----
> > >  1 file changed, 15 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/xe/xe_svm.c b/drivers/gpu/drm/xe/xe_svm.c
> > > index 9814f95cb212..41e075aa015c 100644
> > > --- a/drivers/gpu/drm/xe/xe_svm.c
> > > +++ b/drivers/gpu/drm/xe/xe_svm.c
> > > @@ -1529,13 +1529,24 @@ struct drm_pagemap *xe_vma_resolve_pagemap(struct xe_vma *vma, struct xe_tile *t
> > >  int xe_svm_alloc_vram(struct xe_svm_range *range, const struct drm_gpusvm_ctx *ctx,
> > >  		      struct drm_pagemap *dpagemap)
> > >  {
> > > +	int err, retries = 1;
> > > +
> > >  	xe_assert(range_to_vm(&range->base)->xe, range->base.pages.flags.migrate_devmem);
> > >  	range_debug(range, "ALLOCATE VRAM");
> > >
> > > -	return drm_pagemap_populate_mm(dpagemap, xe_svm_range_start(range),
> > > -				       xe_svm_range_end(range),
> > > -				       range->base.gpusvm->mm,
> > > -				       ctx->timeslice_ms);
> > > +retry:
> > > +	err = drm_pagemap_populate_mm(dpagemap, xe_svm_range_start(range),
> > > +				      xe_svm_range_end(range),
> > > +				      range->base.gpusvm->mm,
> > > +				      ctx->timeslice_ms);
> > > +	if ((err == -EBUSY || err == -EFAULT) && retries--) {
> >
> > I don't think this is what we want to do here. -EFAULT indicates that
> > the pages are entirely present somewhere in device memory. This could
> > be either on the local device or on a foreign device, but we don’t
> > have enough information here to determine which case it is.
> >
> > If this is on our local device, we're always good. This could occur
> > playing mremap games.
> >
> > If it's on a foreign device, things get trickier. If our interconnect
> > supports atomics, we're still good. But if the interconnect doesn't
> > support atomics (e.g., PCIe P2P) and this is an atomic fault, then we
> > need to move the memory. Also, if there's no path between device
> > memories, then of course we need to move the memory.
> >
> > Again, we don’t have enough information here to make the correct
> > decision. We really need to call drm_gpusvm_range_get_pages to gather
> > the CPU pages in order to make this kind of decision. Ideally, the
> > logic should be built into drm_gpusvm_range_get_pages to understand
> > atomic migration requirements.
>
> For multi-device I'm just looking at a patch that considers p2p
> migration and at the same time returns 0 if data is placed in
> compatible memory, given migration policies and interconnects. But
> until that patch lands, we need a way to evict memory from foreign
> devices so that we can migrate to the desired device.
>
> I would have expected that if memory is already present in local
> device memory we'd have the xe_svm_range_in_vram() flag set and would
> not attempt to migrate, at least in most cases? Currently, if data is
We check whether a range has valid GPU mappings and skip the fault if
that's the case. However, if a user performs an mremap while a GPU
mapping is present, the SVM range becomes new, but the CPU pages remain
valid. In this flow, the VRAM allocation fails, and get_pages correctly
locates the pages. This situation is similar to a multi-device
scenario, where another GPU fault handler has already moved the CPU
pages to the correct location.

We could add an additional step before attempting to allocate VRAM
that detects this condition, for example by calling get_pages with
specific arguments. For what it's worth, get_pages is a very
lightweight function; if I recall correctly, its overhead is less than
0.05 microseconds.

> already fully or partly present in another p2p-device memory, and we
> ignore the -EFAULT, then get_pages() wouldn't detect that? Or well, we
> can look at the dpagemap returned from get_pages and retry the
> migration at that point.
>
> We also need to realize that with multi-gpu, the chances of migration
> races increase dramatically, and whether those return -EBUSY or
> -EFAULT appears a bit arbitrary to me? We can't really assume that
> cpages == 0 means all pages are already where they are supposed to be.
>
> My current thinking on how to handle all this is the following:
>
> 1) xe_svm_range_in_vram(), first check to avoid migration.
> 2) (NEW, not implemented yet) If we decide to migrate, first run
> hmm_range_fault() without the faulting flag to determine current
> memory migration status - perhaps optional. This may reject migration
> more efficiently than if we collect pages for migration and then
> inspect them, because then we've already sent an invalidation event.

Yes, I think parts 1 and 2 are probably variations of the get_pages
function mentioned above.

> 3) Call into drm_pagemap_populate_mm(). This collects all compatible
> and system pages, and determines what to migrate. If no migration is
> needed, it returns 0.
> If racing or needing to migrate foreign devices to system, return
> -EBUSY.
> 4) If -EBUSY, evict and retry migration once.
>
> For now, I think we make atomic faults use local VRAM only moving
> forward.

Maybe that makes sense, but drm_gpusvm_pages also includes a
drm_pagemap, so adding logic to determine whether atomics are allowed
shouldn't be difficult either. We need to forward this information to
the xe_pt.c layer to correctly set the atomic-enable bit.

> > Once drm_gpusvm_range_get_pages returns, we can take appropriate
> > action. Initially, for simplicity, this might just be a bounce to
> > system memory. Later, it could evolve into a direct device-to-device
> > move.
>
> I agree we need a pass with hmm_range_fault(), the question is in what
> order we do this. I think a pass without the fault flag on before
> trying to migrate would
>
> a) Avoid populating with system pages for data that's going to be in
> VRAM anyway,

Agreed. Whether CPU pages are present or not can dramatically affect
migration time, so it's best to avoid faulting in pages that can be
moved immediately.

> b) Possibly avoid collecting migrate pages and thus also an
> invalidation for all devices.

Yes, this is a higher-overhead call than hmm_range_fault.

> The drawback is we might unnecessarily run a non-faulting
> hmm_range_fault() when we need to migrate anyway. My thinking is that
> would be a rather quick call, though, compared to the reverse lookups
> in the migrate code.

Like I said, hmm_range_fault is quite fast, especially once we get 2MB
device pages, so I'm not too concerned about the overhead.

> > The logic inside drm_gpusvm_range_get_pages would likely involve
> > devmem_only combined with a drm_pagemap passed in, which can detect
> > connectivity and atomic support between devices, based on the
> > drm_pagemap extracted from the ZDD.
> >
> > Let me know if this makes sense, or if you have thought about doing
> > this in a follow up.
>
> In any case, I think we need to set up a flow-chart / flow-list
> similar to the above and consider the most important cases and what to
> do with them. For now, I think we can replace this patch, if
> necessary, with a dpagemap check of whether the desired pagemap equals
> what's present, and rerun after that. We'd probably need that anyway.

Yes, some flowcharts for our GPU fault handler would be helpful. It's
already quite complex, and multi-GPU support adds even more complexity.
We have simple examples in the GPU SVM documentation, but perhaps we
could put together a more detailed kernel doc in xe_svm.c that outlines
various scenarios and what we expect to happen in each.

Matt

> Thomas
>
> >
> > Matt
> >
> > > +		range_debug(range, "ALLOCATE VRAM - Retry.");
> > > +
> > > +		drm_gpusvm_range_evict(range->base.gpusvm, &range->base);
> > > +		goto retry;
> > > +	}
> > > +
> > > +	return err;
> > >  }
> > >
> > >  static struct drm_pagemap_addr
> > > --
> > > 2.51.0
> >
>
