On Tue, 2026-03-31 at 11:44 +0200, Christian König wrote:
> On 3/31/26 11:20, Thomas Hellström wrote:
> > The xe driver was using the drm_exec retry pointer directly to
> > restart the locking loop after out-of-memory errors. This is
> > relying on undocumented behaviour.
> > 
> > Instead add a drm_exec_retry() macro that can be used in this
> > situation, and that also asserts that the struct drm_exec is
> > in a state that is compatible with retrying:
> > Either newly initialized or in a contended state with all locks
> > dropped.
> > 
> > Use that macro in xe.
> > 
> > Signed-off-by: Thomas Hellström <[email protected]>
> > ---
> >  drivers/gpu/drm/xe/xe_validation.h |  2 +-
> >  include/drm/drm_exec.h             | 13 +++++++++++++
> >  2 files changed, 14 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/xe/xe_validation.h b/drivers/gpu/drm/xe/xe_validation.h
> > index a30e732c4d51..4cd955ce6cd2 100644
> > --- a/drivers/gpu/drm/xe/xe_validation.h
> > +++ b/drivers/gpu/drm/xe/xe_validation.h
> > @@ -146,7 +146,7 @@ bool xe_validation_should_retry(struct xe_validation_ctx *ctx, int *ret);
> >  #define xe_validation_retry_on_oom(_ctx, _ret)                               \
> >     do {                                                           \
> >             if (xe_validation_should_retry(_ctx, _ret))              \
> > -                   goto *__drm_exec_retry_ptr;                      \
> > +                   drm_exec_retry((_ctx)->exec);                     \
> 
> Oh, that goto is extremely questionable to begin with.
> 
> >     } while (0)
> >  
> >  /**
> > diff --git a/include/drm/drm_exec.h b/include/drm/drm_exec.h
> > index fc95a979e253..5ed5be1f8244 100644
> > --- a/include/drm/drm_exec.h
> > +++ b/include/drm/drm_exec.h
> > @@ -138,6 +138,19 @@ static inline bool drm_exec_is_contended(struct drm_exec *exec)
> >     return !!exec->contended;
> >  }
> >  
> > +/**
> > + * drm_exec_retry() - Unconditionally restart the loop to grab all locks.
> > + * @exec: drm_exec object
> > + *
> > + * Unconditionally retry the loop to lock all objects. For consistency,
> > + * the exec object needs to be newly initialized or contended.
> > + */
> > +#define drm_exec_retry(_exec)                              \
> > +   do {                                            \
> > +           WARN_ON(!drm_exec_is_contended(_exec)); \
> 
> This warning would trigger!
> 
> See the code in xe_bo_notifier_prepare_pinned() for example:
> 
>                         drm_exec_retry_on_contention(&exec);
>                         ret = PTR_ERR(backup);
>                         xe_validation_retry_on_oom(&ctx, &ret);
> 
> Without contention we would just skip the loop and never lock
> anything.
> 
> What XE does here just doesn't work as far as I can see.

So if xe_validation_retry_on_oom() actually retries, it internally
calls drm_exec_fini() and drm_exec_init() first, which means that the
warning doesn't trigger, because contended then holds its dummy value.

So the warning does its job, and xe is safe.
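
To make the state argument concrete, here is a minimal userspace sketch.
It is a model only, not the kernel code: every model_* name is
hypothetical, and it assumes that init leaves contended at a non-NULL
dummy value and that starting a locking pass clears that value again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the dummy marker; any non-NULL sentinel works. */
static char model_dummy;
#define MODEL_EXEC_DUMMY ((void *)&model_dummy)

/* Models only the one field the proposed assertion looks at. */
struct model_exec {
	void *contended;
};

/* Assumption: a fresh init leaves contended at the dummy value. */
static void model_exec_init(struct model_exec *exec)
{
	assert(exec != NULL);
	exec->contended = MODEL_EXEC_DUMMY;
}

/* Assumption: the start of a locking pass clears the dummy value. */
static void model_exec_begin_pass(struct model_exec *exec)
{
	exec->contended = NULL;
}

static bool model_exec_is_contended(const struct model_exec *exec)
{
	return exec->contended != NULL;
}

/*
 * Simulate an out-of-memory retry in the middle of a locking pass and
 * count how often the proposed WARN_ON() condition would be true.
 */
static int model_oom_retry_warnings(bool reinit_before_retry)
{
	struct model_exec exec;
	int warnings = 0;

	model_exec_init(&exec);
	model_exec_begin_pass(&exec);

	/* Pretend an allocation failed here and the caller wants to retry. */
	if (reinit_before_retry)
		model_exec_init(&exec);	/* the fini + init step described above */

	if (!model_exec_is_contended(&exec))
		warnings++;		/* WARN_ON(!is_contended) would fire */

	return warnings;
}
```

Under those assumptions, model_oom_retry_warnings(true) counts no
warning and model_oom_retry_warnings(false) counts one, which is the
difference between the xe path and a bare retry from inside the loop.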

Thanks,
Thomas



> 
> Regards,
> Christian.
> 
> > +           goto *__drm_exec_retry_ptr;             \
> > +   } while (0)
> > +
> >  void drm_exec_init(struct drm_exec *exec, u32 flags, unsigned nr);
> >  void drm_exec_fini(struct drm_exec *exec);
> >  bool drm_exec_cleanup(struct drm_exec *exec);
