On Tue, May 20, 2025 at 12:23 AM Danilo Krummrich <d...@kernel.org> wrote:
>
> On Mon, May 19, 2025 at 10:51:24AM -0700, Rob Clark wrote:
> > From: Rob Clark <robdcl...@chromium.org>
> >
> > See commit a414fe3a2129 ("drm/msm/gem: Drop obj lock in
> > msm_gem_free_object()") for justification.
>
> I asked for a proper commit message in v4.

Sorry, I forgot that. Here is what I am adding:

Destroying a GEM object is a special case.  Acquiring the resv lock
when the object is being freed can cause a locking order inversion
between reservation_ww_class_mutex and fs_reclaim.

This deadlock is not actually possible, because no one should
already be holding the lock when free_object() is called.
Unfortunately lockdep is not aware of this detail.  So when the
refcount drops to zero, we pretend it is already locked.

> Only referring to a driver commit and letting people figure out how the driver
> works and what it does, in order to motivate a change in the generic
> infrastructure, is simply unreasonable.
>
> > Cc: Danilo Krummrich <d...@kernel.org>
> > Signed-off-by: Rob Clark <robdcl...@chromium.org>
> > ---
> >  drivers/gpu/drm/drm_gpuvm.c | 7 +++++--
> >  1 file changed, 5 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/drm_gpuvm.c b/drivers/gpu/drm/drm_gpuvm.c
> > index f9eb56f24bef..1e89a98caad4 100644
> > --- a/drivers/gpu/drm/drm_gpuvm.c
> > +++ b/drivers/gpu/drm/drm_gpuvm.c
> > @@ -1511,7 +1511,9 @@ drm_gpuvm_bo_destroy(struct kref *kref)
> >       drm_gpuvm_bo_list_del(vm_bo, extobj, lock);
> >       drm_gpuvm_bo_list_del(vm_bo, evict, lock);
> >
> > -     drm_gem_gpuva_assert_lock_held(obj);
> > +     if (kref_read(&obj->refcount) > 0)
> > +             drm_gem_gpuva_assert_lock_held(obj);
>
> Again, this is broken. What if the reference count drops to zero right after
> the kref_read() check, but before drm_gem_gpuva_assert_lock_held() is called?

No, it is not.  If you find yourself having this race condition, then
you already have bigger problems.  There are only two valid cases when
drm_gpuvm_bo_destroy() is called.  Either:

1) You somehow hold a reference to the GEM object, in which case the
refcount will be a positive integer.  Maybe you race but on either
side of the race you have a value that is greater than zero.
2) Or, you are calling this in the GEM object destructor path, in
which case no one else should have a reference to the object, so it
isn't possible to race.

If the refcount drops to zero after the check, you are about to blow
up regardless.
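
To make the two cases concrete, here is a minimal userspace sketch of
the check.  It is not kernel code: struct model_gem_obj and
model_bo_destroy_lock_ok() are made-up names for illustration, the
refcount field stands in for kref_read(&obj->refcount), and the
resv_locked flag stands in for what drm_gem_gpuva_assert_lock_held()
would verify.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a GEM object; not the kernel structure. */
struct model_gem_obj {
	unsigned int refcount;	/* models kref_read(&obj->refcount) */
	bool resv_locked;	/* models "the resv lock is held" */
};

/*
 * Returns true when the locking expectation is met:
 *  - refcount > 0:  the caller holds a reference, so the lock must be held.
 *  - refcount == 0: destructor path, no one else can reach the object,
 *                   so the lock check is skipped.
 */
static bool model_bo_destroy_lock_ok(const struct model_gem_obj *obj)
{
	if (obj->refcount > 0)
		return obj->resv_locked;

	return true;
}
```

In this model a live object without the lock held is the only
combination that fails the check; an object whose refcount has already
reached zero passes whether or not the flag is set.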

BR,
-R

> Putting conditionals on a refcount is always suspicious.
>
> If you still really want this, please guard it with
>
>         if (unlikely(gpuvm->flags & DRM_GPUVM_MSM_LEGACY_QUIRK))
>
> and get an explicit waiver from Dave / Sima.
>
