On Tue, May 20, 2025 at 12:23 AM Danilo Krummrich <d...@kernel.org> wrote:
>
> On Mon, May 19, 2025 at 10:51:24AM -0700, Rob Clark wrote:
> > From: Rob Clark <robdcl...@chromium.org>
> >
> > See commit a414fe3a2129 ("drm/msm/gem: Drop obj lock in
> > msm_gem_free_object()") for justification.
>
> I asked for a proper commit message in v4.

Sorry, I forgot that, here is what I am adding:

Destroying a GEM object is a special case. Acquiring the resv lock
when the object is being freed can cause a locking order inversion
between reservation_ww_class_mutex and fs_reclaim. This deadlock is
not actually possible, because no one should be already holding the
lock when free_object() is called. Unfortunately lockdep is not aware
of this detail. So when the refcount drops to zero, we pretend it is
already locked.

> Only referring to a driver commit and let the people figure out how the driver
> works and what it does in order to motivate a change in the generic
> infrastructure is simply unreasonable.
>
> > Cc: Danilo Krummrich <d...@kernel.org>
> > Signed-off-by: Rob Clark <robdcl...@chromium.org>
> > ---
> >  drivers/gpu/drm/drm_gpuvm.c | 7 +++++--
> >  1 file changed, 5 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/drm_gpuvm.c b/drivers/gpu/drm/drm_gpuvm.c
> > index f9eb56f24bef..1e89a98caad4 100644
> > --- a/drivers/gpu/drm/drm_gpuvm.c
> > +++ b/drivers/gpu/drm/drm_gpuvm.c
> > @@ -1511,7 +1511,9 @@ drm_gpuvm_bo_destroy(struct kref *kref)
> >         drm_gpuvm_bo_list_del(vm_bo, extobj, lock);
> >         drm_gpuvm_bo_list_del(vm_bo, evict, lock);
> >
> > -       drm_gem_gpuva_assert_lock_held(obj);
> > +       if (kref_read(&obj->refcount) > 0)
> > +               drm_gem_gpuva_assert_lock_held(obj);
>
> Again, this is broken. What if the reference count drops to zero right after
> the kref_read() check, but before drm_gem_gpuva_assert_lock_held() is called?

No, it is not. If you find yourself having this race condition, then
you already have bigger problems. There are only two valid cases when
drm_gpuvm_bo_destroy() is called. Either:

1) You somehow hold a reference to the GEM object, in which case the
   refcount will be a positive integer. Maybe you race but on either
   side of the race you have a value that is greater than zero.

2) Or, you are calling this in the GEM object destructor path, in
   which case no one else should have a reference to the object, so it
   isn't possible to race

If the refcount drops to zero after the check, you are about to blow
up regardless.

BR,
-R

> Putting conditionals on a refcount is always suspicious.
>
> If you still really want this, please guard it with
>
>       if (unlikely(gpuvm->flags & DRM_GPUVM_MSM_LEGACY_QUIRK))
>
> and get an explicit waiver from Dave / Sima.
>
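
For illustration only, the two cases in the reply can be modelled with a
small userspace sketch. Everything in it (struct toy_obj, toy_put(),
toy_lock_check_ok() and the toy_case_*() helpers) is hypothetical and
only mimics the shape of the conditional in the patch; it is not the
kernel code, it uses C11 atomics instead of kref, and being
single-threaded it does not try to reproduce the race under discussion.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a GEM object; not the kernel structure. */
struct toy_obj {
    atomic_int refcount;  /* models obj->refcount */
    bool resv_locked;     /* models "the caller holds the resv lock" */
};

/* Drop one reference; returns true when this was the last one. */
static bool toy_put(struct toy_obj *obj)
{
    int old = atomic_fetch_sub(&obj->refcount, 1);
    assert(old > 0);      /* dropping below zero is a caller bug */
    return old == 1;
}

/*
 * Mirrors the shape of the patched check:
 *     if (kref_read(&obj->refcount) > 0)
 *             drm_gem_gpuva_assert_lock_held(obj);
 * Returns false only when the assertion would have fired.
 */
static bool toy_lock_check_ok(struct toy_obj *obj)
{
    if (atomic_load(&obj->refcount) > 0)
        return obj->resv_locked;
    return true;          /* refcount is zero: lock treated as held */
}

/* Case 1: the caller holds a reference and the lock; another holder puts. */
static bool toy_case_ref_held(void)
{
    struct toy_obj obj;
    atomic_init(&obj.refcount, 2);
    obj.resv_locked = true;
    (void)toy_put(&obj);  /* count goes 2 -> 1, still greater than zero */
    return toy_lock_check_ok(&obj);
}

/* Case 2: destructor path; the last reference is gone, no lock is taken. */
static bool toy_case_destructor(void)
{
    struct toy_obj obj;
    atomic_init(&obj.refcount, 1);
    obj.resv_locked = false;
    if (!toy_put(&obj))   /* count goes 1 -> 0 */
        return false;
    return toy_lock_check_ok(&obj);
}

/* Misuse: a live reference but no lock; the check still flags it. */
static bool toy_case_missing_lock(void)
{
    struct toy_obj obj;
    atomic_init(&obj.refcount, 1);
    obj.resv_locked = false;
    return toy_lock_check_ok(&obj);
}
```

In this model toy_case_ref_held() and toy_case_destructor() pass the
check, while toy_case_missing_lock() fails it, so the conditional only
relaxes the assertion once the count has reached zero.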