On 02/13/2017 05:25 PM, Nicolai Hähnle wrote:
On 09.02.2017 11:33, Samuel Pitoiset wrote:
When ttm_bo_init() fails, the reservation mutex should be unlocked.

In debug build, the kernel reported "possible recursive locking
detected" in this codepath. For debugging purposes, I also added
a "WARN_ON(ww_mutex_is_locked())" when ttm_bo_init() fails and the
mutex was locked as expected.

This should fix (random) GPU hangs. The easy way to reproduce the
issue is to change the "Super Sampling" option from 1.0 to 2.0 in
Hitman. It will create a huge buffer, evict a bunch of buffers
(around ~5k) and deadlock.

This regression has been introduced pretty recently.

v2: only release the mutex if resv is NULL

Fixes: 12a852219583 ("drm/amdgpu: improve
AMDGPU_GEM_CREATE_VRAM_CLEARED handling (v2)")
Signed-off-by: Samuel Pitoiset <samuel.pitoi...@gmail.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index d1ef1d064de4..556236a112c1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -403,8 +403,11 @@ int amdgpu_bo_create_restricted(struct
amdgpu_device *adev,
             &bo->placement, page_align, !kernel, NULL,
             acc_size, sg, resv ? resv : &bo->tbo.ttm_resv,
             &amdgpu_ttm_bo_destroy);
-    if (unlikely(r != 0))
+    if (unlikely(r != 0)) {
+        if (!resv)
+            ww_mutex_unlock(&bo->tbo.resv->lock);
         return r;
+    }

I was looking at this myself a couple of weeks back, and I'm pretty sure
I had this exact same patch just to realize that it's actually incorrect.

The problem is that ttm_bo_init will actually call the destroy function
(in our case, amdgpu_ttm_bo_destroy), so at this point, bo has been freed.

This code is a huge mess. I'm surprised though: have you verified that
this patch actually fixes a hang?

Yes, I triple-checked. I can't reproduce the hangs with Hitman.

This fixes a deadlock, here's the report:
https://hastebin.com/durodivoma.xml

The resv->lock has to be unlocked when ttm_bo_init() fails (I checked with a WARN_ON(is_locked)) because it doesn't call the destroy function in all situations. Presumably, when drm_vma_offset_add() fails and resv is not NULL, the mutex is not unlocked.


Cheers,
Nicolai



     bo->tbo.priority = ilog2(bo->tbo.num_pages);
     if (kernel)


_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Reply via email to