On 2018年09月11日 11:23, Deng, Emily wrote:
-----Original Message-----
From: Zhou, David(ChunMing)
Sent: Tuesday, September 11, 2018 11:03 AM
To: Deng, Emily <emily.d...@amd.com>; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/amdgpu: Fix the dead lock issue.



On 2018年09月11日 10:51, Emily Deng wrote:
It will ramdomly have the dead lock issue when test TDR:
1. amdgpu_device_handle_vram_lost gets the lock shadow_list_lock 2.
amdgpu_bo_create locked the bo's resv lock 3. amdgpu_bo_create_shadow
is waiting for the shadow_list_lock 4.
amdgpu_device_recover_vram_from_shadow is waiting for the bo's resv
lock.

v2:
     Make a local copy of the list

Signed-off-by: Emily Deng <emily.d...@amd.com>
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 21
++++++++++++++++++++-
   1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 2a21267..8c81404 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3105,6 +3105,9 @@ static int
amdgpu_device_handle_vram_lost(struct amdgpu_device *adev)
        long r = 1;
        int i = 0;
        long tmo;
+       struct list_head local_shadow_list;
+
+       INIT_LIST_HEAD(&local_shadow_list);

        if (amdgpu_sriov_runtime(adev))
                tmo = msecs_to_jiffies(8000);
@@ -3112,8 +3115,19 @@ static int
amdgpu_device_handle_vram_lost(struct amdgpu_device *adev)
                tmo = msecs_to_jiffies(100);

        DRM_INFO("recover vram bo from shadow start\n");
+
+       mutex_lock(&adev->shadow_list_lock);
+       list_splice_init(&adev->shadow_list, &local_shadow_list);
+       mutex_unlock(&adev->shadow_list_lock);
+
+
        mutex_lock(&adev->shadow_list_lock);
local_shadow_list is a local variable, I think it doesn't need lock at all, no 
one
change it. Otherwise looks good to me.
The bo->shadow_list which now is in local_shadow_list maybe destroy in case 
that it already in amdgpu_bo_destroy, then it will
change local_shadow_list, so need lock the shadow_list_lock.
Ah, sorry for noise, I forget you don't reference these BOs.

Thanks,
David Zhou
Best wishes
Emily Deng
Thanks,
David Zhou
-       list_for_each_entry_safe(bo, tmp, &adev->shadow_list, shadow_list) {
+       list_for_each_entry_safe(bo, tmp, &local_shadow_list, shadow_list) {
+               mutex_unlock(&adev->shadow_list_lock);
+
+               if (!bo)
+                       continue;
+
                next = NULL;
                amdgpu_device_recover_vram_from_shadow(adev, ring, bo,
&next);
                if (fence) {
@@ -3132,9 +3146,14 @@ static int
amdgpu_device_handle_vram_lost(struct amdgpu_device *adev)

                dma_fence_put(fence);
                fence = next;
+               mutex_lock(&adev->shadow_list_lock);
        }
        mutex_unlock(&adev->shadow_list_lock);

+       mutex_lock(&adev->shadow_list_lock);
+       list_splice_init(&local_shadow_list, &adev->shadow_list);
+       mutex_unlock(&adev->shadow_list_lock);
+
        if (fence) {
                r = dma_fence_wait_timeout(fence, false, tmo);
                if (r == 0)

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Reply via email to