Hi Christian,

With this patch series applied, amdgpu_vm_validate_pt_bos occasionally fails
to evict amdgpu BOs and cannot find a valid first busy BO. Another problem is
that taking the lock on the first busy BO can run into a deadlock.

/* check if other user occupy memory too long time */
                if (!first_bo || !request_resv || !request_resv->lock.ctx) {
                        if (first_bo)
                                ttm_bo_put(first_bo);
                        return -EBUSY;
                }
                if (first_bo->resv == request_resv) {
                        ttm_bo_put(first_bo);
                        return -EBUSY;
                }
                if (ctx->interruptible)
                        ret = ww_mutex_lock_interruptible(&first_bo->resv->lock,
                                                          request_resv->lock.ctx);
                else
                        ret = ww_mutex_lock(&first_bo->resv->lock,
                                            request_resv->lock.ctx);
                if (ret) {
                        ttm_bo_put(first_bo);
                        if (ret == -EDEADLK) {
                                ret = -EAGAIN;
                        }

                        return ret;
                }

Thanks
Prike

From: Christian König <ckoenig.leichtzumer...@gmail.com>
Sent: Wednesday, May 15, 2019 3:05 PM
To: Liang, Prike <prike.li...@amd.com>; Marek Olšák <mar...@gmail.com>
Cc: Zhou, David(ChunMing) <david1.z...@amd.com>; dri-devel 
<dri-de...@lists.freedesktop.org>; amd-gfx mailing list 
<amd-gfx@lists.freedesktop.org>
Subject: Re: [PATCH 11/11] drm/amdgpu: stop removing BOs from the LRU during CS

Hi Prike,

no, that can lead to massive problems in a real OOM situation and is not 
something we can do here.

Christian.

On 15.05.19 at 04:00, Liang, Prike wrote:
Hi Christian,

I just wonder whether, when we hit an ENOMEM error while pinning amdgpu BOs,
we can retry the validation as shown below.
With the following simple patch, the Abaqus pinning issue is no longer observed.

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 11cbf63..72a32f5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -902,11 +902,15 @@ int amdgpu_bo_pin_restricted(struct amdgpu_bo *bo, u32 domain,
                        bo->placements[i].lpfn = lpfn;
                bo->placements[i].flags |= TTM_PL_FLAG_NO_EVICT;
        }
-
+retry:
        r = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
        if (unlikely(r)) {
-               dev_err(adev->dev, "%p pin failed\n", bo);
-               goto error;
+                if (r == -ENOMEM){
+                        goto retry;
+                } else {
+                       dev_err(adev->dev, "%p pin failed\n", bo);
+                       goto error;
+                }
        }

        bo->pin_count = 1;
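
Note that the `goto retry` in the patch above has no exit condition: if
ttm_bo_validate keeps returning -ENOMEM, the loop never terminates. Purely to
illustrate that point, and not as a proposed fix for amdgpu, here is a
user-space sketch of a retry with an upper bound. The names fake_validate and
validate_bounded are hypothetical; fake_validate only stands in for
ttm_bo_validate and does not model real TTM behaviour.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical stand-in for ttm_bo_validate(): returns -ENOMEM while
 * *failures_left is positive (decrementing it each time), then 0.
 */
static int fake_validate(int *failures_left)
{
	if (*failures_left > 0) {
		(*failures_left)--;
		return -ENOMEM;
	}
	return 0;
}

/*
 * Call fake_validate() at most max_tries times in total.
 * Stops early on success or on any error other than -ENOMEM,
 * and returns the last result.
 */
static int validate_bounded(int *failures_left, int max_tries)
{
	int r = -ENOMEM;
	int i;

	assert(max_tries > 0);
	for (i = 0; i < max_tries; i++) {
		r = fake_validate(failures_left);
		if (r != -ENOMEM)
			break;	/* success or a different error */
	}
	return r;
}
```

With a transient shortage the call eventually succeeds; with a persistent one
it gives up and reports -ENOMEM to the caller instead of spinning forever.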


Thanks,
Prike

From: Marek Olšák <mar...@gmail.com>
Sent: Wednesday, May 15, 2019 3:33 AM
To: Christian König <ckoenig.leichtzumer...@gmail.com>
Cc: Zhou, David(ChunMing) <david1.z...@amd.com>; Liang, Prike 
<prike.li...@amd.com>; dri-devel <dri-de...@lists.freedesktop.org>; 
amd-gfx mailing list <amd-gfx@lists.freedesktop.org>
Subject: Re: [PATCH 11/11] drm/amdgpu: stop removing BOs from the LRU during CS

This series fixes the OOM errors. However, if I torture the kernel driver more, 
I can get it to deadlock and end up with unkillable processes. I can also get 
an OOM error. I just ran the test 5 times:

AMD_DEBUG=testgdsmm glxgears & AMD_DEBUG=testgdsmm glxgears & 
AMD_DEBUG=testgdsmm glxgears & AMD_DEBUG=testgdsmm glxgears & 
AMD_DEBUG=testgdsmm glxgears

Marek

On Tue, May 14, 2019 at 8:31 AM Christian König 
<ckoenig.leichtzumer...@gmail.com> wrote:
This avoids OOM situations when we have lots of threads
submitting at the same time.

Signed-off-by: Christian König <christian.koe...@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index fff558cf385b..f9240a94217b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -648,7 +648,7 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
        }

        r = ttm_eu_reserve_buffers(&p->ticket, &p->validated, true,
-                                  &duplicates, true);
+                                  &duplicates, false);
        if (unlikely(r != 0)) {
                if (r != -ERESTARTSYS)
                        DRM_ERROR("ttm_eu_reserve_buffers failed.\n");
--
2.17.1

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
