Hi Christian,

With this patch series, amdgpu_vm_validate_pt_bos occasionally fails to evict amdgpu BOs and can't find a valid first busy BO. Another problem is that we run into a deadlock while acquiring the lock of the first BO:

/* check if another user has occupied the memory for too long */
if (!first_bo || !request_resv || !request_resv->lock.ctx) {
	if (first_bo)
		ttm_bo_put(first_bo);
	return -EBUSY;
}
if (first_bo->resv == request_resv) {
	ttm_bo_put(first_bo);
	return -EBUSY;
}
if (ctx->interruptible)
	ret = ww_mutex_lock_interruptible(&first_bo->resv->lock,
					  request_resv->lock.ctx);
else
	ret = ww_mutex_lock(&first_bo->resv->lock, request_resv->lock.ctx);
if (ret) {
	ttm_bo_put(first_bo);
	if (ret == -EDEADLK) {
		ret = -EAGAIN;
	}
	return ret;
}

Thanks
Prike

From: Christian König <ckoenig.leichtzumer...@gmail.com>
Sent: Wednesday, May 15, 2019 3:05 PM
To: Liang, Prike <prike.li...@amd.com>; Marek Olšák <mar...@gmail.com>
Cc: Zhou, David(ChunMing) <david1.z...@amd.com>; dri-devel <dri-devel@lists.freedesktop.org>; amd-gfx mailing list <amd-...@lists.freedesktop.org>
Subject: Re: [PATCH 11/11] drm/amdgpu: stop removing BOs from the LRU during CS

Hi Prike,

no, that can lead to massive problems in a real OOM situation and is not something we can do here.

Christian.

On 15.05.19 at 04:00, Liang, Prike wrote:
Hi Christian,

I just wonder: when we encounter an ENOMEM error while pinning amdgpu BOs, can we retry the validation as below? With the following simple patch, the Abaqus pin issue is no longer observed.

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 11cbf63..72a32f5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -902,11 +902,15 @@ int amdgpu_bo_pin_restricted(struct amdgpu_bo *bo, u32 domain,
 		bo->placements[i].lpfn = lpfn;
 		bo->placements[i].flags |= TTM_PL_FLAG_NO_EVICT;
 	}
-
+retry:
 	r = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
 	if (unlikely(r)) {
-		dev_err(adev->dev, "%p pin failed\n", bo);
-		goto error;
+		if (r == -ENOMEM){
+			goto retry;
+		} else {
+			dev_err(adev->dev, "%p pin failed\n", bo);
+			goto error;
+		}
 	}
 
 	bo->pin_count = 1;

Thanks,
Prike

From: Marek Olšák <mar...@gmail.com>
Sent: Wednesday, May 15, 2019 3:33 AM
To: Christian König <ckoenig.leichtzumer...@gmail.com>
Cc: Zhou, David(ChunMing) <david1.z...@amd.com>; Liang, Prike <prike.li...@amd.com>; dri-devel <dri-devel@lists.freedesktop.org>; amd-gfx mailing list <amd-...@lists.freedesktop.org>
Subject: Re: [PATCH 11/11] drm/amdgpu: stop removing BOs from the LRU during CS

This series fixes the OOM errors. However, if I torture the kernel driver more, I can get it to deadlock and end up with unkillable processes. I can also get an OOM error. I just ran the test 5 times:

AMD_DEBUG=testgdsmm glxgears & AMD_DEBUG=testgdsmm glxgears & AMD_DEBUG=testgdsmm glxgears & AMD_DEBUG=testgdsmm glxgears & AMD_DEBUG=testgdsmm glxgears

Marek

On Tue, May 14, 2019 at 8:31 AM Christian König <ckoenig.leichtzumer...@gmail.com> wrote:
This avoids OOM situations when we have lots of threads submitting at the same time.

Signed-off-by: Christian König <christian.koe...@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index fff558cf385b..f9240a94217b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -648,7 +648,7 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
 	}
 
 	r = ttm_eu_reserve_buffers(&p->ticket, &p->validated, true,
-				   &duplicates, true);
+				   &duplicates, false);
 	if (unlikely(r != 0)) {
 		if (r != -ERESTARTSYS)
 			DRM_ERROR("ttm_eu_reserve_buffers failed.\n");
-- 
2.17.1

_______________________________________________
amd-gfx mailing list
amd-...@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
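
The -EDEADLK handling in Prike's snippet relies on the wait/wound mutex contract: when ww_mutex_lock() is called with an acquire context and returns -EDEADLK, that context must release every lock it holds before trying again, otherwise two tasks can end up waiting on each other forever. As a rough illustration only, here is a minimal userspace model of the wait-die rule. struct model_lock, model_lock_try() and model_release_all() are hypothetical names made up for this sketch; they are not the kernel ww_mutex API, the model is single-threaded, and a real ww_mutex would sleep where the model returns -EBUSY.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical userspace model of a wait-die lock (NOT the kernel
 * ww_mutex API). Each acquire context is identified by a stamp; a lower
 * stamp means an older context. owner_stamp == 0 marks a free lock.
 */
struct model_lock {
	unsigned long owner_stamp;
};

/*
 * Try to take 'lock' for the context identified by 'stamp'.
 * Returns:
 *   0          lock taken
 *   -EALREADY  this context already holds the lock
 *   -EDEADLK   caller is younger than the holder and must back off
 *   -EBUSY     caller is older than the holder (a real ww_mutex would
 *              sleep here until the lock is released)
 */
static int model_lock_try(struct model_lock *lock, unsigned long stamp)
{
	if (lock->owner_stamp == 0) {
		lock->owner_stamp = stamp;
		return 0;
	}
	if (lock->owner_stamp == stamp)
		return -EALREADY;
	if (stamp > lock->owner_stamp)
		return -EDEADLK;	/* younger context "dies" */
	return -EBUSY;			/* older context would wait */
}

/* Back-off step: drop every lock held by the context 'stamp'. */
static void model_release_all(struct model_lock *locks, size_t n,
			      unsigned long stamp)
{
	for (size_t i = 0; i < n; i++)
		if (locks[i].owner_stamp == stamp)
			locks[i].owner_stamp = 0;
}
```

With two contexts that each hold one lock and want the other's, only the younger one (stamp 2) is told to back off; once it drops its lock, the older one (stamp 1) can take it, so the cycle cannot persist. In the kernel, the backed-off context would then typically take the contended lock with ww_mutex_lock_slow() before reacquiring the rest. Translating -EDEADLK into -EAGAIN, as the snippet does, keeps this guarantee only if some caller further up still drops all locks held in that acquire context.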
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel