On 08.11.2017 at 07:39, Monk Liu wrote:
If an app closes its CTX right after IB submission, GPU recovery
will fail to find the entity/ctx behind the guilty job, and the
bad-job skipping in the scheduler therefore fails.
To fix this corner case, just move the job->karma increase out of
the condition that the backing entity was found; that way the job
itself will be considered "guilty" either way.
Change-Id: Ia30f02df9297a343d6d8dace496e237827dd1548
Signed-off-by: Monk Liu <[email protected]>
Reviewed-by: Christian König <[email protected]>
---
drivers/gpu/drm/amd/scheduler/gpu_scheduler.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
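
A note for reviewers: below is a minimal user-space sketch of why
moving the increment out of the entity lookup matters when the CTX
is already gone. It is not the kernel code; it uses C11 stdatomic
with simplified stand-in types, and the helper names old_flow /
new_flow are hypothetical, chosen only for this illustration.

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Simplified stand-in for amd_sched_job; illustration only. */
struct job {
	atomic_int karma;
	int hang_limit;
};

/* Old flow: karma was only bumped inside the entity lookup, so a job
 * whose CTX (and thus entity) is already gone never gains karma. */
static bool old_flow(struct job *bad, bool entity_found)
{
	return entity_found &&
	       atomic_fetch_add(&bad->karma, 1) + 1 > bad->hang_limit;
}

/* New flow: karma is bumped unconditionally before the lookup, so the
 * job carries the guilt even when no backing entity is found. */
static bool new_flow(struct job *bad, bool entity_found)
{
	atomic_fetch_add(&bad->karma, 1);
	return entity_found &&
	       atomic_load(&bad->karma) > bad->hang_limit;
}

int main(void)
{
	struct job bad = { .hang_limit = 0 };

	/* App closed its CTX right after IB submit: no entity found. */
	atomic_init(&bad.karma, 0);
	old_flow(&bad, false);
	printf("old: karma = %d (job never looks guilty)\n",
	       atomic_load(&bad.karma));

	atomic_init(&bad.karma, 0);
	new_flow(&bad, false);
	printf("new: karma = %d (job is guilty by itself)\n",
	       atomic_load(&bad.karma));
	return 0;
}

The diff below makes exactly this move in amd_sched_hw_job_reset():
the atomic_inc() happens as soon as @bad exists, and the per-entity
check only reads the value.
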
diff --git a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
index 7aa6455..720fd1b 100644
--- a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
+++ b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
@@ -464,6 +464,7 @@ void amd_sched_hw_job_reset(struct amd_gpu_scheduler *sched, struct amd_sched_jo
 	spin_unlock(&sched->job_list_lock);
 
 	if (bad) {
+		atomic_inc(&bad->karma);
 		/* don't increase @bad's karma if it's from KERNEL RQ,
 		 * because sometimes GPU hang would cause kernel jobs (like VM updating jobs)
 		 * corrupt but keep in mind that kernel jobs always considered good.
@@ -474,7 +475,7 @@ void amd_sched_hw_job_reset(struct amd_gpu_scheduler *sched, struct amd_sched_jo
 			spin_lock(&rq->lock);
 			list_for_each_entry_safe(entity, tmp, &rq->entities, list) {
 				if (bad->s_fence->scheduled.context == entity->fence_context) {
-					if (atomic_inc_return(&bad->karma) > bad->sched->hang_limit)
+					if (atomic_read(&bad->karma) > bad->sched->hang_limit)
 						if (entity->guilty)
 							atomic_set(entity->guilty, 1);
 					break;