That change is indeed fixing a problem. When drm_sched_job_recovery() is called s_job->entity should already be NULL.

And even when the process is killed we should still either re-submit the jobs or set the error.

BTW: There is another bug in that function: guilty_context needs to be kept alive over the course of a loop.

Going to submit a patch for that.

Regards,
Christian.

Am 26.04.2018 um 06:23 schrieb zhoucm1:
NAK on it.

First of all, without this patch, Does it cause any issue?

second,

entity->last_scheduled present the last submiting job.

Your this change will break this meaning and don't work, e.g.

1. mirror list has jobA and jobB, assuming they are belonged to same entity, then the entity->last_scheduled is jobB->finished.

2. when you do recovery, re-submit jobA first, if don't update last_scheduled, then entity->last_scheduled still is jobB->finished.

3. killed this process, will call to drm_sched_entity_cleanup, will register drm_sched_entity_kill_jobs_cb to jobB->finished, but jobB isn't submitted at all.


So the change isn't necessary at all.


Regards,

David Zhou


On 2018年04月26日 11:47, Ding, Pixel wrote:
Hi Monk,

Please review it. Thanks.

—
Sincerely Yours,
Pixel


On 2018/4/25, 4:39 PM, "Pixel Ding" <pixel.d...@amd.com> wrote:

     The current sequence in scheduler thread is:
     1. update last sched fence
     2. job begin (adding to mirror list)
     3. job finish (remove from mirror list)
     4. back to 1
          Since we update last sched prior to joining mirror list, the jobs
     in mirror list already pass the last sched fence. TDR just run
     the jobs in mirror list, so we should not update the last sched
     fences in TDR.
          Signed-off-by: Pixel Ding <pixel.d...@amd.com>
     ---
      drivers/gpu/drm/scheduler/gpu_scheduler.c | 3 ---
      1 file changed, 3 deletions(-)
          diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c b/drivers/gpu/drm/scheduler/gpu_scheduler.c
     index 088ff2b..1f1dd70 100644
     --- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
     +++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
     @@ -575,9 +575,6 @@ void drm_sched_job_recovery(struct drm_gpu_scheduler *sched)
              fence = sched->ops->run_job(s_job);
              atomic_inc(&sched->hw_rq_count);
           - dma_fence_put(s_job->entity->last_scheduled);
     -        s_job->entity->last_scheduled = dma_fence_get(&s_fence->finished);
     -
              if (fence) {
                  s_fence->parent = dma_fence_get(fence);
                  r = dma_fence_add_callback(fence, &s_fence->cb,
     --
     2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Reply via email to