On 13.10.2017 10:46, Liu, Monk wrote:
Just revert Nicolai’s patch,if other routine want to reference s_fence, it should get the finished fence in the first place/time,

For gpu_reset routine, it refers to s_fence only on those unfinished job in sched_hw_job_reset, so totally safe to refer to s_fence pointer

I wonder what issue Nicolai met with and submitted this patch

The original motivation of my patch was to catch accidental use of job->s_fence after the fence was destroyed in amd_sched_process_job. Basically, prevent a dangling pointer. I still think that's a reasonable idea, though clearly my first attempt at it was just wrong.

In Christian's v2 patch, it might make sense to add

        spin_unlock(&sched->job_list_lock);
+       dma_fence_put(&s_job->s_fence->finished);
+       s_job->s_fence = NULL;
        sched->ops->free_job(s_job);

... though I'm not 100% certain about how the fence lifetimes work.

Cheers,
Nicolai



BR Monk

*From:*amd-gfx [mailto:[email protected]] *On Behalf Of *Liu, Monk
*Sent:* 2017年10月13日16:40
*To:* Koenig, Christian <[email protected]>; Nicolai Hähnle <[email protected]>; [email protected]
*Subject:* RE: regression with d6c650c0a8f6f671e49553725e1db541376d95f2

I doubt it would always work fine…

First, we have FENCE_TRACE reference s_fence->finished after “fence_signal(&fence->finished)”

Second, we have trace_amd_sched_proess_job(s_fence) after “amd_sched_fence_finished()”,

If you put the finished before free_job() and by coincidence the job_finish() get very soon executed you’ll have odds to hit wild pointer on above two cases

BR Monk

*From:*Koenig, Christian
*Sent:* 2017年10月13日16:17
*To:* Liu, Monk <[email protected] <mailto:[email protected]>>; Nicolai Hähnle <[email protected] <mailto:[email protected]>>; [email protected] <mailto:[email protected]>
*Subject:* Re: regression with d6c650c0a8f6f671e49553725e1db541376d95f2

Yeah, that change is actually incorrect and should be reverted.

What we really need to do is remove dropping sched_job->s_fence from amd_sched_process_job() into amd_sched_job_finish() directly before the call to free_job().

Regards,
Christian.

Am 13.10.2017 um 09:24 schrieb Liu, Monk:

    commit d6c650c0a8f6f671e49553725e1db541376d95f2
    Author: Nicolai Hähnle <[email protected]>
    <mailto:[email protected]>
    @@ -611,6 +611,10 @@ static int amd_sched_main(void *param)

                     fence = sched->ops->run_job(sched_job);
                     amd_sched_fence_scheduled(s_fence);
    +
    +               /* amd_sched_process_job drops the job's reference
    of the fence. */
    +               sched_job->s_fence = NULL;
    +
                     if (fence) {
                             s_fence->parent = dma_fence_get(fence);
                             r = dma_fence_add_callback(fence, &s_fence->cb,

    Hi Nicolai

    with this patch, you will break "amdgpu_sched_hw_job_reset()"routine:

    void

    amd_sched_hw_job_reset(structamd_gpu_scheduler

    *sched)

    {

    structamd_sched_job

    *s_job;

    spin_lock(&sched->job_list_lock);

    list_for_each_entry_reverse(s_job,

    &sched->ring_mirror_list, node) {

    if(s_job->s_fence->parent

    &&

    fence_remove_callback(s_job->s_fence->parent,

                          &s_job->s_fence->cb))

    {

    fence_put(s_job->s_fence->parent);

                 s_job->s_fence->parent

    =

    NULL;

    atomic_dec(&sched->hw_rq_count);

             }

         }

    spin_unlock(&sched->job_list_lock);

    }

    see that without sched_job->s_fence, you cannot remove the callback
    from its hw fence,

    any idea??

    BR Monk



--
Lerne, wie die Welt wirklich ist,
Aber vergiss niemals, wie sie sein sollte.
_______________________________________________
amd-gfx mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Reply via email to