On 13.10.2017 at 16:34, Michel Dänzer wrote:
On 12/10/17 07:11 PM, Christian König wrote:
On 12.10.2017 at 18:49, Michel Dänzer wrote:
On 12/10/17 01:00 PM, Michel Dänzer wrote:
[0] I also got this, but I don't know yet if it's related:
No, that seems to be a separate issue; I can still reproduce it with the
huge page related changes reverted. Unfortunately, it doesn't seem to
happen reliably on every piglit run.
Can you enable KASAN in your kernel,
KASAN caught something else at the beginning of the piglit run; see the attached
dmesg excerpt. Not sure it's related, though.

amdgpu_job_free_cb+0x13d/0x160 decodes to:

amd_sched_get_job_priority at 
.../drivers/gpu/drm/amd/amdgpu/../scheduler/gpu_scheduler.h:182

static inline enum amd_sched_priority
amd_sched_get_job_priority(struct amd_sched_job *job)
{
        return (job->s_entity->rq - job->sched->sched_rq); <===
}

  (inlined by) amdgpu_job_free_cb at 
.../drivers/gpu/drm/amd/amdgpu/amdgpu_job.c:107

        amdgpu_ring_priority_put(job->ring, amd_sched_get_job_priority(s_job));

Sounds a lot like the code Andres added is buggy somehow. Going to take a look as well.
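The priority lookup quoted above works by pointer subtraction: `rq` points at one element of the scheduler's `sched_rq[]` array, so subtracting the array base yields that element's index, which is the priority level. The standalone C sketch below reproduces the idea. The struct and enum names are hypothetical simplifications, not the real amdgpu definitions. It also shows that the lookup reads `job->s_entity->rq`, so `s_entity` must still be valid when the free callback runs. That is one plausible reading of the KASAN report, not a confirmed diagnosis.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the scheduler structures quoted
 * above; field names mirror the snippet, the layout is invented. */
enum sched_priority {
    SCHED_PRIORITY_MIN,
    SCHED_PRIORITY_NORMAL,
    SCHED_PRIORITY_HIGH,
    SCHED_PRIORITY_MAX          /* number of run queues, not a real level */
};

struct sched_rq { int unused; };

struct gpu_scheduler {
    struct sched_rq sched_rq[SCHED_PRIORITY_MAX];  /* one run queue per priority */
};

struct sched_entity {
    struct sched_rq *rq;        /* points into some scheduler's sched_rq[] */
};

struct sched_job {
    struct gpu_scheduler *sched;
    struct sched_entity *s_entity;  /* must still be alive when queried */
};

static enum sched_priority sched_get_job_priority(const struct sched_job *job)
{
    /* Pointer difference = index of rq inside sched_rq[] = priority.
     * If s_entity no longer points at a live entity, reading
     * s_entity->rq is the kind of invalid access KASAN reports. */
    ptrdiff_t idx = job->s_entity->rq - job->sched->sched_rq;

    /* Only meaningful if rq really points into this scheduler's array. */
    assert(idx >= 0 && idx < SCHED_PRIORITY_MAX);
    return (enum sched_priority)idx;
}
```

With an entity whose `rq` is `&sched.sched_rq[SCHED_PRIORITY_HIGH]`, the function returns `SCHED_PRIORITY_HIGH`; the result is only as trustworthy as the `s_entity` pointer it starts from.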

and please look up at which line number amdgpu_vm_bo_invalidate+0x88 is.
Looks like it's this line:

                if (evicted && bo->tbo.resv == vm->root.base.bo->tbo.resv) {

Maybe vm or vm->root.base.bo is NULL?
Ah, of course!

We need to reserve the page directory root when we release it; otherwise we can race with someone else trying to evict it.

Going to send a patch in a minute,
Christian.
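The reserve-before-release idea can be sketched as follows, assuming a plain pthread mutex stands in for the buffer object's reservation object and that per-VM buffer objects share the root's lock. All names are hypothetical. This is not the patch mentioned above. It only illustrates why taking the same lock on the release path keeps an evictor, which dereferences `vm->root` as in the quoted check, from seeing the root vanish underneath it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model: per-VM buffer objects share the reservation lock of
 * the VM's root page directory, so holding that lock excludes both
 * eviction of those BOs and release of the root. */
struct bo {
    pthread_mutex_t *resv;          /* reservation lock this BO uses */
};

struct vm {
    pthread_mutex_t root_resv;      /* reservation lock of the root PD */
    struct bo *root;                /* root page directory; NULL once released */
};

/* Eviction-side check, called with bo->resv held, mirroring
 * "bo->tbo.resv == vm->root.base.bo->tbo.resv" in spirit. */
static bool vm_bo_invalidate_locked(const struct vm *vm, const struct bo *bo)
{
    assert(vm->root != NULL);       /* must not evict after release */
    return bo->resv == vm->root->resv;
}

static bool vm_evict_bo(struct vm *vm, struct bo *bo)
{
    bool shares_root_resv;

    pthread_mutex_lock(bo->resv);
    shares_root_resv = vm_bo_invalidate_locked(vm, bo);
    pthread_mutex_unlock(bo->resv);
    return shares_root_resv;
}

/* Release path: reserve the root before tearing it down. Without this
 * lock, vm->root could be cleared while an evictor holding the shared
 * lock is about to dereference it. Only BOs sharing the root's lock are
 * protected this way; the sketch does not model other BOs. */
static void vm_release_root(struct vm *vm)
{
    struct bo *root;

    pthread_mutex_lock(&vm->root_resv);
    root = vm->root;
    vm->root = NULL;
    pthread_mutex_unlock(&vm->root_resv);
    free(root);
}
```

Because the evictor and the release path take the same mutex for per-VM BOs, the evictor either sees the complete root or runs after release has finished. In this model, evicting after release is a caller bug, which the assertion catches.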
_______________________________________________
amd-gfx mailing list
https://lists.freedesktop.org/mailman/listinfo/amd-gfx