https://bugs.freedesktop.org/show_bug.cgi?id=107762

Michel Dänzer <mic...@daenzer.net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ckoenig.leichtzumerken@gmai
                   |                            |l.com, d...@lynxeye.de

--- Comment #2 from Michel Dänzer <mic...@daenzer.net> ---
(In reply to Martin Peres from comment #0)
> [  358.292609] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, 
> signaled seq=137, emitted seq=137
> [  358.292635] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1 timeout, 
> signaled seq=145, emitted seq=145

(In reply to Martin Peres from comment #1)
> [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled 
> seq=137, emitted seq=137
> [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled 
> seq=147, emitted seq=147

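Hmm, the signalled and emitted sequence numbers are always the same, i.e. every
fence the ring emitted had already signalled by the time the timeout handler
ran. Does that mean the hardware hasn't actually timed out?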
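
For anyone who wants to check a longer log for the same pattern, here is a
rough Python sketch. It assumes the message format is exactly what is quoted
above ("ring <name> timeout, signaled seq=N, emitted seq=M"); the function
names are made up for illustration and are not part of any driver tooling.
Treating "emitted > signaled" as "work was still outstanding" is only my
reading of the two numbers.

```python
import re

# Matches the amdgpu_job_timedout message format quoted in this comment.
TIMEOUT_RE = re.compile(
    r"ring (?P<ring>\S+) timeout,\s*"
    r"signaled\s+seq=(?P<signaled>\d+),\s*"
    r"emitted\s+seq=(?P<emitted>\d+)"
)


def parse_timeouts(log_text):
    """Return a list of (ring, signaled, emitted) tuples found in log_text."""
    # Mail clients wrap long lines and prefix quotes with ">", so strip the
    # quote markers and collapse all whitespace before matching.
    unquoted = re.sub(r"^>\s?", "", log_text, flags=re.MULTILINE)
    flat = " ".join(unquoted.split())
    return [
        (m["ring"], int(m["signaled"]), int(m["emitted"]))
        for m in TIMEOUT_RE.finditer(flat)
    ]


def outstanding(entries):
    """Keep only entries where emitted > signaled (assumed: unfinished work)."""
    return [e for e in entries if e[2] > e[1]]
```

If outstanding() comes back empty for every timeout in the log, that matches
what is quoted here: the ring had nothing pending when the handler fired.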

I can think of two possibilities:

* A GPU scheduler bug causing the job timeout handling to be triggered
spuriously. (Could something be stalling the system work queue, so the items
scheduled by drm_sched_job_finish_cb can't call drm_sched_job_finish in time?)

* A problem with the handling of the GPU's interrupts. Do the numbers on the
amdgpu line in /proc/interrupts still increase after these messages appear,
or at least in the ten seconds before they appear?
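
To make the interrupt check easier to repeat, something like the Python sketch
below could be used. It assumes the usual /proc/interrupts layout (a header row
of CPU columns, then per-IRQ rows with one count per CPU followed by the
handler names) and that the handler name contains "amdgpu". The function names
and the one-second interval are arbitrary choices of mine.

```python
import time


def amdgpu_irq_total(interrupts_text, name="amdgpu"):
    """Sum the per-CPU counts of every row whose handler names contain `name`.

    Returns None if no matching row is found.
    """
    lines = interrupts_text.splitlines()
    if not lines:
        return None
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    total = None
    for line in lines[1:]:
        _, _, rest = line.partition(":")
        fields = rest.split()
        # Everything after the per-CPU counts describes the IRQ and handlers.
        if name not in " ".join(fields[ncpu:]):
            continue
        total = (total or 0) + sum(int(c) for c in fields[:ncpu])
    return total


def watch_amdgpu_irqs(path="/proc/interrupts", interval=1.0, samples=10):
    """Print the running total and the change since the previous sample."""
    prev = None
    for _ in range(samples):
        with open(path) as f:
            total = amdgpu_irq_total(f.read())
        delta = None if prev is None or total is None else total - prev
        print(f"total={total} delta={delta}")
        prev = total
        time.sleep(interval)
```

Running watch_amdgpu_irqs() around the time the timeout messages show up and
seeing the delta stay at 0 would point towards the interrupt theory; a delta
that keeps growing would point back at the scheduler.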

-- 
You are receiving this mail because:
You are the assignee for the bug.
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
