On Wed, 5 Feb 2020 22:31:52 -0700
Aaron Bieber <[email protected]> wrote:
> On Wed, 05 Feb 2020 at 20:29:31 +0100, William Orr wrote:
> >
> > Hey,
> >
> > On a recent snap (04/02/2020), the unpriv'ed process of Xorg seems to hang,
> > becoming totally unresponsive. Running `ktrace` on the process fails to log
> > any output. `top` shows that the process is waiting on `fsleep`. I'm using
> > the amdgpu driver.
>
> Similar issue here. It seems to happen randomly (possibly more often under
> high memory usage). It's always after X has started and I have been using it
> for some time (days sometimes).
>
> MPD will continue to play music in the background, and pressing the power
> button for a few seconds seems to result in a shutdown; however, it doesn't
> quite shut down properly. The screen goes blank and the fans start to spin at
> full speed, at which point holding the power button seems to be the only fix.

I studied my problem with startxfce4 (where Xorg gets stuck and I use
Ctrl+Alt+BackSpace to reset Xorg), but that is a different bug,
not an amdgpu glitch.

Today, I froze Xorg in a different way. I was stressing supertuxkart
on my amdgpu machine by playing at full screen (1920x1080), graphics
setting 6, and 19 AI karts. This sometimes causes a visual glitch
where objects in the game either disappear or cast large black shadows.
There is a LOADING screen before each race. The LOADING screen seems
to decide the number of glitches in each race: none, few, or many.
If I reload the track, I may have more or fewer glitches.

Today, my last race got stuck at the LOADING screen. Xorg stopped
responding to the keyboard: Ctrl+Alt+F4 (to switch virtual console)
didn't work. The system was still alive: ping(8) and ssh(1) continued
to work (from a second computer to the amdgpu machine). In the ssh(1)
session, top(1) showed one thread of supertuxkart being consistently
"onproc" even though the machine was mostly idle. I became root and
attached egdb (from package gdb-7.12.1p9) to supertuxkart.
The thread seemed to be stuck in DRM_IOCTL_AMDGPU_WAIT_CS, issued from
/usr/xenocara/lib/libdrm/amdgpu/amdgpu_cs.c; in the kernel, this appears
to reach amdgpu_cs_wait_ioctl() in /sys/dev/pci/drm/amd/amdgpu_cs.c.

I detached egdb, then told top(1) to kill supertuxkart. The system
stopped answering ping(8), and top(1) froze. In top(1), supertuxkart
had WAIT "drmweti" and Xorg had WAIT "dmafenc". I forced a reboot.

The rest of this mail is a backtrace of one thread of
supertuxkart-0.9.3p0 (copied from a photo, so beware of typos). --George

(gdb) bt
#0 ioctl () at -:3
#1 0x000006d86059e3c0 in drmIoctl () from /usr/X11R6/lib/libdrm.so.7.8
#2 0x000006d941e83739 in amdgpu_cs_query_fence_status () from
    /usr/X11R6/lib/libdrm_amdgpu.so.1.9
#3 0x000006d8f800e951 in amdgpu_fence_wait () from
    /usr/X11R6/lib/modules/dri/radeon_dri.so
#4 0x000006d8f7f448a6 in si_fence_finish () from
    /usr/X11R6/lib/modules/dri/radeon_dri.so
#5 0x000006d8f79f04d3 in st_client_wait_sync () from
    /usr/X11R6/lib/modules/dri/radeonsi_dri.so
#6 0x000006d8f793136e in _mesa_ClientWaitSync () from
    /usr/X11R6/lib/modules/dri/radeonsi_dri.so
#7 0x000006d65d7c48d7 in DrawCalls::prepareDrawCalls(ShadowMatrices&,
    irr::scene::ICameraSceneNode
#8 0x000006d65d887aee in ShaderBasedRenderer::renderScene(
    irr::scene::ICameraSceneNode*, float, bool, bool) ()
#9 0x000006d65d88a5c3 in ShaderBasedRenderer::render(float) ()
#10 0x000006d65d8068ed in IrrDriver::update(float) ()
#11 0x000006d65d9eaa0d in MainLoop::run() ()
#12 0x000006d65d9e74d0 in main ()
(gdb) info registers
rax 0x36 54
rbx 0x16e2a71b28d7 25162721994967
rcx 0x6d88cf49a3a 7527147543098
rdx 0x7f7ffffdc508 140187732395272
rsi 0xc0206449 3223348297 # DRM_IOCTL_AMDGPU_WAIT_CS
rdi 0x8 8
rbp 0x7f7ffffdc4e0 0x7f7ffffdc4e0
rsp 0x7f7ffffdc4b8 0x7f7ffffdc4b8
r8 0x6d88cf85cf8 7527147789560
r9 0x0 0
r10 0x0 0
r11 0x246 582
r12 0x8 8
r13 0x16e2a71b28d7 25162721994967
r14 0x7f7ffffdc508 140187732395272
r15 0xc0206449 3223348297
rip 0x6d88cf49a3a 0x6d88cf49a3a <ioctl+10>
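
For anyone who wants to check the request number in rsi and r15 without
the headers at hand, here is a rough Python sketch that splits it into
its fields. It is only an illustration: decode_ioctl() is a made-up
helper, the bit layout is what I assume from <sys/ioccom.h>, and the
two DRM constants are what I believe drm.h and amdgpu_drm.h define.

```python
# Sketch: split a BSD-style ioctl request number into its fields.
# Assumed bit layout (as in <sys/ioccom.h>): direction flags in the top
# three bits, parameter length in 13 bits starting at bit 16, group
# character in bits 8-15, command number in bits 0-7.
IOC_VOID = 0x20000000
IOC_OUT = 0x40000000
IOC_IN = 0x80000000
IOCPARM_MASK = 0x1FFF

# Assumed values from drm.h and amdgpu_drm.h.
DRM_COMMAND_BASE = 0x40
DRM_AMDGPU_WAIT_CS = 0x09


def decode_ioctl(request):
    """Return the direction, length, group and number of an ioctl request."""
    direction = []
    if request & IOC_VOID:
        direction.append("void")
    if request & IOC_IN:
        direction.append("in")
    if request & IOC_OUT:
        direction.append("out")
    return {
        "direction": "+".join(direction),
        "length": (request >> 16) & IOCPARM_MASK,
        "group": chr((request >> 8) & 0xFF),
        "number": request & 0xFF,
    }


fields = decode_ioctl(0xC0206449)  # value of rsi and r15 in the dump
print(fields)
# Check the command number against DRM_COMMAND_BASE + DRM_AMDGPU_WAIT_CS.
print(fields["number"] == DRM_COMMAND_BASE + DRM_AMDGPU_WAIT_CS)
```

If those assumptions hold, 0xc0206449 is a read/write request in group
'd' with command number 0x49 and a 32-byte argument, which is consistent
with DRM_IOCTL_AMDGPU_WAIT_CS.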