On Sun, Jan 23, 2022 at 09:20:20AM +0000, Laurence Tratt wrote:
> I've had a Ryzen machine with a (basic!) Polaris GPU for about a year. Over
> that time nearly all of the GPU related bugs have disappeared (thanks
> Jonathan et al.!), except for the fact that I don't seem to be able to
> reliably suspend/resume from X.
>
> Resuming used to crash the machine 9 times out of 10, generally causing a
> single block of colour to be displayed, before automatically rebooting
> itself.
>
> As of the last couple of weeks or so (at least), things are a little better.
> I now seem to be able to semi-reliably suspend/resume from the console. When
> I resume an odd white-ish bit-pattern is displayed on screen, but switching
> to/from another console restores things. I seem to be able to do this
> multiple times without issue.
>
> However, if I suspend/resume from X, the X server soon gets visually
> corrupted -- the white-ish bit pattern is mingled with whatever was last
> visible in my X session. The machine still responds to ssh (etc.) but the
> screen is no longer usable, even if I try switching to/from virtual
> consoles. [Perhaps unsurprisingly, even if I don't login to X, merely
> suspending/resuming from the console seems to corrupt xenodm after a while.]

Does hibernate behaviour differ?

>
> I now seem to have triggered some sort of output from dmesg after a recent
> suspend/resume sequence:
>
> drm:pid33800:gmc_v8_0_process_interrupt *ERROR* GPU fault detected: 146
> 0x0448480c for process pid 0 thread Xorg pid 21509
> drm:pid33800:gmc_v8_0_process_interrupt *ERROR*
> VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00102E89
> drm:pid33800:gmc_v8_0_process_interrupt *ERROR*
> VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0E04800C
> drm:pid33800:gmc_v8_0_vm_decode_fault *ERROR* VM fault (0x0c, vmid 7, pasid
> 32769) at page 1060489, read from 'TC0' (0x54433000) (72)
>
> Does that mean anything? I have no idea, but I thought it might be worth
> noting it here in case it helps any amdgpu hackers! This is on amd64
> -current as of a couple of days ago with amdgpu-firmware-20211027. I have
> fiddled with just about every BIOS setting I can think of (e.g. TPM, secure
> boot), but they have no apparent effect on this issue. Full dmesg below.

gpu faults tend to be problems in Mesa like this
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14409

polaris is gfx8 but I'm not sure how relevant that particular fix is
(see the table in https://www.x.org/wiki/RadeonFeature/ )

I have a Mesa update to 21.3 planned but am waiting till a bit after
the recent kernel changes.
