On Sun, Jan 23, 2022 at 09:20:20AM +0000, Laurence Tratt wrote:
> I've had a Ryzen machine with a (basic!) Polaris GPU for about a year. Over
> that time nearly all of the GPU related bugs have disappeared (thanks
> Jonathan et al.!), except for the fact that I don't seem to be able to
> reliably suspend/resume from X.
> 
> Resuming used to crash the machine 9 times out of 10, generally causing a
> single block of colour to be displayed, before automatically rebooting
> itself.
> 
> As of the last couple of weeks or so (at least), things are a little better.
> I now seem to be able to semi-reliably suspend/resume from the console. When
> I resume an odd white-ish bit-pattern is displayed on screen, but switching
> to/from another console restores things. I seem to be able to do this
> multiple times without issue.
> 
> However, if I suspend/resume from X, the X server soon gets visually
> corrupted -- the white-ish bit pattern is mingled with whatever was last
> visible in my X session. The machine still responds to ssh (etc.) but the
> screen is no longer usable, even if I try switching to/from virtual
> consoles. [Perhaps unsurprisingly, even if I don't login to X, merely
> suspending/resuming from the console seems to corrupt xenodm after a while.]

Does hibernate behaviour differ?

> 
> I now seem to have triggered some sort of output from dmesg after a recent
> suspend/resume sequence:
> 
>   drm:pid33800:gmc_v8_0_process_interrupt *ERROR* GPU fault detected: 146 0x0448480c for process  pid 0 thread Xorg pid 21509
>   drm:pid33800:gmc_v8_0_process_interrupt *ERROR*   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00102E89
>   drm:pid33800:gmc_v8_0_process_interrupt *ERROR*   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0E04800C
>   drm:pid33800:gmc_v8_0_vm_decode_fault *ERROR* VM fault (0x0c, vmid 7, pasid 32769) at page 1060489, read from 'TC0' (0x54433000) (72)
> 
> Does that mean anything? I have no idea, but I thought it might be worth
> noting it here in case it helps any amdgpu hackers! This is on amd64
> -current as of a couple of days ago with amdgpu-firmware-20211027. I have
> fiddled with just about every BIOS setting I can think of (e.g. TPM, secure
> boot), but they have no apparent effect on this issue. Full dmesg below.

GPU faults tend to be problems in Mesa, like this one:
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14409
Polaris is gfx8, but I'm not sure how relevant that particular fix is
(see the table in https://www.x.org/wiki/RadeonFeature/ ).
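
For anyone trying to read the fault lines in the quoted dmesg, here is a
rough Python sketch of how the logged registers appear to map onto the
decoded "VM fault" line. The bit positions are an assumption, chosen
because they reproduce the values in that log (protections 0x0c, vmid 7,
client id 72, read access); they are not taken from the driver source
here. The function name decode_vm_fault is made up for illustration.

```python
def decode_vm_fault(status, addr, mc_client):
    """Split a VM_CONTEXT1_PROTECTION_FAULT_STATUS value into fields.

    Assumed layout (inferred from the logged values, not authoritative):
      bits  0-7   protection flags
      bits 12-19  memory client id
      bit  24     0 = read, 1 = write
      bits 25-28  vmid
    """
    protections = status & 0xFF
    mc_id = (status >> 12) & 0xFF
    is_write = bool((status >> 24) & 0x1)
    vmid = (status >> 25) & 0xF
    # The client value looks like up to four ASCII characters packed
    # big-endian and padded with NUL bytes: 0x54433000 -> "TC0".
    client = mc_client.to_bytes(4, "big").rstrip(b"\x00").decode("ascii")
    return {
        "protections": protections,
        "vmid": vmid,
        "mc_id": mc_id,
        "access": "write" if is_write else "read",
        "client": client,
        # The FAULT_ADDR register is printed in decimal as the page number.
        "page": addr,
    }


# Values copied from the quoted dmesg output.
fault = decode_vm_fault(0x0E04800C, 0x00102E89, 0x54433000)
print(fault)
```

This only shows that the last line of the log restates the two registers
above it; it says nothing about what caused the fault.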

I have a Mesa update to 21.3 planned but am waiting till a bit after
the recent kernel changes.
