** Changed in: linux
       Status: Unknown => Fix Released

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1995956

Title:
  amdgpu no-retry page fault in Kinetic Kudu

Status in Linux:
  Fix Released
Status in linux package in Ubuntu:
  Triaged
Status in linux source package in Kinetic:
  Triaged

Bug description:
  While running Skype from the snap package, amdgpu crashed, leaving a
  black screen and an unresponsive system.

  This happened on Kinetic Kudu with kernel 5.19.0-23-generic, with or
  without the latest amdgpu firmware.
  The affected laptop is a T14 with a Ryzen 5850U.

  Nov 03 16:35:44 laptop kernel: amdgpu 0000:07:00.0: amdgpu: [mmhub0] no-retry page fault (src_id:0 ring:40 vmid:5 pasid:0, for process  pid 0 thread  pid 0)
  Nov 03 16:35:44 laptop kernel: amdgpu 0000:07:00.0: amdgpu:   in page starting at address 0x000080010142c000 from IH client 0x12 (VMC)
  Nov 03 16:35:44 laptop kernel: amdgpu 0000:07:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00540051
  Nov 03 16:35:44 laptop kernel: amdgpu 0000:07:00.0: amdgpu:      Faulty UTCL2 client ID: MP1 (0x0)
  Nov 03 16:35:44 laptop kernel: amdgpu 0000:07:00.0: amdgpu:      MORE_FAULTS: 0x1
  Nov 03 16:35:44 laptop kernel: amdgpu 0000:07:00.0: amdgpu:      WALKER_ERROR: 0x0
  Nov 03 16:35:44 laptop kernel: amdgpu 0000:07:00.0: amdgpu:      PERMISSION_FAULTS: 0x5
  Nov 03 16:35:44 laptop kernel: amdgpu 0000:07:00.0: amdgpu:      MAPPING_ERROR: 0x0
  Nov 03 16:35:44 laptop kernel: amdgpu 0000:07:00.0: amdgpu:      RW: 0x1
  Nov 03 16:35:44 laptop kernel: amdgpu 0000:07:00.0: amdgpu: [mmhub0] no-retry page fault (src_id:0 ring:40 vmid:5 pasid:0, for process  pid 0 thread  pid 0)
  Nov 03 16:35:44 laptop kernel: amdgpu 0000:07:00.0: amdgpu:   in page starting at address 0x000080010142d000 from IH client 0x12 (VMC)
  Nov 03 16:35:44 laptop kernel: amdgpu 0000:07:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
  Nov 03 16:35:44 laptop kernel: amdgpu 0000:07:00.0: amdgpu:      Faulty UTCL2 client ID: MP1 (0x0)
  Nov 03 16:35:44 laptop kernel: amdgpu 0000:07:00.0: amdgpu:      MORE_FAULTS: 0x0
  Nov 03 16:35:44 laptop kernel: amdgpu 0000:07:00.0: amdgpu:      WALKER_ERROR: 0x0
  Nov 03 16:35:44 laptop kernel: amdgpu 0000:07:00.0: amdgpu:      PERMISSION_FAULTS: 0x0
  Nov 03 16:35:44 laptop kernel: amdgpu 0000:07:00.0: amdgpu:      MAPPING_ERROR: 0x0
  Nov 03 16:35:44 laptop kernel: amdgpu 0000:07:00.0: amdgpu:      RW: 0x0
  Nov 03 16:35:44 laptop kernel: amdgpu 0000:07:00.0: amdgpu: [mmhub0] no-retry page fault (src_id:0 ring:40 vmid:5 pasid:0, for process  pid 0 thread  pid 0)
  Nov 03 16:35:44 laptop kernel: amdgpu 0000:07:00.0: amdgpu:   in page starting at address 0x000080010142c000 from IH client 0x12 (VMC)
  Nov 03 16:35:44 laptop kernel: amdgpu 0000:07:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00540051
  Nov 03 16:35:44 laptop kernel: amdgpu 0000:07:00.0: amdgpu:      Faulty UTCL2 client ID: MP1 (0x0)
  Nov 03 16:35:44 laptop kernel: amdgpu 0000:07:00.0: amdgpu:      MORE_FAULTS: 0x1
  Nov 03 16:35:44 laptop kernel: amdgpu 0000:07:00.0: amdgpu:      WALKER_ERROR: 0x0
  Nov 03 16:35:44 laptop kernel: amdgpu 0000:07:00.0: amdgpu:      PERMISSION_FAULTS: 0x5
  Nov 03 16:35:44 laptop kernel: amdgpu 0000:07:00.0: amdgpu:      MAPPING_ERROR: 0x0
  Nov 03 16:35:44 laptop kernel: amdgpu 0000:07:00.0: amdgpu:      RW: 0x1
  Nov 03 16:35:44 laptop kernel: amdgpu 0000:07:00.0: amdgpu: [mmhub0] no-retry page fault (src_id:0 ring:40 vmid:5 pasid:0, for process  pid 0 thread  pid 0)
  Nov 03 16:35:44 laptop kernel: amdgpu 0000:07:00.0: amdgpu:   in page starting at address 0x000080010142d000 from IH client 0x12 (VMC)
  Nov 03 16:35:44 laptop kernel: amdgpu 0000:07:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
  Nov 03 16:35:44 laptop kernel: amdgpu 0000:07:00.0: amdgpu:      Faulty UTCL2 client ID: MP1 (0x0)
  Nov 03 16:35:44 laptop kernel: amdgpu 0000:07:00.0: amdgpu:      MORE_FAULTS: 0x0
  Nov 03 16:35:44 laptop kernel: amdgpu 0000:07:00.0: amdgpu:      WALKER_ERROR: 0x0
  Nov 03 16:35:44 laptop kernel: amdgpu 0000:07:00.0: amdgpu:      PERMISSION_FAULTS: 0x0
  Nov 03 16:35:44 laptop kernel: amdgpu 0000:07:00.0: amdgpu:      MAPPING_ERROR: 0x0
  Nov 03 16:35:44 laptop kernel: amdgpu 0000:07:00.0: amdgpu:      RW: 0x0
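
  The decoded fields in the log above can be reproduced from the raw
  VM_L2_PROTECTION_FAULT_STATUS value. The following is a minimal sketch,
  not the driver's code: the bit positions in FIELDS are an assumption
  (they are not stated in this report), chosen because they reproduce the
  values the kernel printed for 0x00540051, including vmid:5. The helper
  name decode_fault_status is hypothetical.

```python
# Assumed (shift, width) layout of VM_L2_PROTECTION_FAULT_STATUS.
# This layout is an assumption; it matches the decoded values in the log.
FIELDS = {
    "MORE_FAULTS": (0, 1),
    "WALKER_ERROR": (1, 3),
    "PERMISSION_FAULTS": (4, 4),
    "MAPPING_ERROR": (8, 1),
    "CID": (9, 9),      # printed in the log as "Faulty UTCL2 client ID"
    "RW": (18, 1),
    "VMID": (20, 4),
}


def decode_fault_status(status):
    """Split a raw status word into its named bit fields."""
    return {
        name: (status >> shift) & ((1 << width) - 1)
        for name, (shift, width) in FIELDS.items()
    }


if __name__ == "__main__":
    for raw in (0x00540051, 0x00000000):
        print(f"{raw:#010x}", decode_fault_status(raw))
```

  Under that assumed layout, 0x00540051 decodes to MORE_FAULTS=1,
  PERMISSION_FAULTS=5, RW=1 and VMID=5, and 0x00000000 decodes to all
  zeros, which agrees with the two fault records logged.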

  
  These faults repeat in a loop and eventually trigger a GPU reset, which fails.

  Nov 03 16:35:55 laptop kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=211509, emitted seq=211512
  Nov 03 16:35:55 laptop kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process skypeforlinux pid 154554 thread skypeforli:cs0 pid 154558
  Nov 03 16:35:55 laptop kernel: amdgpu 0000:07:00.0: amdgpu: GPU reset begin!
  Nov 03 16:35:55 laptop kernel: [drm] free PSP TMR buffer
  Nov 03 16:35:55 laptop kernel: CPU: 15 PID: 141579 Comm: kworker/u32:1 Tainted: G        W         5.19.0-23-generic #24-Ubuntu
  Nov 03 16:35:55 laptop kernel: Hardware name: LENOVO 20XK002HPB/20XK002HPB, BIOS R1MET49W (1.19 ) 06/27/2022
  Nov 03 16:35:55 laptop kernel: Workqueue: amdgpu-reset-dev drm_sched_job_timedout [gpu_sched]
  Nov 03 16:35:55 laptop kernel: Call Trace:
  Nov 03 16:35:55 laptop kernel:  <TASK>
  Nov 03 16:35:55 laptop kernel:  show_stack+0x4e/0x61
  Nov 03 16:35:55 laptop kernel:  dump_stack_lvl+0x4a/0x6d
  Nov 03 16:35:55 laptop kernel:  dump_stack+0x10/0x18
  Nov 03 16:35:55 laptop kernel:  amdgpu_do_asic_reset+0x2b/0x45e [amdgpu]
  Nov 03 16:35:55 laptop kernel:  amdgpu_device_gpu_recover_imp.cold+0x748/0x7f0 [amdgpu]
  Nov 03 16:35:55 laptop kernel:  amdgpu_job_timedout+0x196/0x1d0 [amdgpu]
  Nov 03 16:35:55 laptop kernel:  ? finish_task_switch.isra.0+0x85/0x290
  Nov 03 16:35:55 laptop kernel:  drm_sched_job_timedout+0x70/0x120 [gpu_sched]
  Nov 03 16:35:55 laptop kernel:  process_one_work+0x225/0x400
  Nov 03 16:35:55 laptop kernel:  worker_thread+0x50/0x3e0
  Nov 03 16:35:55 laptop kernel:  ? rescuer_thread+0x3c0/0x3c0
  Nov 03 16:35:55 laptop kernel:  kthread+0xe9/0x110
  Nov 03 16:35:55 laptop kernel:  ? kthread_complete_and_exit+0x20/0x20
  Nov 03 16:35:55 laptop kernel:  ret_from_fork+0x22/0x30
  Nov 03 16:35:55 laptop kernel:  </TASK>
  Nov 03 16:35:55 laptop kernel: amdgpu 0000:07:00.0: amdgpu: MODE2 reset
  Nov 03 16:35:55 laptop kernel: amdgpu 0000:07:00.0: amdgpu: GPU reset succeeded, trying to resume
  Nov 03 16:35:55 laptop kernel: [drm] PCIE GART of 1024M enabled.
  Nov 03 16:35:55 laptop kernel: [drm] PTB located at 0x000000F400900000
  Nov 03 16:35:55 laptop kernel: [drm] VRAM is lost due to GPU reset!
  Nov 03 16:35:55 laptop kernel: [drm] PSP is resuming...
  Nov 03 16:35:55 laptop kernel: [drm] reserve 0x400000 from 0xf43f800000 for PSP TMR
  Nov 03 16:35:55 laptop kernel: amdgpu 0000:07:00.0: amdgpu: RAS: optional ras ta ucode is not available
  Nov 03 16:35:55 laptop kernel: amdgpu 0000:07:00.0: amdgpu: RAP: optional rap ta ucode is not available
  Nov 03 16:35:55 laptop kernel: amdgpu 0000:07:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
  Nov 03 16:35:55 laptop kernel: amdgpu 0000:07:00.0: amdgpu: SMU is resuming...
  Nov 03 16:35:55 laptop kernel: amdgpu 0000:07:00.0: amdgpu: SMU is resumed successfully!
  Nov 03 16:35:55 laptop kernel: [drm] DMUB hardware initialized: version=0x0101001F
  Nov 03 16:35:56 laptop kernel: [drm] kiq ring mec 2 pipe 1 q 0
  Nov 03 16:35:56 laptop kernel: amdgpu 0000:07:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring sdma0 test failed (-110)
  Nov 03 16:35:56 laptop kernel: [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <sdma_v4_0> failed -110
  Nov 03 16:35:56 laptop kernel: amdgpu 0000:07:00.0: amdgpu: GPU reset(1) failed
  Nov 03 16:35:56 laptop kernel: amdgpu 0000:07:00.0: amdgpu: GPU reset end with ret = -110
  Nov 03 16:35:56 laptop kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* GPU Recovery Failed: -110
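
  The ring timeout message at the start of the reset log reports two
  fence sequence numbers. As a rough sketch, assuming "emitted" is the
  last submitted fence and "signaled" is the last completed one, their
  difference estimates how many sdma0 submissions were still outstanding
  when the job timed out. The helper name pending_fences and the regular
  expression are illustrative, not part of any amdgpu tooling.

```python
import re

# The timeout line copied from the log above (timestamp prefix removed).
LINE = (
    "[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, "
    "signaled seq=211509, emitted seq=211512"
)

# Hypothetical pattern for this one message format.
PATTERN = re.compile(
    r"ring (\S+) timeout, signaled seq=(\d+), emitted seq=(\d+)"
)


def pending_fences(line):
    """Return (ring name, emitted - signaled), or None if no match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    ring = match.group(1)
    signaled = int(match.group(2))
    emitted = int(match.group(3))
    return ring, emitted - signaled


if __name__ == "__main__":
    print(pending_fences(LINE))
```

  For the line in this report the difference is 211512 - 211509 = 3 on
  ring sdma0, the same ring whose test fails with -110 after the reset.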

  
  After the failed reset, the page faults continue:
  Nov 03 16:35:59 laptop kernel: amdgpu 0000:07:00.0: amdgpu: [mmhub0] no-retry page fault (src_id:0 ring:40 vmid:5 pasid:0, for process  pid 0 thread  pid 0)
  Nov 03 16:35:59 laptop kernel: amdgpu 0000:07:00.0: amdgpu:   in page starting at address 0x000080010142c000 from IH client 0x12 (VMC)
  Nov 03 16:35:59 laptop kernel: amdgpu 0000:07:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00540051
  Nov 03 16:35:59 laptop kernel: amdgpu 0000:07:00.0: amdgpu:      Faulty UTCL2 client ID: MP1 (0x0)
  Nov 03 16:35:59 laptop kernel: amdgpu 0000:07:00.0: amdgpu:      MORE_FAULTS: 0x1
  Nov 03 16:35:59 laptop kernel: amdgpu 0000:07:00.0: amdgpu:      WALKER_ERROR: 0x0
  Nov 03 16:35:59 laptop kernel: amdgpu 0000:07:00.0: amdgpu:      PERMISSION_FAULTS: 0x5
  Nov 03 16:35:59 laptop kernel: amdgpu 0000:07:00.0: amdgpu:      MAPPING_ERROR: 0x0
  Nov 03 16:35:59 laptop kernel: amdgpu 0000:07:00.0: amdgpu:      RW: 0x1
  Nov 03 16:35:59 laptop kernel: amdgpu 0000:07:00.0: amdgpu: [mmhub0] no-retry page fault (src_id:0 ring:40 vmid:5 pasid:0, for process  pid 0 thread  pid 0)
  Nov 03 16:35:59 laptop kernel: amdgpu 0000:07:00.0: amdgpu:   in page starting at address 0x000080010142d000 from IH client 0x12 (VMC)
  Nov 03 16:35:59 laptop kernel: amdgpu 0000:07:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00540051

To manage notifications about this bug go to:
https://bugs.launchpad.net/linux/+bug/1995956/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
