The following page fault was observed when the HIP test process exited.
In this case the HIP test (./MemcpyPerformance -h) does not require an
AQL queue, so process_context_addr had not been assigned by the time the
KFD process was released. This led to the page fault below during
kfd_process_dequeue_from_all_devices(). Return early from
amdgpu_mes_flush_shader_debugger() when process_context_addr is not set.

[345962.294891] amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:153 vmid:0 pasid:0)
[345962.295333] amdgpu 0000:03:00.0: amdgpu:   in page starting at address 0x0000000000000000 from client 10
[345962.295775] amdgpu 0000:03:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000B33
[345962.296097] amdgpu 0000:03:00.0: amdgpu:     Faulty UTCL2 client ID: CPC (0x5)
[345962.296394] amdgpu 0000:03:00.0: amdgpu:     MORE_FAULTS: 0x1
[345962.296633] amdgpu 0000:03:00.0: amdgpu:     WALKER_ERROR: 0x1
[345962.296876] amdgpu 0000:03:00.0: amdgpu:     PERMISSION_FAULTS: 0x3
[345962.297135] amdgpu 0000:03:00.0: amdgpu:     MAPPING_ERROR: 0x1
[345962.297377] amdgpu 0000:03:00.0: amdgpu:     RW: 0x0
[345962.297682] amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:169 vmid:0 pasid:0)

Signed-off-by: Prike Liang <[email protected]>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 5 +++++
 1 file changed, 5 insertions(+)
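
Note for readers of the fault log above: the decoded fields can be
reproduced from the raw GCVM_L2_PROTECTION_FAULT_STATUS value. The
sketch below is a standalone userspace helper, not driver code. The bit
positions are an assumption inferred from the values printed in the log
(they reproduce every decoded field for 0x00000B33); the struct and
function names are hypothetical. Check the register headers for the
ASIC in question before relying on this layout.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Decoded view of GCVM_L2_PROTECTION_FAULT_STATUS.
 * ASSUMED layout (matches the values in the log above):
 *   bit 0      MORE_FAULTS
 *   bits 1-3   WALKER_ERROR
 *   bits 4-7   PERMISSION_FAULTS
 *   bit 8      MAPPING_ERROR
 *   bits 9-17  CID (faulty UTCL2 client ID)
 *   bit 18     RW
 */
struct fault_status {
	uint32_t more_faults;
	uint32_t walker_error;
	uint32_t permission_faults;
	uint32_t mapping_error;
	uint32_t cid;
	uint32_t rw;
};

/* Split the raw register value into its fields by shift and mask. */
static struct fault_status decode_fault_status(uint32_t status)
{
	struct fault_status f = {
		.more_faults       = (status >> 0)  & 0x1,
		.walker_error      = (status >> 1)  & 0x7,
		.permission_faults = (status >> 4)  & 0xf,
		.mapping_error     = (status >> 8)  & 0x1,
		.cid               = (status >> 9)  & 0x1ff,
		.rw                = (status >> 18) & 0x1,
	};

	return f;
}

/* Print the fields in the same order the driver logs them. */
static void print_fault_status(uint32_t status)
{
	struct fault_status f = decode_fault_status(status);

	printf("Faulty UTCL2 client ID: 0x%x\n", (unsigned int)f.cid);
	printf("MORE_FAULTS: 0x%x\n", (unsigned int)f.more_faults);
	printf("WALKER_ERROR: 0x%x\n", (unsigned int)f.walker_error);
	printf("PERMISSION_FAULTS: 0x%x\n", (unsigned int)f.permission_faults);
	printf("MAPPING_ERROR: 0x%x\n", (unsigned int)f.mapping_error);
	printf("RW: 0x%x\n", (unsigned int)f.rw);
}
```

Calling print_fault_status(0x00000B33) gives client ID 0x5 (reported as
CPC in the log) along with the same field values as the log. Together
with the faulting address of 0x0000000000000000, this is consistent
with the CP accessing an unassigned process context address.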

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
index cee38bb6cfaf..4d313144cc4b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
@@ -1062,6 +1062,11 @@ int amdgpu_mes_flush_shader_debugger(struct amdgpu_device *adev,
                return -EINVAL;
        }
 
+       if (!process_context_addr) {
+               dev_warn(adev->dev, "invalidated process context addr\n");
+               return -EINVAL;
+       }
+
        op_input.op = MES_MISC_OP_SET_SHADER_DEBUGGER;
        op_input.set_shader_debugger.process_context_addr = process_context_addr;
        op_input.set_shader_debugger.flags.process_ctx_flush = true;
-- 
2.34.1
