From: Jiří Špác <[email protected]>

Navy Flounder (Navi22, RX 6700/6700 XT, GC IP 10.3.2) suffers repeated
gfx_0.1.0 ring timeouts when multiple applications request high-priority
Vulkan GPU contexts simultaneously (e.g. VS Code and the Brave browser,
both Electron/Chromium-based).

On GC 10.3.x hardware, high-priority contexts are routed to the pipe1
hardware queue (gfx_0.1.0). When multiple processes compete on this
single queue, the Command Processor (CP) hangs and the ring reset fails:

  amdgpu 0000:03:00.0: amdgpu: ring gfx_0.1.0 timeout, signaled seq=107039, emitted seq=107040
  amdgpu 0000:03:00.0: amdgpu: Ring gfx_0.1.0 reset failed

The gap of exactly one between the signaled and emitted sequence
numbers is consistent with a single job submitted to pipe1 that never
completes due to a preemption/scheduling deadlock. Once the reset
fails, the display manager crashes and the login screen appears.

Fix this by setting num_pipe_per_me = 1 for GC 10.3.2, which disables
pipe1. All other queue parameters are kept identical to the rest of
GC 10.3.x.

Reported-by: Jiří Špác <[email protected]>
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/4985
Fixes: 3b094d4df4b0 ("drm/amd/amdgpu: add pipe1 hardware support")
Cc: [email protected]
Signed-off-by: Jiří Špác <[email protected]>
---
 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index 1893ceeeb..a44103622 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -4773,7 +4773,6 @@ static int gfx_v10_0_sw_init(struct amdgpu_ip_block *ip_block)
 		adev->gfx.mec.num_queue_per_pipe = 8;
 		break;
 	case IP_VERSION(10, 3, 0):
-	case IP_VERSION(10, 3, 2):
 	case IP_VERSION(10, 3, 1):
 	case IP_VERSION(10, 3, 4):
 	case IP_VERSION(10, 3, 5):
@@ -4787,6 +4786,22 @@ static int gfx_v10_0_sw_init(struct amdgpu_ip_block *ip_block)
 		adev->gfx.mec.num_pipe_per_mec = 4;
 		adev->gfx.mec.num_queue_per_pipe = 4;
 		break;
+	case IP_VERSION(10, 3, 2):
+		/*
+		 * Navy Flounder (Navi22): enabling pipe1 (gfx_0.1.0) causes
+		 * GFX ring timeouts under concurrent high-priority Vulkan
+		 * workloads (e.g. multiple Electron/Chromium apps). The
+		 * high-priority contexts routed to pipe1 contend on a single
+		 * hardware queue, the CP hangs, and ring reset fails, crashing
+		 * the display manager. Disable pipe1 to avoid this.
+		 */
+		adev->gfx.me.num_me = 1;
+		adev->gfx.me.num_pipe_per_me = 1;
+		adev->gfx.me.num_queue_per_pipe = 2;
+		adev->gfx.mec.num_mec = 2;
+		adev->gfx.mec.num_pipe_per_mec = 4;
+		adev->gfx.mec.num_queue_per_pipe = 4;
+		break;
 	default:
 		adev->gfx.me.num_me = 1;
 		adev->gfx.me.num_pipe_per_me = 1;
-- 
2.51.0
