From: Jiří Špác <[email protected]>

Navy Flounder (Navi22, RX 6700/6700 XT, GC IP 10.3.2) suffers repeated
gfx_0.1.0 ring timeouts when multiple applications request high-priority
Vulkan GPU contexts simultaneously (e.g. VS Code + Brave browser, both
Electron/Chromium-based).

On GC 10.3.x hardware, high-priority contexts are routed to the pipe1
hardware queue (gfx_0.1.0). When multiple processes compete on this
single queue, the Command Processor (CP) hangs and the ring reset fails:

  amdgpu 0000:03:00.0: amdgpu: ring gfx_0.1.0 timeout, signaled seq=107039, emitted seq=107040
  amdgpu 0000:03:00.0: amdgpu: Ring gfx_0.1.0 reset failed

The seq delta of 1 is consistent with a single job submitted to pipe1
that never completes due to a preemption/scheduling deadlock. Once the
reset fails, the display manager crashes and the user is returned to
the login screen.

Fix this by setting num_pipe_per_me = 1 for GC 10.3.2, disabling pipe1.
All other queue parameters are kept identical to the rest of GC 10.3.x.

Reported-by: Jiří Špác <[email protected]>
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/4985
Fixes: 3b094d4df4b0 ("drm/amd/amdgpu: add pipe1 hardware support")
Cc: [email protected]
Signed-off-by: Jiří Špác <[email protected]>
---
 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)
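
Note for reviewers: the per-version selection this patch changes can be
modelled in isolation as below. This is only a sketch, not driver code.
SKETCH_IP_VERSION() and both sketch_* helpers are hypothetical names made
up for illustration; the value 2 for the shared GC 10.3.x case is an
assumption inferred from the commit message (pipe1 exists there), and
only the versions visible in the diff context are listed.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the kernel's IP_VERSION(); this packing is illustrative only. */
#define SKETCH_IP_VERSION(mj, mn, rv) \
	(((uint32_t)(mj) << 16) | ((uint32_t)(mn) << 8) | (uint32_t)(rv))

/* Hypothetical helper: gfx pipes per ME after this patch is applied. */
static int sketch_gfx_pipes_per_me(uint32_t gc_ver)
{
	switch (gc_ver) {
	case SKETCH_IP_VERSION(10, 3, 0):
	case SKETCH_IP_VERSION(10, 3, 1):
	case SKETCH_IP_VERSION(10, 3, 4):
	case SKETCH_IP_VERSION(10, 3, 5):
		/* Assumption: the shared GC 10.3.x case keeps pipe0 and pipe1. */
		return 2;
	case SKETCH_IP_VERSION(10, 3, 2):
		/* Navy Flounder: num_pipe_per_me = 1, so only pipe0 remains. */
		return 1;
	default:
		/* The default case in the diff also sets num_pipe_per_me = 1. */
		return 1;
	}
}

/* pipe1 (ring gfx_0.1.0) can only exist when there is more than one pipe. */
static int sketch_has_gfx_pipe1(uint32_t gc_ver)
{
	return sketch_gfx_pipes_per_me(gc_ver) > 1;
}

/* Small self-check of the intended outcome for the versions above. */
static void sketch_self_check(void)
{
	assert(!sketch_has_gfx_pipe1(SKETCH_IP_VERSION(10, 3, 2)));
	assert(sketch_has_gfx_pipe1(SKETCH_IP_VERSION(10, 3, 0)));
}
```

With pipe1 absent on GC 10.3.2, the gfx_0.1.0 ring is never created, so
high-priority contexts have no second gfx pipe to be routed to.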

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index 1893ceeeb..a44103622 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -4773,7 +4773,6 @@ static int gfx_v10_0_sw_init(struct amdgpu_ip_block *ip_block)
                adev->gfx.mec.num_queue_per_pipe = 8;
                break;
        case IP_VERSION(10, 3, 0):
-       case IP_VERSION(10, 3, 2):
        case IP_VERSION(10, 3, 1):
        case IP_VERSION(10, 3, 4):
        case IP_VERSION(10, 3, 5):
@@ -4787,6 +4786,22 @@ static int gfx_v10_0_sw_init(struct amdgpu_ip_block *ip_block)
                adev->gfx.mec.num_pipe_per_mec = 4;
                adev->gfx.mec.num_queue_per_pipe = 4;
                break;
+       case IP_VERSION(10, 3, 2):
+               /*
+                * Navy Flounder (Navi22): enabling pipe1 (gfx_0.1.0) causes
+                * GFX ring timeouts under concurrent high-priority Vulkan
+                * workloads (e.g. multiple Electron/Chromium apps). The
+                * high-priority contexts routed to pipe1 contend on a single
+                * hardware queue, the CP hangs, and ring reset fails, crashing
+                * the display manager. Disable pipe1 to avoid this.
+                */
+               adev->gfx.me.num_me = 1;
+               adev->gfx.me.num_pipe_per_me = 1;
+               adev->gfx.me.num_queue_per_pipe = 2;
+               adev->gfx.mec.num_mec = 2;
+               adev->gfx.mec.num_pipe_per_mec = 4;
+               adev->gfx.mec.num_queue_per_pipe = 4;
+               break;
        default:
                adev->gfx.me.num_me = 1;
                adev->gfx.me.num_pipe_per_me = 1;
-- 
2.51.0
