From: Ce Sun <[email protected]>

[ Upstream commit 3ee1c72606bd2842f0f377fd4b118362af0323ae ]

Tune the sleep interval in the PSP fence wait loop from 10-100us to
60-100us.This adjustment results in an overall wait window of 1.2s
(60us * 20000 iterations) to 2 seconds (100us * 20000 iterations),
which guarantees that we can retrieve the correct fence value

Signed-off-by: Ce Sun <[email protected]>
Reviewed-by: Lijo Lazar <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---

LLM Generated explanations, may be completely bogus:

Now I have all the key information. Let me analyze this commit:

## Analysis

### What the commit does
This is a one-line change that increases the minimum sleep time in the
PSP (Platform Security Processor) fence wait loop from `usleep_range(10,
100)` to `usleep_range(60, 100)`.

### Context
The `psp_cmd_submit_buf()` function submits a command to AMD's Platform
Security Processor and waits for completion by polling a fence buffer.
The loop runs up to `psp_timeout = 20000` iterations.

**Previous behavior:** With `usleep_range(10, 100)`, the total wait
window was approximately:
- Minimum: 10us * 20000 = 200ms
- Maximum: 100us * 20000 = 2000ms (2s)

**New behavior:** With `usleep_range(60, 100)`, the total wait window
is:
- Minimum: 60us * 20000 = 1200ms (1.2s)
- Maximum: 100us * 20000 = 2000ms (2s)

### Bug Analysis
The commit message says this "guarantees that we can retrieve the
correct fence value." This implies that with the 10us minimum, the
kernel scheduler could wake up after approximately 10us each iteration,
causing the 20000 iterations to complete in as little as ~200ms —
**before the PSP hardware has finished its work**. The fix increases the
minimum wait to ensure the total poll window is at least 1.2s.

This is fixing a **real bug**: a timeout that is too short, causing
legitimate PSP commands to fail spuriously. When PSP fence wait fails,
line 762-764 shows it returns `-EINVAL`, which causes firmware load
failures and GPU initialization failures. This can make the GPU non-
functional.

### Stable criteria evaluation
1. **Fixes a real bug**: Yes — PSP commands can time out prematurely,
   causing GPU initialization failures.
2. **Obviously correct**: Yes — it's a one-line change to a sleep range,
   reviewed by AMD engineer Lijo Lazar.
3. **Small and contained**: Yes — single line change in one file.
4. **No new features**: Correct — only adjusts a timing parameter.
5. **Risk**: Very low. The maximum sleep time (100us) is unchanged. The
   only effect is that the minimum sleep per iteration increases, making
   the total wait time longer and more reliable. This cannot cause
   regressions — it only makes the timeout more generous.

### Concerns
- The change is a timing adjustment, which can sometimes be considered a
  "tuning" change rather than a bug fix. However, the commit message
  explicitly states it fixes a correctness problem (failing to retrieve
  the correct fence value), and the consequence of the bug is GPU
  initialization failure.
- No explicit `Fixes:` tag, but that is expected for autosel candidates.

### User Impact
PSP fence wait failures manifest as GPU initialization failures. Users
affected would see their AMD GPU fail to initialize, making it unusable.
This is a high-severity issue for affected users.

## Verification
- Confirmed `psp_timeout = 20000` at line 268 of `amdgpu_psp.c` —
  verified the iteration count claim in the commit message.
- Confirmed the wait loop structure at lines 718-731 — the loop polls
  with `usleep_range` and breaks on timeout.
- Confirmed that timeout failure leads to `-EINVAL` return at line 763,
  causing command failure.
- Confirmed the change is a single-line edit from `usleep_range(10,
  100)` to `usleep_range(60, 100)`.
- Verified the commit is reviewed by Lijo Lazar (AMD PSP engineer) and
  signed off by Alex Deucher (AMD GPU maintainer).
- The math checks out: 10us * 20000 = 200ms (too short), 60us * 20000 =
  1.2s (sufficient per commit message).

This is a small, obvious, low-risk fix for a real bug that causes GPU
initialization failures. It meets all stable kernel criteria.

**YES**

 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
index 0b10497d487c3..81bdd6aaad2a1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
@@ -726,7 +726,7 @@ psp_cmd_submit_buf(struct psp_context *psp,
                ras_intr = amdgpu_ras_intr_triggered();
                if (ras_intr)
                        break;
-               usleep_range(10, 100);
+               usleep_range(60, 100);
                amdgpu_device_invalidate_hdp(psp->adev, NULL);
        }
 
-- 
2.51.0

Reply via email to