From: Mingyu Wang <[email protected]> While validating the driver using DevGen (a framework that synthesizes virtual device models directly from driver source code via LLM guidance), a severe soft lockup was observed.
The hardware polling loops in `__ast_mindwm`, `__ast_moutdwm`, and `ast_2500_patch_ahb` lack a timeout mechanism. On bare-metal systems, if the ASPEED chip becomes unresponsive or a PCIe bus fault occurs, the CPU will spin indefinitely in these loops. This results in a system hang, triggering the watchdog soft lockup and causing subsequent I/O starvation (e.g., blocking jbd2). Fix this by introducing a bounded loop with a safe timeout of approximately 100ms using `udelay(10)`. Using `udelay()` ensures that the fix remains safe even if these accessors are called from an atomic context or while holding spinlocks. If the hardware fails to respond, the loop breaks and emits a `WARN_ONCE`, allowing the kernel to degrade gracefully and preventing complete system paralysis. Signed-off-by: Mingyu Wang <[email protected]> --- Hi Thomas, Thanks for the prompt response and confirmation! Instead of just waiting for threshold suggestions, I have drafted this patch to address the soft lockup. Since changing the return types of `__ast_mindwm` and `__ast_moutdwm` to propagate error codes (e.g., `-ETIMEDOUT`) would require an intrusive refactoring across the entire AST driver, I took a more defensive, minimal-invasive approach. To avoid the risk of sleeping in an atomic context (in case these low-level I/O accessors are ever called under a spinlock), I used a bounded loop with `udelay(10)` and a maximum of 10000 iterations (approx. 100ms total timeout). If the ASPEED hardware completely fails to respond, it breaks the infinite loop, emits a `WARN_ONCE`, and prevents the CPU from halting the entire system. Please let me know if you think the 100ms threshold and the `udelay` approach are appropriate for this specific AHB/SCU hardware sequence. drivers/gpu/drm/ast/ast_2500.c | 8 +++++++- drivers/gpu/drm/ast/ast_post.c | 16 ++++++++++++++-- 2 files changed, 21 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/ast/ast_2500.c b/drivers/gpu/drm/ast/ast_2500.c index 2a52af0ded56..08d18f90201a 100644 --- a/drivers/gpu/drm/ast/ast_2500.c +++ b/drivers/gpu/drm/ast/ast_2500.c @@ -107,6 +107,7 @@ static const u32 ast2500_ddr4_1600_timing_table[REGTBL_NUM] = { void ast_2500_patch_ahb(void __iomem *regs) { u32 data; + int retries = 10000; /* ~100ms timeout */ /* Clear bus lock condition */ __ast_moutdwm(regs, 0x1e600000, 0xAEED1A03); @@ -136,7 +137,12 @@ void ast_2500_patch_ahb(void __iomem *regs) do { __ast_moutdwm(regs, 0x1e6e2000, 0x1688A8A8); data = __ast_mindwm(regs, 0x1e6e2000); - } while (data != 1); + if (data == 1) + break; + udelay(10); + } while (--retries); + + WARN_ONCE(!retries, "ast: timeout waiting for AHB patch\n"); __ast_moutdwm(regs, 0x1e6e207c, 0x08000000); /* clear fast reset */ } diff --git a/drivers/gpu/drm/ast/ast_post.c b/drivers/gpu/drm/ast/ast_post.c index b72914dbed38..66eb80925e27 100644 --- a/drivers/gpu/drm/ast/ast_post.c +++ b/drivers/gpu/drm/ast/ast_post.c @@ -37,13 +37,19 @@ u32 __ast_mindwm(void __iomem *regs, u32 r) { u32 data; + int retries = 10000; /* ~100ms timeout */ __ast_write32(regs, 0xf004, r & 0xffff0000); __ast_write32(regs, 0xf000, 0x1); do { data = __ast_read32(regs, 0xf004) & 0xffff0000; - } while (data != (r & 0xffff0000)); + if (data == (r & 0xffff0000)) + break; + udelay(10); + } while (--retries); + + WARN_ONCE(!retries, "ast: timeout reading from AHB/SCU\n"); return __ast_read32(regs, 0x10000 + (r & 0x0000ffff)); } @@ -51,13 +57,19 @@ u32 __ast_mindwm(void __iomem *regs, u32 r) void __ast_moutdwm(void __iomem *regs, u32 r, u32 v) { u32 data; + int retries = 10000; /* ~100ms timeout */ __ast_write32(regs, 0xf004, r & 0xffff0000); __ast_write32(regs, 0xf000, 0x1); do { data = __ast_read32(regs, 0xf004) & 0xffff0000; - } while (data != (r & 0xffff0000)); + if (data == (r & 0xffff0000)) + break; + udelay(10); + } while (--retries); + + WARN_ONCE(!retries, "ast: timeout writing to AHB/SCU\n"); __ast_write32(regs, 0x10000 + (r & 0x0000ffff), v); } -- 2.34.1
