On Sat, Jun 27, 2026 at 10:18:45PM +1000, Sam Day via B4 Relay wrote:
> From: Sam Day <[email protected]>
> 
> If the peak vote for mdp1-mem is allowed to drop to zero, it seems to
> cause the fabric to collapse that path entirely, which causes the device
> to bus stall and fatally reset.
> 
> This issue was identified specifically on sdm845-oneplus-fajita, so this
> workaround is applied narrowly to SDM845's MDSS.
> 
> ---
> This RFC patch is a spiritual successor to the "Addressing stability
> issues on SDM845 with the -next tree" series sent by David and Petr 6
> months ago.
> 
> As Dmitry pointed out, the earlier patch leaks runtime PM references.
> In practice, this means that MDSS never actually gets suspended, which
> is why that patch appeared to "fix" the issue.
> 
> The deeper root cause is that, when msm_mdss_disable() runs and unvotes
> the mdp1-mem interconnect bandwidth, that seems to collapse the fabric
> entirely and causes the bus stall -> hang -> reboot behaviour.
> 
> I've confirmed that a tiny non-zero peak bandwidth vote keeps the fabric
> alive and avoids the issue.
> 
> Of course, this is still a fairly egregious hack, but it *does* allow
> blanking to suspend and resume DSI + DPU + MDSS properly without the bus
> stall.

I'm a bit sceptical about this patch. The Lenovo Yoga C630 uses a
variant of SDM845. There I don't observe any issues with the MM itself.
But cluster suspend can cause issues there too. I suspect that there is
a missing vote (or undervote) on the CX or MX, which results in
suspend/resume crashes. And if that's true, then your patch does exactly
that - I think it will add an internal CX vote, which won't be dropped,
preventing CX collapse.
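
For reference, the workaround under discussion amounts to never letting
the mdp1-mem peak vote reach zero when the MDSS votes are dropped. A
minimal userspace model of that idea follows. It is not the actual patch:
icc_set_bw() here is a local stub that only mirrors the argument order of
the kernel function, and struct icc_path's fields, MDP1_MIN_PEAK_KBPS and
mdss_drop_votes() are hypothetical names chosen for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the kernel's opaque struct icc_path; fields are hypothetical. */
struct icc_path {
	const char *name;
	uint32_t avg_kbps;
	uint32_t peak_kbps;
};

/* Local stub: records the vote instead of talking to an interconnect provider. */
static int icc_set_bw(struct icc_path *path, uint32_t avg_bw, uint32_t peak_bw)
{
	if (!path)
		return 0;	/* nothing to vote on */
	path->avg_kbps = avg_bw;
	path->peak_kbps = peak_bw;
	return 0;
}

/* Hypothetical "tiny non-zero" floor; the real value is not given in the thread. */
#define MDP1_MIN_PEAK_KBPS 1

/*
 * Drop all bandwidth votes. If keep_mdp1_alive is set, path index 1
 * (assumed here to be mdp1-mem) keeps a minimal peak vote instead of 0.
 */
static void mdss_drop_votes(struct icc_path *paths, size_t n, int keep_mdp1_alive)
{
	for (size_t i = 0; i < n; i++) {
		uint32_t peak = (keep_mdp1_alive && i == 1) ? MDP1_MIN_PEAK_KBPS : 0;

		icc_set_bw(&paths[i], 0, peak);
	}
}
```

The point of the model is only that the vote on that one path is never
fully released, which is why such a change could equally be hiding a
missing vote elsewhere.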

> 
> Here's what I've validated with instrumentation:
> 
>  * DSI host disable, IRQ disable, PLL state save, host power-off, link
>    clock disable, regulator disable, SFPB disable, and PHY disable all
>    complete successfully before the fatal reset occurs.
>  * DPU runtime suspend also completes. The bandwidth accounting was
>    checked and confirmed to reach runtime suspend with 0 refs, with no
>    pending frame state.
>  * The device survives MDSS clock disabling and mdp0-mem zero voting;
>    only the mdp1-mem zero vote has been isolated as the cause of the
>    stall + reset.

Will it work if you suspend the MDSS (dropping all votes) and then
forcibly break the device suspend by returning an error from a later
stage?
> 
> So, I'm not really sure where to go from here. I'm sure that this
> workaround is not suitable for inclusion upstream as it still seems to
> be papering over an underlying issue... But it's unclear to me if this is
> some kind of hardware quirk on SDM845, a problem with the SDM845 DT
> wiring, a driver issue, or something else entirely.

I don't have good advice here. Try disabling the cluster idle node. If
the device still works, then mdp1-mem is not the culprit.

> 
> I'd appreciate any advice on how to further diagnose this issue and what
> direction to take from here.
> 
> Kind regards,
> -Sam
> 
> Link: https://lore.kernel.org/phone-devel/[email protected]/
> Signed-off-by: Sam Day <[email protected]>

-- 
With best wishes
Dmitry
